Automated Annotation of Biomedical Text


Kevin Livingston, Ph.D.
Postdoctoral Fellow, Pharmacology Department, School of Medicine
University of Colorado Anschutz Medical Campus
Kevin.Livingston@ucdenver.edu
http://compbio.ucdenver.edu/hunter_lab/livingston

Biomedical researchers are interested in understanding their data in the context of all known background knowledge: curated databases and the literature.


Muscle Cell Development

Biomedical Data Sources
Databases in 2012: 1,380
Total GO annotations: 132,425,702
Total manual GO annotations: 1,116,848
PubMed articles referenced: 94,518
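These figures can be used for a quick check of how much of GO annotation is manually curated. This is arithmetic on the slide's numbers, not a figure from the talk. It assumes the manual annotations are a subset of the total.

```python
# Counts transcribed from the "Biomedical Data Sources" slide (2012).
# Assumption: manual annotations are counted within the total.
total_go_annotations = 132_425_702
manual_go_annotations = 1_116_848

manual_share = manual_go_annotations / total_go_annotations
print(f"Manual share of GO annotations: {manual_share:.2%}")
```

Under that assumption, manual curation accounts for under 1% of all GO annotations.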

PubMed Growth Rate [figure: new PubMed entries per year (thousands) and total entries (millions), 1987 to 2011. Each series has an exponential fit: y ≈ e^(0.0405x), R² = 0.99, and y ≈ e^(0.0402x), R² = 0.94]
973,499 PubMed entries in 2011 (more than 2,600 per day): about 2 journal articles per minute!
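The per-day and per-minute rates follow directly from the 2011 count. The fitted exponent also implies a doubling time. The sketch below reproduces that arithmetic. It assumes x in the fit is measured in years, which the slide does not state.

```python
import math

# 2011 PubMed entry count, as quoted on the slide.
entries_2011 = 973_499

per_day = entries_2011 / 365
per_minute = per_day / (24 * 60)

# Exponential fit from the slide: y ~ e^(0.0405 x).
# Assumption: x counts years, so the doubling time is ln(2) / 0.0405 years.
growth_rate = 0.0405
doubling_years = math.log(2) / growth_rate

print(f"{per_day:.0f} entries/day, {per_minute:.2f} entries/minute")
print(f"implied doubling time: {doubling_years:.1f} years")
```

The exact rate is about 1.85 articles per minute, which the slide rounds to 2.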

Vision: databases (DBs), ontologies, and text mining over the literature (texts) feed a common knowledge base that supports intelligent applications.

Annotation for Computation: annotations should be computer-understandable and composable, and the provenance of compositions should be traceable.

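Compositional Annotation & Knowledge (example from CRAFT, PMID:1473718): text annotations 1 and 2 denote the two component concepts, GO:0043474 (pigmentation) and TAXON:7742 (Vertebrata). Text annotation 3 denotes "vertebrate pigmentation". This concept is a subclassof GO:0043474 pigmentation that occurs_in TAXON:7742 Vertebrata. Text annotation 3 is basedon text annotations 1 and 2.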
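The basedon links make a composed annotation traceable back to its parts. The following is a minimal sketch of that provenance walk. The class and field names are hypothetical and are not the annotation schema used in the talk. Which of annotations 1 and 2 denotes which concept is also an assumption.

```python
from dataclasses import dataclass, field


# Hypothetical minimal model of a text annotation with basedon links.
@dataclass
class TextAnnotation:
    ident: str
    denotes: str  # concept the annotation denotes
    based_on: list["TextAnnotation"] = field(default_factory=list)


def provenance(ann: TextAnnotation) -> list[str]:
    """Return ids of every annotation `ann` is (transitively) based on."""
    seen: list[str] = []
    stack = list(ann.based_on)
    while stack:
        current = stack.pop()
        if current.ident not in seen:
            seen.append(current.ident)
            stack.extend(current.based_on)
    return seen


pigmentation = TextAnnotation("text annotation 1", "GO:0043474")  # pigmentation
vertebrata = TextAnnotation("text annotation 2", "TAXON:7742")    # Vertebrata
vertebrate_pigmentation = TextAnnotation(
    "text annotation 3",
    "vertebrate pigmentation",  # subclassof GO:0043474, occurs_in TAXON:7742
    based_on=[pigmentation, vertebrata],
)

print(provenance(vertebrate_pigmentation))
```

Because composition is recorded explicitly, the walk also works for deeper chains, such as annotations built on other composed annotations.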

Understanding language requires relating what has been said to existing knowledge structures.

Typical Pipeline Model: Corpus -> POS tagging -> entity recognition -> word sense -> syntactic parse -> semantic forms -> discourse model? -> integration with existing knowledge (KB)
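In the pipeline model, each stage sees only what earlier stages produced, and knowledge enters at the very end. The toy sketch below shows that shape with a few of the stages. Word sense, parsing, semantic forms, and discourse are omitted. The stage logic, term list, and knowledge base are placeholders invented for illustration.

```python
from functools import reduce
from typing import Callable

# A document is a dict of annotation layers; each stage reads the layers
# produced upstream and adds its own. All stage logic is a toy stand-in.
Doc = dict
Stage = Callable[[Doc], Doc]

# Hypothetical term list and knowledge base, for illustration only.
TERMS = {"interferons": "Interferon", "stat6": "STAT6"}
KB = {"Interferon", "STAT6"}


def tokenize(doc: Doc) -> Doc:
    return {**doc, "tokens": doc["text"].split()}


def tag_pos(doc: Doc) -> Doc:
    # Placeholder tagger: capitalised tokens are guessed to be proper nouns.
    tags = ["NNP" if tok[0].isupper() else "OTHER" for tok in doc["tokens"]]
    return {**doc, "pos": tags}


def recognize_entities(doc: Doc) -> Doc:
    # Sees only the tokens; errors made upstream cannot be repaired here.
    found = [TERMS[t.lower()] for t in doc["tokens"] if t.lower() in TERMS]
    return {**doc, "entities": found}


def integrate_with_kb(doc: Doc) -> Doc:
    # Final stage: link recognized entities to knowledge-base concepts.
    return {**doc, "kb_links": [e for e in doc["entities"] if e in KB]}


PIPELINE: list[Stage] = [tokenize, tag_pos, recognize_entities, integrate_with_kb]


def run_pipeline(text: str, stages: list[Stage] = PIPELINE) -> Doc:
    return reduce(lambda doc, stage: stage(doc), stages, {"text": text})


result = run_pipeline("Interferons inhibit activation of STAT6")
print(result["pos"], result["kb_links"])
```

Because knowledge is consulted only in the last stage, nothing the KB knows can guide earlier decisions. That is the contrast DMAP draws.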

Direct Memory Access Parsing (DMAP) in the same pipeline: Corpus -> POS tagging -> entity recognition -> word sense -> syntactic parse / DMAP parse -> semantic forms -> discourse model? -> integration with existing knowledge (KB)

Direct Memory Access Parsing (DMAP) has three features. It identifies concepts in text using patterns composed of lexical and semantic concepts. It recognizes concepts in text incrementally and continuously. It treats parsing as fundamentally about recognition and integration with existing knowledge.

Standardized Representations: GOA record 147 annotates UniProtKB O75151 (PHF2) with GO:0032452 (histone demethylase activity). In the literature, text annotation 43 covers "PHF2-mediated demethylation of histones" (PMID:12345678). It denotes a PHF2 histone demethylase activity, which is a subclassof GO:0032452 histone demethylase activity and has PHF2 (UniProt:O75151) as its has_agent. GO:0032452 is itself a subclassof GO:0003824 catalytic activity. Text annotations 71 and 89 denote the component concepts.
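Once both the database record and the text annotation use the same identifiers, they can be compared mechanically. The sketch below encodes the slide's statements as plain triples. This encoding is illustrative and is not the talk's RDF schema. The sketch checks whether the concept denoted by text annotation 43 supports GOA record 147. It does this by following subclassof transitively and matching the has_agent.

```python
from collections import deque

# Statements read off the slide as (subject, relation, object) triples.
TRIPLES = {
    ("PHF2 histone demethylase activity", "subclassof", "GO:0032452"),
    ("GO:0032452", "subclassof", "GO:0003824"),
    ("PHF2 histone demethylase activity", "has_agent", "UniProt:O75151"),
    ("text annotation 43", "denotes", "PHF2 histone demethylase activity"),
}

# GOA record 147: gene product UniProtKB O75151 annotated with GO:0032452.
GOA_RECORD_147 = ("UniProt:O75151", "GO:0032452")


def objects(subject, relation, triples):
    return {o for s, p, o in triples if s == subject and p == relation}


def superclasses(concept, triples):
    """Transitive closure of subclassof, found breadth-first."""
    found, queue = set(), deque([concept])
    while queue:
        for parent in objects(queue.popleft(), "subclassof", triples):
            if parent not in found:
                found.add(parent)
                queue.append(parent)
    return found


def text_supports(record, annotation, triples):
    """Does a concept denoted by `annotation` match the GOA record?"""
    gene_product, go_term = record
    for concept in objects(annotation, "denotes", triples):
        if (gene_product in objects(concept, "has_agent", triples)
                and go_term in superclasses(concept, triples)):
            return True
    return False


print(superclasses("PHF2 histone demethylase activity", TRIPLES))
print(text_supports(GOA_RECORD_147, "text annotation 43", TRIPLES))
```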

OpenDMAP Patterns: [plasma_membrane] = membrane of [cell]. For example, in "membrane of a muscle cell", [muscle cell] is recognized as a [cell], so the phrase matches [plasma_membrane].


OpenDMAP Patterns (with the relation each slot fills):
[plasma_membrane] = membrane of [cell], where [cell] fills membrane_of
[nuclear_membrane] = membrane of [nucleus], where [nucleus] fills membrane_of
[transmembrane_transport] = transportation through [membrane], where [membrane] fills membrane_crossed
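These patterns mix lexical elements (literal words) with semantic slots. A concept recognized by one pattern can fill a slot in another. The sketch below is a toy recognizer in that spirit. It is not OpenDMAP's implementation or pattern syntax. The lexicon, the is-a hierarchy (for example, plasma_membrane is-a membrane), and the skipping of determiners are assumptions made for the example.

```python
from dataclasses import dataclass

# Assumed lexicon, hierarchy, and determiner list for this sketch only.
DETERMINERS = {"a", "an", "the"}
LEXICON = {("muscle", "cell"): "muscle_cell", ("cell",): "cell", ("nucleus",): "nucleus"}
IS_A = {"muscle_cell": "cell", "plasma_membrane": "membrane", "nuclear_membrane": "membrane"}

# A pattern element is either a literal word or a (slot_relation, concept) pair.
PATTERNS = [
    ("plasma_membrane", ["membrane", "of", ("membrane_of", "cell")]),
    ("nuclear_membrane", ["membrane", "of", ("membrane_of", "nucleus")]),
    ("transmembrane_transport", ["transportation", "through", ("membrane_crossed", "membrane")]),
]


@dataclass(frozen=True)
class Edge:
    start: int
    end: int
    concept: str
    fillers: tuple = ()  # ((slot_relation, Edge), ...)


def is_a(concept, target):
    while concept is not None:
        if concept == target:
            return True
        concept = IS_A.get(concept)
    return False


def match(elements, pos, tokens, chart):
    """Yield (end, fillers) for every way `elements` matches from `pos`."""
    if not elements:
        yield pos, ()
        return
    head, rest = elements[0], elements[1:]
    if isinstance(head, str):  # lexical element: must equal the next token
        if pos < len(tokens) and tokens[pos] == head:
            yield from match(rest, pos + 1, tokens, chart)
        return
    relation, wanted = head  # semantic slot: any recognized subconcept fits
    for edge in [e for e in chart if e.start == pos and is_a(e.concept, wanted)]:
        for end, fillers in match(rest, edge.end, tokens, chart):
            yield end, ((relation, edge),) + fillers


def parse(text):
    tokens = [t for t in text.lower().split() if t not in DETERMINERS]
    chart = {Edge(i, i + n, LEXICON[tuple(tokens[i:i + n])])
             for n in (1, 2) for i in range(len(tokens) - n + 1)
             if tuple(tokens[i:i + n]) in LEXICON}
    changed = True
    while changed:  # newly recognized concepts may complete further patterns
        changed = False
        for concept, elements in PATTERNS:
            for start in range(len(tokens)):
                for end, fillers in list(match(elements, start, tokens, chart)):
                    edge = Edge(start, end, concept, fillers)
                    if edge not in chart:
                        chart.add(edge)
                        changed = True
    return chart


chart = parse("transportation through the membrane of a muscle cell")
for edge in sorted(chart, key=lambda e: (e.start, -e.end)):
    print(edge.concept, [(rel, f.concept) for rel, f in edge.fillers])
```

In this example, recognizing [plasma_membrane] (with muscle_cell as its membrane_of) is what lets [transmembrane_transport] match. The result is a nested concept rather than a syntax tree.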

Example Biomedical Annotations for the sentence "Interferons inhibit activation of STAT6":
Resource annotations ra5, ra6, and ra7 each denote a resource mentioned in the sentence.
Graph annotation ga2 denotes graph g2. In g2, P1 is a subclassof Positive Regulation of Biological Process and resultsinregulationof STAT6. ga2 is basedon the STAT6 annotation.
Graph annotation ga3 denotes graph g3. In g3, N1 is a subclassof Negative Regulation of Biological Process. N1 regulates P1 (the positive regulation of STAT6) and resultsinregulationby Interferon. ga3 is basedon the Interferon annotation.
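A graph annotation points at a whole named graph rather than a single URI. N-Quads is one standard way to serialize that, because its optional fourth term names the graph a statement belongs to. The following sketch writes the example's statements that way. The base IRI and CamelCase local names are hypothetical, since the slide does not give the IRIs actually used.

```python
# Hypothetical base IRI; the slide does not give the IRIs actually used.
BASE = "http://example.org/"


def iri(local: str) -> str:
    return f"<{BASE}{local}>"


# Named graphs holding the assertions, as read from the slide.
GRAPHS = {
    "g2": [
        ("P1", "subclassof", "PositiveRegulationOfBiologicalProcess"),
        ("P1", "resultsinregulationof", "STAT6"),
    ],
    "g3": [
        ("N1", "subclassof", "NegativeRegulationOfBiologicalProcess"),
        ("N1", "regulates", "P1"),
        ("N1", "resultsinregulationby", "Interferon"),
        ("P1", "subclassof", "PositiveRegulationOfBiologicalProcess"),
        ("P1", "resultsinregulationof", "STAT6"),
    ],
}

# Annotation-level statements stay in the default graph: each graph
# annotation points at the named graph it denotes.
ANNOTATIONS = [("ga2", "denotesgraph", "g2"), ("ga3", "denotesgraph", "g3")]


def to_nquads(annotations, graphs):
    lines = [f"{iri(s)} {iri(p)} {iri(o)} ." for s, p, o in annotations]
    for name, triples in graphs.items():
        lines += [f"{iri(s)} {iri(p)} {iri(o)} {iri(name)} ." for s, p, o in triples]
    return "\n".join(lines)


print(to_nquads(ANNOTATIONS, GRAPHS))
```

Keeping each annotation's assertions in their own graph means the same P1 statements can appear under both ga2 and ga3. Each copy still stays attributable to the annotation that produced it.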

How is Cav3 involved in muscle cell development? The relevant knowledge comes from several sources: a textbook, gene-protein mappings, the Protein Ontology, and Gene Ontology Annotation.
From textbook knowledge, a caveola of muscle cell is part_of a muscle cell (CL) and is-a caveola (CC), and caveolae contain caveolin.
From the Protein Ontology (PRO), Caveolin3 is-a caveolin.
From the gene-protein mapping, Caveolin3 is the translation product of the Cav3 gene.
From Gene Ontology annotations (has_go:cc_annotation), Caveolin3 protein (or the CAV3 gene) is located in the T-tubule, membrane raft, and membrane fraction.
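Answering the question means finding a chain of relations that connects Cav3 to muscle cell across these sources. The following is a minimal sketch using breadth-first search over the slide's relations, treating each relation as traversable in both directions. The edge list is transcribed from the flattened diagram, so its directions are approximate. The "inverse_of:" labels are a convention invented for this example.

```python
from collections import deque

# Relations transcribed from the slide's diagram (directions approximate).
EDGES = [
    ("Cav3 gene", "translation", "Caveolin3"),
    ("Caveolin3", "is_a", "Caveolin"),
    ("Caveola", "contains", "Caveolin"),
    ("Caveola of muscle cell", "is_a", "Caveola"),
    ("Caveola of muscle cell", "part_of", "muscle cell"),
    ("Caveolin3", "has_go:cc_annotation", "T-tubule"),
    ("Caveolin3", "has_go:cc_annotation", "membrane raft"),
    ("Caveolin3", "has_go:cc_annotation", "membrane fraction"),
]


def connect(start, goal, edges):
    """Shortest chain of (node, relation, node) steps from start to goal, or None."""
    neighbours = {}
    for s, rel, o in edges:
        neighbours.setdefault(s, []).append((rel, o))
        neighbours.setdefault(o, []).append((f"inverse_of:{rel}", s))
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while previous[node] is not None:
                prev, rel = previous[node]
                path.append((prev, rel, node))
                node = prev
            return list(reversed(path))
        for rel, nxt in neighbours.get(node, []):
            if nxt not in previous:
                previous[nxt] = (node, rel)
                queue.append(nxt)
    return None


for step in connect("Cav3 gene", "muscle cell", EDGES):
    print(step)
```

No single source contains the full chain. The path only exists once the textbook, gene-protein, PRO, and GO relations are merged into one graph.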

[Figure: Attempted representation of the CAV3 example using existing ontologies and relations (M. Bada & H. Tipney, 04.22.09). The diagram connects Caveolin 3 [Cav3, GeneID: 859, 12391], (human) insulin [INS, GeneID: 3630], and (mouse) glucose transporter [Slc2a4, GeneID: 20528] to GO, CL, and ChEBI terms. These include caveola of muscle cell, glucose transmembrane transport, vesicle-mediated transport, and transcytosis. The diagram also proposes new terms, such as "transcytotic caveolar budding in muscle cell" and "transcytotic plasma membrane to early endosome transport of glucose transporter in muscle cell". The latter is followed, via precedes, by early endosome to recycling endosome and recycling endosome to plasma membrane transport steps. Legend: existing terms and relationships; new terms and relationships; semi-official relationships (i.e., provisional cross products); hypotheses/guesses; protein [gene symbol, Entrez GeneID] and associated (useful) GO annotations.]

Annotation for Consumers?
The linguistic community typically uses annotation as training data or for specific tasks. Many computational-linguistics tools produce annotations in the specific formats of those resources.
Biomedical annotation is typically used for curation, indexing, or enrichment analysis.
But what about reusing annotations and tools in other contexts and for other purposes?

Summary: the model covers both syntactic and semantic annotation, that is, linguistic annotation and entity-based annotation. It captures complex content that is not necessarily best represented by a single URI. It does this through a GraphAnnotation that denotes an RDF named graph. It adds kiao:basedon to enable annotation composition and provenance tracking, at both the annotation level and the assertion level.

Acknowledgements: University of Colorado, Hunter Lab (Larry Hunter, Mike Bada, Bill Baumgartner, Chris Roeder); National ICT Australia (Karin Verspoor). Funding: NIH/NLM training grant; Andrew W. Mellon Foundation.
