Facts from text: Automated gene annotation with ontologies and text-mining

1st Workshop of the GI Working Group on Ontologies in Biomedicine and Life Sciences (OBML)
Conrad Plake, Schroeder Group (Bioinformatics), Technische Universität Dresden
26th November 2009

Facts are in databases
- Genes, proteins, diseases, ...
- Annotations from controlled vocabularies and ontologies
- MeSH
- Protein-protein interactions
Problem: Manual curation is not enough!

The Problem: Manual curation is not enough!
Literature references for function annotation
Baumgartner, W. A. et al., Bioinformatics, 2007, 23:i41-i48

Can text mining help to scale up high-quality manual curation of gene products with ontologies?
Winnenburg et al., Briefings in Bioinformatics, 2008
Yes, it can!

Identification of genes, Gene Ontology terms, and diseases

How to associate genes with GO terms and diseases?

Filter for significant co-occurrences
Association score (formula not preserved in this transcript)
Royer, Plake, Schroeder, GCB, 2009

www.gopubmed.org/gogene
A search engine for genes
Query for:
- gene names, IDs (screen)
- sequence (gene or protein)
- keywords (PubMed)
Features:
- Sorts results by GO & MeSH
- Gene profiling
- Links to literature
- Supports curators
- Trend analysis for genes
Plake et al. (2009) GoGene: gene annotation in the fast lane. Nucleic Acids Research

Which rat genes have implications in osteoporosis and bone resorption?
- Rat Genome Database: 0 genes
- PubMed: 859 hits
- GoGene: 5 genes found

Linking back to the literature
- Csf1 and osteoporosis
- Ctsk and osteoporosis
- Tnfrsb1 and bone resorption

Prediction of drug-target interactions
Only ~30% of drug-target pairs in DrugBank co-occur in PubMed
Idea: Link drugs with targets via disease, process, function!

Sparfloxacin (Zagam) / DNA topoisomerase II:
- neutrophil apoptosis
- apoptotic chromosome condensation
- endopeptidase activity
- aspartic endopeptidase activity
- dermatitis, phototoxic
- ataxia telangiectasia

Arbutamine (GenESA) / beta-1-adrenergic receptor:
- adrenoceptor activity
- beta-adrenergic receptor activity
- blood circulation
- regulation of heart contraction
- coronary diseases
- myocardial infarction

Similarity between Drugs and Proteins
Similarity based on shortest path length between concepts in a concept graph
Drug with concept A, protein with concept B
Plake et al., submitted
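
A minimal sketch of a shortest-path similarity, assuming the concept graph is an undirected adjacency mapping and that similarity decays as 1 / (1 + path length). The decay function, the use of the best-matching concept pair, and all function names are assumptions for illustration; the slide does not give the exact definition.

```python
from collections import deque


def shortest_path_length(graph, source, target):
    """Breadth-first search; returns the number of edges, or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def concept_similarity(graph, concept_a, concept_b):
    """Assumed decay: 1 / (1 + shortest path length); 0.0 if not connected."""
    dist = shortest_path_length(graph, concept_a, concept_b)
    return 0.0 if dist is None else 1.0 / (1.0 + dist)


def profile_similarity(graph, drug_concepts, protein_concepts):
    """Assumed aggregation: best-matching pair between the two concept profiles."""
    return max(
        (concept_similarity(graph, a, b)
         for a in drug_concepts for b in protein_concepts),
        default=0.0,
    )
```

Ranking all candidate proteins for a drug by `profile_similarity` then gives a list in which proteins sharing close concepts with the drug appear first.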

Recovering DrugBank by similarity ranking (AUC)

Drug set             All targets   Human targets
All drugs            0.78          0.87
Approved drugs       0.95          0.93
Experimental drugs   0.64          0.70

Plake et al., submitted
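
For readers unfamiliar with the measure, the sketch below shows one common way to compute an AUC for a similarity ranking: the fraction of (known target, non-target) pairs in which the known target receives the higher score, with ties counted as half. The function name and input layout are hypothetical, and this is not necessarily how the reported values were computed.

```python
def ranking_auc(scores, positives):
    """Area under the ROC curve for a ranking.

    scores: dict mapping candidate target -> similarity score for one drug.
    positives: set of candidates that are known targets (e.g. from DrugBank).
    """
    pos = [s for key, s in scores.items() if key in positives]
    neg = [s for key, s in scores.items() if key not in positives]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    # Count pairs where the known target outranks the non-target; ties count 0.5.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to a random ranking and 1.0 to a ranking that places every known target above every non-target.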

Summary
- Employ ontologies to facilitate search and summarization of information
- Link genes and drugs to ontologies via literature
- Scale up high-quality annotation of gene products with text mining
- Matching ontological profiles of proteins and drugs finds binding pairs
- GoGene: search genes at www.gopubmed.org/gogene

Kudos to: Michael Schroeder, Rainer Winnenburg, Loic Royer

The Problem: Manual curation is not enough!
Progress of GO annotations in the protein database Swiss-Prot
Baumgartner, W. A. et al., Bioinformatics, 2007, 23:i41-i48

Ontologies in the Life Sciences
The Open Biomedical Ontologies (www.obofoundry.org)
Smith et al. (2007) The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nature Biotechnology 25, 1251-1255.

Ontologies in the Life Sciences
Annotation of gene products:
- Molecular Function
- Cellular Component
- Biological Process
Annotation of literature with MeSH (Medical Subject Headings):
- Disease
- Anatomy
- Chemicals
- Organisms
- ...

Prediction of drug-target interactions
Only ~30% of drug-target pairs from DrugBank co-occur in PubMed
- Co-occurrence of proteins and ontological concepts
- Co-occurrence of drugs and ontological concepts