1. Workshop des GI-Arbeitskreises Ontologien in Biomedizin und Lebenswissenschaften (OBML)
Facts from text: Automated gene annotation with ontologies and text-mining
Conrad Plake, Schroeder Group (Bioinformatics), Technische Universität Dresden
26th November 2009
Facts are in databases
- Genes, proteins, diseases, ...
- Annotations from controlled vocabularies and ontologies
- MeSH
- Protein-protein interactions
Problem: Manual curation is not enough!
The Problem: Manual curation is not enough!
Literature references for function annotation
Baumgartner, W. A. et al. Bioinformatics 2007 23:i41-i48
Can text mining help to scale up high-quality manual curation of gene products with ontologies?
Winnenburg et al., Briefings in Bioinformatics, 2008
Yes, it can!
Identification of genes, Gene Ontology terms, and diseases
How to associate genes with GO terms and diseases?
Filter for significant co-occurrences
Association score: Royer, Plake, Schroeder, GCB, 2009
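The slide does not reproduce the association score itself, so the following is only a minimal sketch of one common way to filter co-occurrences for significance: a one-sided hypergeometric test on document counts. It is not the score from Royer, Plake, Schroeder (2009). The function names, the thresholds `alpha` and `min_cooc`, and the toy corpus are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product
from math import comb


def cooccurrence_p_value(k, n_gene, n_term, n_docs):
    """P(X >= k) under a hypergeometric null model.

    n_docs documents in total, n_gene mention the gene, n_term mention the
    term, and k mention both. A small value means the pair co-occurs more
    often than expected if gene and term were mentioned independently.
    """
    upper = min(n_gene, n_term)
    total = comb(n_docs, n_term)
    tail = sum(
        comb(n_gene, i) * comb(n_docs - n_gene, n_term - i)
        for i in range(k, upper + 1)
    )
    return tail / total


def significant_pairs(docs, alpha=0.01, min_cooc=2):
    """Return (gene, term, co-occurrence count, p-value), most significant first.

    docs is a list of (genes, terms) pairs, one per document, each a set.
    alpha and min_cooc are illustrative thresholds, not values from the talk.
    """
    gene_df, term_df, pair_df = Counter(), Counter(), Counter()
    for genes, terms in docs:
        gene_df.update(genes)
        term_df.update(terms)
        pair_df.update(product(genes, terms))

    hits = []
    for (gene, term), k in pair_df.items():
        if k < min_cooc:
            continue  # a single co-mention is too weak to test
        p = cooccurrence_p_value(k, gene_df[gene], term_df[term], len(docs))
        if p <= alpha:
            hits.append((gene, term, k, p))
    return sorted(hits, key=lambda hit: hit[3])


# Toy corpus (invented counts): 20 abstracts annotated with genes and terms.
toy_docs = (
    [({"Ctsk"}, {"bone resorption"})] * 4
    + [({"Csf1"}, {"apoptosis"})]
    + [(set(), {"apoptosis"})] * 5
    + [(set(), set())] * 10
)

for gene, term, k, p in significant_pairs(toy_docs):
    print(f"{gene} ~ {term}: {k} co-occurrences, p = {p:.2e}")
```

In this toy corpus only the Ctsk pair survives the filter; the single Csf1 co-mention is dropped before testing.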
www.gopubmed.org/gogene
A search engine for genes
Query for:
- gene names, IDs (screen)
- sequence (gene or protein)
- keywords (PubMed)
Sorts results by GO & MeSH
Gene profiling
Links to literature
Supports curators
Trend analysis for genes
Plake et al. (2009) GoGene: gene annotation in the fast lane. Nucleic Acids Research
Which rat genes are implicated in osteoporosis and bone resorption?
Rat Genome Database: 0 genes
PubMed: 859 hits
GoGene: 5 genes found
Linking back to the literature
- Csf1 and osteoporosis
- Ctsk and osteoporosis
- Tnfrsb1 and bone resorption
Prediction of drug-target interactions
Only ~30% of drug-target pairs in DrugBank co-occur in PubMed
Idea: Link drugs with targets via disease, process, function!
Example: Sparfloxacin (Zagam) and DNA topoisomerase II
Concepts: neutrophil apoptosis; apoptotic chromosome condensation; endopeptidase activity; aspartic endopeptidase activity; dermatitis, phototoxic; ataxia telangiectasia
Example: Arbutamine (GenESA) and beta-1-adrenergic receptor
Concepts: adrenoceptor activity; beta-adrenergic receptor activity; blood circulation; regulation of heart contraction; coronary diseases; myocardial infarction
Similarity between Drugs and Proteins
Similarity based on shortest path length between concepts in a concept graph
Drug with concept A, protein with concept B
Plake et al., submitted
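The slide names the ingredient (shortest path length in a concept graph) but not the exact formula, so the following is a sketch under stated assumptions rather than the method of Plake et al. It assumes an undirected, unweighted graph, converts a path length d into a similarity 1 / (1 + d), and scores a drug against a protein by averaging each drug concept's best match among the protein's concepts. All function names and the toy graph are hypothetical.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map from (concept, concept) edges."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path_length(graph, source, target):
    """Breadth-first search; returns the number of edges or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def concept_similarity(graph, a, b):
    """Assumed conversion: 1 / (1 + path length), 0.0 if no path exists."""
    dist = shortest_path_length(graph, a, b)
    return 0.0 if dist is None else 1.0 / (1.0 + dist)


def profile_similarity(graph, drug_concepts, protein_concepts):
    """Assumed aggregation: mean over drug concepts of the best protein match."""
    if not drug_concepts or not protein_concepts:
        return 0.0
    best = [
        max(concept_similarity(graph, a, b) for b in protein_concepts)
        for a in drug_concepts
    ]
    return sum(best) / len(best)


# Toy concept graph (invented edges, not taken from GO or MeSH).
toy_graph = build_graph([
    ("molecular function", "receptor activity"),
    ("receptor activity", "adrenoceptor activity"),
    ("adrenoceptor activity", "beta-adrenergic receptor activity"),
])

score = profile_similarity(
    toy_graph,
    ["adrenoceptor activity"],              # concepts linked to a drug
    ["beta-adrenergic receptor activity"],  # concepts linked to a protein
)
print(round(score, 2))
```

Ranking all proteins by this score for a given drug yields the kind of similarity ranking that the next slide evaluates.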
Recovering DrugBank by similarity ranking (AUC)

Drugs          All targets   Human targets
All            0.78          0.87
Approved       0.95          0.93
Experimental   0.64          0.70

Plake et al., submitted
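For readers unfamiliar with the measure, here is a minimal sketch of how an AUC can be computed from a similarity ranking: it is the probability that a randomly chosen known drug-target pair receives a higher score than a randomly chosen unknown pair, with ties counted as one half. The function name and the toy scores are assumptions; the numbers are not related to the values reported on the slide.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    scores: similarity score per candidate drug-target pair.
    labels: True/1 for pairs known to interact, False/0 otherwise.
    """
    positives = [s for s, known in zip(scores, labels) if known]
    negatives = [s for s, known in zip(scores, labels) if not known]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative pair")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # known pair ranked above unknown pair
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Toy example: two known pairs, two unknown pairs (invented scores).
toy_scores = [0.9, 0.2, 0.8, 0.1]
toy_labels = [1, 1, 0, 0]
print(roc_auc(toy_scores, toy_labels))
```

The pairwise form is quadratic in the number of pairs, which is fine for illustration; a rank-based formulation would be preferable for rankings of realistic size.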
Summary
- Employ ontologies to facilitate search and summarization of information
- Link genes and drugs to ontologies via literature
- Scale up high-quality annotation of gene products with text mining
- Matching ontological profiles of proteins and drugs finds binding pairs
- GoGene: search genes at www.gopubmed.org/gogene
Kudos to: Michael Schroeder, Rainer Winnenburg, Loic Royer
The Problem: Manual curation is not enough!
Progress of GO annotations in the protein database Swiss-Prot
Baumgartner, W. A. et al. Bioinformatics 2007 23:i41-i48
Ontologies in the Life Sciences
The Open Biomedical Ontologies (www.obofoundry.org)
Smith et al. (2007) The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nature Biotechnology 25, 1251-1255....
Ontologies in the Life Sciences
Annotation of gene products: Molecular Function, Cellular Component, Biological Process
Annotation of literature: MeSH (Medical Subject Headings): Disease, Anatomy, Chemicals, Organisms, ...
Prediction of drug-target interactions
Only ~30% of drug-target pairs from DrugBank co-occur in PubMed
- Co-occurrence of proteins and ontological concepts
- Co-occurrence of drugs and ontological concepts