Sebastian Jaenicke. tRNAscan-SE: Improved detection of tRNA genes in genomic sequences

Overview
1. tRNAs
2. Existing approaches
3. tRNAscan-SE
4. Stage 1
5. Stage 2
6. Stage 3
7. Verification
8. Performance
9. Accuracy

tRNAs
- length 75-95 nt
- acceptor stem, to which a specific amino acid is attached
- anticodon reads the mRNA sequence by base pairing
- folded so that the D loop and the TΨC loop (T, pseudouridine, C) are in contact
- synthesized in two parts: body and acceptor stem
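
Because the anticodon pairs with the codon antiparallel, the codon a tRNA reads is the reverse complement of its anticodon. A minimal Python sketch, assuming strict Watson-Crick pairing (wobble pairing at the anticodon's first position is ignored); the function name `codon_for_anticodon` is hypothetical:

```python
# Watson-Crick partners for RNA bases; wobble pairs (e.g. G-U) are not modelled.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def codon_for_anticodon(anticodon: str) -> str:
    """Return the mRNA codon (written 5'->3') read by a tRNA anticodon (written 5'->3')."""
    rna = anticodon.upper().replace("T", "U")  # accept DNA-style input such as "CAT"
    # Antiparallel pairing: the anticodon's 3' base pairs with the codon's 5' base,
    # so the codon is the reverse complement of the anticodon.
    return "".join(COMPLEMENT[base] for base in reversed(rna))


print(codon_for_anticodon("CAU"))  # tRNA-Met anticodon -> AUG
```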

Existing approaches
- tRNAscan: hierarchical, rule-based system; widely used, but its error rate (0.37 false positives per Mbp) is unsuitable for larger genomes
- Pavesi algorithm: searches for linear sequence signals; identifies tRNAs not detected by tRNAscan (combined sensitivity > 99%), but the combination has a false positive rate 5 times higher than tRNAscan alone
- Covariance models: high sensitivity and high specificity, but CPU-intensive

tRNAscan-SE
- Authors: Todd Lowe, Sean Eddy (Washington University in St. Louis), 1997
- License: GNU General Public License
- Input: DNA or RNA sequences in FASTA format
- Output: tabular, ACeDB, or an extended format including secondary structure information
- tRNAscan-SE performs no tRNA detection itself; it is a wrapper that relies on third-party programs
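
tRNAscan-SE takes FASTA input. For readers preparing or checking such input in Python, a minimal generic reader is sketched below; it is an illustration only, not tRNAscan-SE's own parser, and `read_fasta` is a hypothetical name. Sequence lines are upper-cased and U is left as-is:

```python
import io
from typing import Iterator, List, Optional, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from FASTA-formatted text.

    The identifier is the first whitespace-separated word of the '>' header;
    sequence lines are concatenated and upper-cased.
    """
    name: Optional[str] = None
    chunks: List[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            words = line[1:].split()
            name, chunks = (words[0] if words else ""), []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


example = ">seq1 test record\nACGT\nacgu\n>seq2\nGGCC\n"
print(list(read_fasta(io.StringIO(example))))
# -> [('seq1', 'ACGTACGU'), ('seq2', 'GGCC')]
```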

tRNAscan-SE - Stage 1
- run tRNAscan and the Pavesi algorithm (EufindtRNA) on the input sequence
- discard intron information from tRNAscan (unreliable)
- merge the results into a list of candidate tRNAs
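
The slide says only that the two result sets are merged. One plausible way to do that is sketched in Python below, under the assumption (not stated in the source) that overlapping hits on the same sequence and strand describe the same candidate; `Hit` and `merge_candidates` are hypothetical names:

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Hit:
    """Hypothetical record for one candidate reported by a first-pass scanner."""
    seq_id: str
    start: int   # 1-based, inclusive, start <= end
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"
    source: str  # detector name(s), e.g. "tRNAscan" or "EufindtRNA"


def merge_candidates(*hit_lists: List[Hit]) -> List[Hit]:
    """Union the hits of several detectors into one candidate list.

    Hits on the same sequence and strand that overlap are fused into a single
    candidate covering both. Intron coordinates are not part of `Hit`, so any
    intron calls from tRNAscan are simply not carried over.
    """
    hits = sorted((h for hl in hit_lists for h in hl),
                  key=lambda h: (h.seq_id, h.strand, h.start, h.end))
    merged: List[Hit] = []
    for h in hits:
        last = merged[-1] if merged else None
        if (last is not None and last.seq_id == h.seq_id
                and last.strand == h.strand and h.start <= last.end):
            sources = sorted(set(last.source.split("+")) | set(h.source.split("+")))
            merged[-1] = Hit(last.seq_id, last.start, max(last.end, h.end),
                             last.strand, "+".join(sources))
        else:
            merged.append(h)
    return merged


trnascan_hits = [Hit("chr1", 100, 172, "+", "tRNAscan")]
eufind_hits = [Hit("chr1", 105, 180, "+", "EufindtRNA"), Hit("chr1", 900, 975, "+", "EufindtRNA")]
for candidate in merge_candidates(trnascan_hits, eufind_hits):
    print(candidate)
```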

tRNAscan-SE - Stage 2
- extract candidate subsequences plus 14 flanking nucleotides
- pass the sequences to covels (covariance model search), threshold score 20 bits
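
Stage 2 can be sketched in Python as below. The slide gives 14 flanking nucleotides without saying whether that is per side or in total; this sketch assumes per side, handles the plus strand only, and treats a score equal to 20 bits as passing. The function names are hypothetical, and the actual covels call is not shown:

```python
from typing import Tuple

FLANK = 14            # flanking nucleotides (from the slide; assumed to be added to each end)
COVELS_CUTOFF = 20.0  # bits; covels score threshold from the slide


def extract_with_flanks(seq: str, start: int, end: int,
                        flank: int = FLANK) -> Tuple[str, int]:
    """Cut candidate start..end (1-based, inclusive, plus strand) out of `seq`,
    padded by `flank` nucleotides per side and clipped at the sequence ends.

    Returns the subsequence and the 1-based position of its first nucleotide,
    which is needed later to map model coordinates back onto `seq`.
    """
    lo = max(1, start - flank)
    hi = min(len(seq), end + flank)
    return seq[lo - 1:hi], lo


def passes_cutoff(score_bits: float, cutoff: float = COVELS_CUTOFF) -> bool:
    """Assumes a score equal to the threshold counts as passing."""
    return score_bits >= cutoff


genome = "A" * 50 + "G" * 72 + "C" * 50
sub, first = extract_with_flanks(genome, 51, 122)
print(len(sub), first)  # -> 100 37
```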

tRNAscan-SE - Stage 3
- use predicted tRNAs that have been confirmed with covels
- trim tRNA bounds as predicted by covels
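
Trimming amounts to replacing the first-pass bounds with the bounds covels reports, converted from subsequence coordinates to genomic coordinates. A Python sketch, assuming plus-strand, 1-based inclusive coordinates and that a score of at least 20 bits counts as confirmed; `CovelsHit` and `confirm_and_trim` are hypothetical and do not reflect covels' real output format:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

CUTOFF_BITS = 20.0  # covels threshold given for Stage 2


@dataclass
class CovelsHit:
    """Hypothetical container for the best covels hit in one extracted subsequence."""
    local_start: int  # 1-based, relative to the extracted subsequence
    local_end: int
    score_bits: float


def confirm_and_trim(candidates: List[Tuple[int, Optional[CovelsHit]]]
                     ) -> List[Tuple[int, int]]:
    """Keep candidates confirmed by covels and return their trimmed bounds.

    Each candidate is (first, hit): `first` is the 1-based genomic position of
    the extracted subsequence's first nucleotide (plus strand only) and `hit`
    is its covels result, or None if covels found nothing.
    """
    trimmed = []
    for first, hit in candidates:
        if hit is None or hit.score_bits < CUTOFF_BITS:
            continue  # not confirmed by covels: dropped
        # The covels bounds replace the first-pass bounds, mapped back to the genome.
        trimmed.append((first + hit.local_start - 1, first + hit.local_end - 1))
    return trimmed


print(confirm_and_trim([(37, CovelsHit(15, 86, 62.3)), (400, None)]))
# -> [(51, 122)]
```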

tRNAscan-SE - Stage 3
- use heuristics to distinguish pseudogenes from true tRNAs: a candidate is treated as a pseudogene if its primary sequence score is < 10 bits or its secondary structure score is < 5 bits
- run coves (covariance model global structure alignment) to predict secondary structure
- identify anticodons and introns (an intron is 5+ consecutive non-consensus nucleotides within the anticodon loop)
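
The two rules on this slide are a threshold test and a run-length test. A Python sketch: the thresholds and the 5-nucleotide minimum come from the slide, while the alignment encoding used by the hypothetical `find_intron` (upper case = consensus, lower case = non-consensus) is an assumption for illustration and not the format coves actually emits:

```python
import re
from typing import Optional, Tuple


def looks_like_pseudogene(primary_bits: float, secondary_bits: float) -> bool:
    """Slide heuristic: pseudogene if primary sequence score < 10 bits
    or secondary structure score < 5 bits."""
    return primary_bits < 10.0 or secondary_bits < 5.0


def find_intron(loop_alignment: str, min_len: int = 5) -> Optional[Tuple[int, int]]:
    """Locate a putative intron in an aligned anticodon loop.

    Hypothetical encoding: one character per nucleotide, upper case where it
    matches the consensus model, lower case where it does not. Returns the
    0-based, half-open span of the first run of >= min_len non-consensus
    nucleotides, or None if there is no such run.
    """
    match = re.search("[a-z]{%d,}" % min_len, loop_alignment)
    return (match.start(), match.end()) if match else None


print(looks_like_pseudogene(35.2, 3.1))  # -> True
print(find_intron("CUCAUaaguuaugcAAG"))  # -> (5, 14)
```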

Verification
Annotated data used for verification:
- bacterial, archaeal and eukaryotic DNA from the Sprinzl tRNA database
- tRNA sequence subset of GenBank
- H. influenzae DNA from TIGR
Generated sequences:
- sequences generated by a 5th-order Markov chain based on C. elegans
- sequence generated to match human GC content
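
A 5th-order Markov chain chooses each nucleotide conditioned on the five before it, so generated sequence keeps the short-range composition of the training genome while being unlikely to reproduce longer features such as real tRNA genes. A minimal Python sketch of training and sampling such a model; it illustrates the technique and is not the generator used for the verification. The training string is a placeholder for real C. elegans sequence:

```python
import random
from collections import Counter, defaultdict
from typing import Dict


def train_markov(seq: str, order: int = 5) -> Dict[str, Counter]:
    """Count, for every `order`-long context, which nucleotide follows it."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for i in range(order, len(seq)):
        counts[seq[i - order:i]][seq[i]] += 1
    return counts


def generate(counts: Dict[str, Counter], length: int,
             order: int = 5, seed: int = 0) -> str:
    """Sample a sequence whose base at each step depends on the previous `order` bases."""
    if not counts:
        raise ValueError("training sequence must be longer than the model order")
    rng = random.Random(seed)
    contexts = list(counts)
    out = list(rng.choice(contexts))  # start from a context seen in training
    while len(out) < length:
        following = counts.get("".join(out[-order:]))
        if not following:
            # Context never followed by anything in training: restart from a seen one.
            out.extend(rng.choice(contexts))
            continue
        bases, weights = zip(*following.items())
        out.append(rng.choices(bases, weights=weights)[0])
    return "".join(out[:length])


training = "ACGTTGCAAGCTTAGGCATCGATCGGATCCA" * 20  # placeholder for real genomic sequence
model = train_markov(training)
background = generate(model, 1000)
print(len(background))  # -> 1000
```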

Verification: tRNA prediction on annotated database subsets

Sequence source          Literature  tRNAscan  EufindtRNA  tRNA CM  tRNAscan-SE
Sprinzl DB (Archaea)             70        69          43       70           70
Sprinzl DB (Eubacteria)         240       226         205      239          237
Sprinzl DB (Eukarya)            279       265         275      279          279
GenBank tRNA                   1462      1366         760     1456         1440
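
Expressing each count as a percentage of the literature count makes the columns easier to compare. A short Python sketch using the table values; it assumes each count is the number of annotated tRNAs recovered, which the slide does not state explicitly:

```python
METHODS = ("tRNAscan", "EufindtRNA", "tRNA CM", "tRNAscan-SE")

# Per row: literature count, then one count per method in METHODS order.
ROWS = {
    "Sprinzl DB (Archaea)": (70, 69, 43, 70, 70),
    "Sprinzl DB (Eubacteria)": (240, 226, 205, 239, 237),
    "Sprinzl DB (Eukarya)": (279, 265, 275, 279, 279),
    "GenBank tRNA": (1462, 1366, 760, 1456, 1440),
}


def percent_found(rows):
    """Return {source: {method: percentage of the literature count}}."""
    result = {}
    for source, (literature, *found) in rows.items():
        result[source] = {m: 100.0 * n / literature for m, n in zip(METHODS, found)}
    return result


for source, pct in percent_found(ROWS).items():
    print(source, {m: round(p, 1) for m, p in pct.items()})
```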

Performance
Analysis time in CPU hours for various complete genomes (SGI Indigo2, R4400 200 MHz)

Complete genome  Size (Mbp)  tRNAscan  EufindtRNA  tRNA CM  tRNAscan-SE
P. anserina             0.1      0.14     < 0.001      2.8        0.019
H. influenzae           1.8      2.54     < 0.001       51        0.069
C. elegans              100       139        0.15     2780          1.8
Human                  3000    > 4170         7.1    83300         36.6
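
Dividing the tRNA CM time by the tRNAscan-SE time gives the speed-up for each genome. Computed in Python from the table values, the factor rises with genome size, and only the two larger genomes fall in the 1000-3000x range quoted on the Accuracy slide:

```python
# CPU hours per genome: (tRNA CM, tRNAscan-SE), copied from the Performance table.
CPU_HOURS = {
    "P. anserina": (2.8, 0.019),
    "H. influenzae": (51.0, 0.069),
    "C. elegans": (2780.0, 1.8),
    "Human": (83300.0, 36.6),
}


def speedups(hours):
    """Ratio of covariance-model time to tRNAscan-SE time for each genome."""
    return {genome: cm / se for genome, (cm, se) in hours.items()}


for genome, factor in speedups(CPU_HOURS).items():
    print(f"{genome}: {factor:.0f}x")
# -> P. anserina: 147x, H. influenzae: 739x, C. elegans: 1544x, Human: 2276x
```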

Accuracy
- tRNAscan-SE detects 99-100% of true tRNAs
- less than 1 false positive per 15 billion nucleotides
- 1000-3000 times faster than covariance models

Method        True positives (%)  False positives (per Mbp)  Search speed (bp/s)
tRNAscan 1.3                95.1                       0.37                  400
EufindtRNA                  88.8                       0.23               373000
tRNA CM                     99.8                    < 0.002                   20
tRNAscan-SE                 99.5                  < 0.00007                30000
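
False positive counts scale linearly with the amount of sequence searched, which is why 0.37 per Mbp becomes a problem for large genomes. A Python sketch computing the expected number of false positives for a 3000 Mbp (human-sized) genome from the rates in the table; for tRNA CM and tRNAscan-SE the table gives upper bounds, so those results are upper bounds too:

```python
# False positives per Mbp, copied from the Accuracy table.
FP_PER_MBP = {
    "tRNAscan 1.3": 0.37,
    "EufindtRNA": 0.23,
    "tRNA CM": 0.002,        # table gives "< 0.002": result is an upper bound
    "tRNAscan-SE": 0.00007,  # table gives "< 0.00007": result is an upper bound
}


def expected_false_positives(genome_mbp: float) -> dict:
    """Expected false positive count for each method on a genome of `genome_mbp` Mbp."""
    return {method: rate * genome_mbp for method, rate in FP_PER_MBP.items()}


for method, n in expected_false_positives(3000).items():
    print(f"{method}: {n:.2f}")
```

For 3000 Mbp this gives about 1110 false positives for tRNAscan 1.3 and 690 for EufindtRNA, versus at most 6 for tRNA CM and at most about 0.21 for tRNAscan-SE.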

End
Questions?