Sebastian Jaenicke
tRNAscan-SE: Improved detection of tRNA genes in genomic sequences
Overview
  1. tRNAs
  2. Existing approaches
  3. tRNAscan-SE
  4. Stage 1
  5. Stage 2
  6. Stage 3
  7. Verification
  8. Performance
  9. Accuracy
tRNAs
  length 75-95 nt
  acceptor stem, to which a specific amino acid is attached
  anticodon reads the mRNA sequence by base pairing
  tRNA is folded so that the D and TΨC loops are in contact
  synthesized in two parts: body and acceptor stem
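As a small illustration of the anticodon reading an mRNA codon by base pairing, the sketch below derives the anticodon that pairs with a given codon. It assumes strict Watson-Crick pairing and ignores wobble pairing and modified bases; the function name `anticodon_for` is made up for this example.

```python
# Watson-Crick pairs only; wobble pairing is deliberately left out of this sketch.
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon_for(codon: str) -> str:
    """Return the anticodon (written 5'->3') pairing with an mRNA codon (5'->3')."""
    codon = codon.upper().replace("T", "U")  # accept DNA-style input as well
    # The two strands pair antiparallel, so the complement is read in reverse.
    return "".join(RNA_COMPLEMENT[base] for base in reversed(codon))


print(anticodon_for("AUG"))  # CAU
```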
Existing approaches
  tRNAscan: hierarchical, rule-based system; widely used, but its error rate (0.37 false positives per Mbp) is unsuitable for larger genomes
  Pavesi algorithm: searches for linear sequence signals; identifies tRNAs not detected by tRNAscan (combined sensitivity > 99%), but with a false positive rate 5 times higher than tRNAscan alone
  Covariance models: high sensitivity and high specificity, but CPU intensive
tRNAscan-SE
  Authors: Todd Lowe, Sean Eddy (Washington University in St. Louis), 1997
  License: GNU General Public License
  Input: DNA or RNA sequences in FASTA format
  Output: tabular, ACeDB, or extended format including secondary structure information
  tRNAscan-SE performs no tRNA detection itself; it is a wrapper that relies on third-party programs
tRNAscan-SE - Stage 1
  run tRNAscan and the Pavesi algorithm (EufindtRNA) on the input sequence
  discard intron information from tRNAscan (unreliable)
  merge the results into a single list of candidate tRNAs
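The merge step of Stage 1 could look roughly like the sketch below. The slides do not say how the two hit lists are combined, so this assumes that overlapping hits on the same sequence and strand are fused into one candidate; the `Hit` record and `merge_hits` function are hypothetical and not part of tRNAscan-SE.

```python
from typing import NamedTuple, Optional


class Hit(NamedTuple):
    """Hypothetical record for one predicted tRNA location."""
    seq_id: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def merge_hits(first: list[Hit], second: list[Hit]) -> list[Hit]:
    """Fuse overlapping hits from two detectors into one candidate list."""
    merged: list[Hit] = []
    ordered = sorted(first + second, key=lambda h: (h.seq_id, h.strand, h.start, h.end))
    for hit in ordered:
        last: Optional[Hit] = merged[-1] if merged else None
        same_region = (
            last is not None
            and last.seq_id == hit.seq_id
            and last.strand == hit.strand
            and hit.start <= last.end  # assumption: any overlap means same candidate
        )
        if same_region:
            merged[-1] = last._replace(end=max(last.end, hit.end))
        else:
            merged.append(hit)
    return merged
```

Taking the union of the two lists keeps every candidate that either detector reports, which fits the idea that later stages are responsible for rejecting false positives.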
tRNAscan-SE - Stage 2
  extract each candidate subsequence plus 14 flanking nucleotides
  pass the sequences to covels (covariance model search) with a score threshold of 20 bits
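A minimal sketch of the Stage 2 bookkeeping follows. It does not call covels. Two points are assumptions, since the slides leave them open: the 14 flanking nucleotides are added on each side, and a score exactly equal to the threshold counts as passing. Both function names are hypothetical.

```python
FLANK = 14       # flanking nucleotides (assumed here to apply on each side)
MIN_BITS = 20.0  # covels score threshold from the slide


def extract_with_flanks(sequence: str, start: int, end: int, flank: int = FLANK) -> str:
    """Return the candidate plus flanks; start/end are 1-based and inclusive."""
    left = max(start - 1 - flank, 0)           # clamp at the sequence start
    right = min(end + flank, len(sequence))    # clamp at the sequence end
    return sequence[left:right]


def passes_threshold(score_bits: float, min_bits: float = MIN_BITS) -> bool:
    """Assumption: a score equal to the threshold is accepted."""
    return score_bits >= min_bits
```

The flanks give the covariance model some room to correct candidate boundaries that the first-pass detectors placed a few bases off.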
tRNAscan-SE - Stage 3
  keep only the predicted tRNAs that covels has confirmed
  trim the tRNA bounds to those predicted by covels
tRNAscan-SE - Stage 3
  use heuristics to distinguish pseudogenes from true tRNAs: a candidate is flagged as a pseudogene if its primary sequence score is < 10 bits or its secondary structure score is < 5 bits
  run coves (covariance model global structure alignment) to predict the secondary structure
  identify anticodons and introns (5+ consecutive non-consensus nucleotides within the anticodon loop)
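The two Stage 3 rules can be written down as simple predicates. This is a sketch under assumptions: the scores are taken as already computed, and the anticodon loop is represented as a list of booleans marking consensus positions, a made-up input format chosen only for illustration.

```python
MIN_PRIMARY_BITS = 10.0    # primary sequence score cutoff from the slide
MIN_SECONDARY_BITS = 5.0   # secondary structure score cutoff from the slide
MIN_INTRON_RUN = 5         # consecutive non-consensus nucleotides


def is_probable_pseudogene(primary_bits: float, secondary_bits: float) -> bool:
    """Flag a candidate when either score falls below its cutoff."""
    return primary_bits < MIN_PRIMARY_BITS or secondary_bits < MIN_SECONDARY_BITS


def has_intron(consensus_mask: list[bool], min_run: int = MIN_INTRON_RUN) -> bool:
    """True if the loop holds a run of at least min_run non-consensus positions.

    consensus_mask is a hypothetical input: True where the nucleotide
    matches a consensus position, False otherwise.
    """
    run = 0
    for is_consensus in consensus_mask:
        run = 0 if is_consensus else run + 1
        if run >= min_run:
            return True
    return False
```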
Verification
Annotated databases used for verification:
  bacterial, archaeal and eukaryotic DNA from the Sprinzl tRNA database
  tRNA sequence subset of GenBank
  DNA from H. influenzae from TIGR
  sequences generated by a 5th-order Markov chain based on C. elegans
  human sequence generated based on GC content
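To make the "5th-order Markov chain" item concrete, here is a minimal sketch of how such background sequence can be generated: each new base is drawn according to how often it followed the preceding five bases in a training sequence. The training input, the function names, and the restart rule for unseen contexts are assumptions of this example; the slides do not describe the actual generator.

```python
import random
from collections import Counter, defaultdict


def train_markov(sequence: str, order: int = 5) -> dict[str, Counter]:
    """Count which base follows each k-mer context (k = order)."""
    counts: defaultdict[str, Counter] = defaultdict(Counter)
    for i in range(len(sequence) - order):
        counts[sequence[i:i + order]][sequence[i + order]] += 1
    return dict(counts)


def generate(model: dict[str, Counter], length: int, order: int = 5, seed: int = 0) -> str:
    """Sample a sequence; the training sequence must be longer than `order`."""
    rng = random.Random(seed)
    contexts = sorted(model)
    out = list(rng.choice(contexts))  # start from a random observed context
    while len(out) < length:
        followers = model.get("".join(out[-order:]))
        if followers is None:
            # Assumption: on an unseen context, restart from a known one.
            out.extend(rng.choice(contexts))
            continue
        bases, weights = zip(*sorted(followers.items()))
        out.append(rng.choices(bases, weights=weights)[0])
    return "".join(out[:length])
```

Sequence generated this way preserves local composition up to hexamer frequencies but contains no real tRNA genes, so any hit reported on it can be counted as a false positive.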
Verification
tRNA prediction with annotated database subsets

Sequence source          Literature  tRNAscan  EufindtRNA  tRNA CM  tRNAscan-SE
Sprinzl DB (Archaea)             70        69          43       70           70
Sprinzl DB (Eubacteria)         240       226         205      239          237
Sprinzl DB (Eukarya)            279       265         275      279          279
GenBank tRNA                   1462      1366         760     1456         1440
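The counts in this table translate directly into per-dataset sensitivity. The sketch below copies the numbers from the table and divides each program's count by the literature count; the names `ANNOTATED`, `FOUND` and `sensitivity_table` are chosen for this example only.

```python
# Counts copied from the verification table.
ANNOTATED = {"Archaea": 70, "Eubacteria": 240, "Eukarya": 279, "GenBank tRNA": 1462}
FOUND = {
    "tRNAscan":    {"Archaea": 69, "Eubacteria": 226, "Eukarya": 265, "GenBank tRNA": 1366},
    "EufindtRNA":  {"Archaea": 43, "Eubacteria": 205, "Eukarya": 275, "GenBank tRNA": 760},
    "tRNA CM":     {"Archaea": 70, "Eubacteria": 239, "Eukarya": 279, "GenBank tRNA": 1456},
    "tRNAscan-SE": {"Archaea": 70, "Eubacteria": 237, "Eukarya": 279, "GenBank tRNA": 1440},
}


def sensitivity_percent(found: int, annotated: int) -> float:
    """Share of annotated tRNAs that a program recovered, in percent."""
    return 100.0 * found / annotated


def sensitivity_table() -> dict[str, dict[str, float]]:
    return {
        program: {src: sensitivity_percent(n, ANNOTATED[src]) for src, n in counts.items()}
        for program, counts in FOUND.items()
    }


for program, row in sensitivity_table().items():
    print(program, {src: round(pct, 1) for src, pct in row.items()})
```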
Performance
Analysis time in CPU hours for various complete genomes (SGI Indigo2 R4400 200 MHz)

Complete genome  Size (Mbp)  tRNAscan  EufindtRNA  tRNA CM  tRNAscan-SE
P. anserina             0.1      0.14     < 0.001      2.8        0.019
H. influenzae           1.8      2.54     < 0.001       51        0.069
C. elegans              100       139        0.15     2780          1.8
Human                  3000    > 4170         7.1    83300         36.6
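The speedup of tRNAscan-SE over a pure covariance model search can be read off this table by dividing the two CPU-hour columns. The sketch below does that and also converts a runtime into throughput; the dictionary layout and function names are assumptions made for this example.

```python
# CPU hours copied from the performance table (tRNA CM vs. tRNAscan-SE).
CPU_HOURS = {
    "P. anserina":   {"size_mbp": 0.1,  "cm": 2.8,   "se": 0.019},
    "H. influenzae": {"size_mbp": 1.8,  "cm": 51,    "se": 0.069},
    "C. elegans":    {"size_mbp": 100,  "cm": 2780,  "se": 1.8},
    "Human":         {"size_mbp": 3000, "cm": 83300, "se": 36.6},
}


def speedup(cm_hours: float, se_hours: float) -> float:
    """How many times faster tRNAscan-SE is than the covariance model alone."""
    return cm_hours / se_hours


def throughput_bp_per_s(size_mbp: float, cpu_hours: float) -> float:
    """Average search speed in base pairs per CPU second."""
    return size_mbp * 1e6 / (cpu_hours * 3600)


for genome, row in CPU_HOURS.items():
    print(genome, round(speedup(row["cm"], row["se"])))
```

For the C. elegans and human rows the ratio comes out at roughly 1500 and 2300, while the two small genomes give smaller ratios.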
Accuracy
  tRNAscan-SE detects 99-100% of true tRNAs
  less than 1 false positive per 15 billion nucleotides
  1000-3000 times faster than covariance models

Program       True positives (%)  False positives (per Mbp)  Search speed (bp/s)
tRNAscan 1.3                95.1                       0.37                  400
EufindtRNA                  88.8                       0.23               373000
tRNA CM                     99.8                    < 0.002                   20
tRNAscan-SE                 99.5                  < 0.00007                30000
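The false positive rates become easier to judge when scaled to a genome. The sketch below uses the rates from the table, treating the "<" entries as upper bounds, and a 3000 Mbp genome as in the performance slide; the function names are made up for this example.

```python
# Rates copied from the accuracy table; the last two are upper bounds ("<").
FALSE_POSITIVES_PER_MBP = {
    "tRNAscan 1.3": 0.37,
    "EufindtRNA": 0.23,
    "tRNA CM": 0.002,
    "tRNAscan-SE": 0.00007,
}


def expected_false_positives(rate_per_mbp: float, genome_mbp: float) -> float:
    """Expected number of false positives when scanning a genome of this size."""
    return rate_per_mbp * genome_mbp


def mbp_per_false_positive(rate_per_mbp: float) -> float:
    """Amount of sequence (in Mbp) scanned per expected false positive."""
    return 1.0 / rate_per_mbp


for program, rate in FALSE_POSITIVES_PER_MBP.items():
    print(program, expected_false_positives(rate, 3000))
```

At 0.37 false positives per Mbp, a 3000 Mbp genome yields about 1110 spurious hits, whereas a rate of 0.00007 per Mbp corresponds to about 14,300 Mbp per false positive, the same order as the 15 billion nucleotides quoted above.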
End
  Questions?