Bioinformatics. Sequence Analysis: Part III. Pattern Searching and Gene Finding. Fran Lewitter, Ph.D. Head, Biocomputing Whitehead Institute


1 Bioinformatics Sequence Analysis: Part III. Pattern Searching and Gene Finding Fran Lewitter, Ph.D. Head, Biocomputing Whitehead Institute

2 Course Syllabus
  Jan 7   Sequence Analysis I. Pairwise alignments, database searching including BLAST (FL) [1, 2, 3]
  Jan 14  Sequence Analysis II. Database searching (continued), pattern searching (FL) [7]
  Jan 21  No Class - Martin Luther King Holiday
  Jan 28  Sequence Analysis III. Hidden Markov models, gene finding algorithms (FL) [8]
  Feb 4   Computational Methods I. Genomic resources and Unix (GB)
  Feb 11  Computational Methods II. Sequence analysis with Perl (GB)
  Feb 18  No Class - President's Birthday
  Feb 25  Computational Methods III. Sequence analysis with Perl and BioPerl (GB)
  Mar 4   Proteins I. Multiple sequence alignments, phylogenetic trees (RL) [4, 6]
  Mar 11  Proteins II. Profile searches of databases, revealing protein motifs (RL) [9]
  Mar 18  Proteins III. Structural genomics: structural comparisons and predictions (RL)
  Mar 25  Microarrays: designing chips, clustering methods (FL)
WIBR Bioinformatics Course, Whitehead Institute,

3 Topics to Cover
  - Pattern searching
  - Gene finding algorithms

4 [Figure: pie chart titled "Status of Sequencing (12/31/2001)" over a background of raw genomic DNA sequence; legend entries Finished, Unfinished, and Not Sequenced; percentage labels 2%, 35%, and 63%]


16 Pattern Searching
  - RRRRYYYY: 4 purines followed by 4 pyrimidines
  - TATAA[1,0,0]: TATAA, allowing 1 mismatch
  - p1=6...8 GAGA ~p1: a hairpin with GAGA as the loop
  - p1= p1: exact 6 character repeat separated by up to 8
  - p1= p1[1,1,1]: allow one mismatch, deletion and insertion
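The first three patterns on this slide can be imitated in a few lines of Python. This is only a sketch, not how Patscan itself works: the function names are made up for illustration, the mismatch search handles substitutions only (no insertions or deletions), and the hairpin search requires an exact reverse-complement stem.

```python
import re

# IUPAC codes needed for the slide's examples (R = purine, Y = pyrimidine).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}


def find_iupac(pattern, seq):
    """Return 0-based starts of all matches (overlaps included) of an IUPAC pattern."""
    body = "".join(IUPAC[ch] for ch in pattern)
    # A lookahead keeps the match zero-width so overlapping hits are reported.
    return [m.start() for m in re.finditer("(?=(" + body + "))", seq)]


def find_with_mismatches(pattern, seq, max_mismatches=1):
    """Slide a window over seq and keep windows within max_mismatches substitutions."""
    hits = []
    for i in range(len(seq) - len(pattern) + 1):
        window = seq[i:i + len(pattern)]
        mismatches = sum(1 for a, b in zip(pattern, window) if a != b)
        if mismatches <= max_mismatches:
            hits.append((i, window, mismatches))
    return hits


def revcomp(s):
    """Reverse complement of an uppercase DNA string."""
    return s.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_hairpins(seq, loop="GAGA", min_stem=6, max_stem=8):
    """Find stem-loop-stem sites where the right stem is the reverse complement of the left."""
    hits = []
    for stem in range(min_stem, max_stem + 1):
        total = 2 * stem + len(loop)
        for i in range(len(seq) - total + 1):
            left = seq[i:i + stem]
            mid = seq[i + stem:i + stem + len(loop)]
            right = seq[i + stem + len(loop):i + total]
            if mid == loop and right == revcomp(left):
                hits.append((i, left, mid, right))
    return hits


if __name__ == "__main__":
    print(find_iupac("RRRRYYYY", "TTAGGACTCTAA"))
    print(find_with_mismatches("TATAA", "GGTATAAGGTTTAACC"))
    print(find_hairpins("TTACGTTCGAGAGAACGTTT"))
```

The window scan is quadratic in the worst case, which is fine for short motifs; dedicated tools use faster matching and also support insertions and deletions.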

23 Pattern Searching Programs
  - Patscan: scan_for_matches patfile < inputfile
  - findpatterns: gcg; findpatterns
  - fuzznuc, fuzzprot, fuzztrans, dreg: EMBOSS programs; web and Unix


25 Problem to Solve

26 Types of Signals to Detect
  - Transcriptional: TSS, TATA box, PolyA
  - Translational: Kozak (CC(A/G)CCAUGG), termination codon (UAA, UAG, UGA)
  - Splicing: introns - GT ... AG
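Two of the translational signals above are easy to look for directly. The sketch below, written against DNA (T in place of U), finds exact matches to the Kozak consensus and then reads codons in frame from the ATG until the first termination codon. The function names are illustrative, and real start sites often deviate from the consensus, so exact matching is a simplifying assumption.

```python
import re

# DNA equivalents of the termination codons UAA, UAG, UGA.
STOP_CODONS = {"TAA", "TAG", "TGA"}

# DNA form of the consensus CC(A/G)CCAUGG; the ATG starts 5 bases into the match.
KOZAK = re.compile(r"CC[AG]CCATGG")


def kozak_sites(seq):
    """Return 0-based positions of the A of ATG for each exact consensus match."""
    return [m.start() + 5 for m in KOZAK.finditer(seq)]


def orf_from(seq, atg_pos):
    """Read codons in frame from atg_pos.

    Returns (start, end) where end is just past the first in-frame stop codon,
    or None if no stop codon is found before the sequence ends.
    """
    for i in range(atg_pos, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return (atg_pos, i + 3)
    return None


if __name__ == "__main__":
    dna = "TTCCACCATGGCTGCATAAGG"
    for start in kozak_sites(dna):
        span = orf_from(dna, start)
        if span is not None:
            print(start, dna[span[0]:span[1]])
```

This only covers a single uninterrupted reading frame; introns (GT ... AG) are what make the eukaryotic problem hard and are not handled here.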

30 Gene Finding Strategies
  - Content-based methods: codon usage, compositional complexity
  - Site-based methods: presence or absence of a specific pattern or sequence
  - Comparative methods: determination based on homology

36 Evolution of Gene Finding Programs
  Timeline axis: 1980, 1990, 2000, ???
  Features handled, in the order they are added to the timeline: ORF; codon usage; splice sites/exons; whole genes; multiple genes; alternative splicing; functional RNAs; nested genes

39 RepeatMasker
  Summary:
  - Total length: 8750 bp
  - GC level: 35.61%
  - Bases masked: 6803 bp (77.75%)

40 Coding Measures
  - Look at frequencies of codons (e.g. redundancy of the genetic code; Leucine = UUA, UUG, CUU, CUC, CUA, CUG)
  - 6-tuple or hexamer approach:
      ACCTCG = Thr Ser
      TACTCG = Tyr Ser
      GCCCTC = Ala Leu
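Both coding measures come down to counting in-frame words. A minimal sketch, assuming the input is a coding sequence already in frame and written as DNA; the function names and the toy sequence (the slide's three hexamers joined together) are for illustration only.

```python
from collections import Counter

# DNA spellings of the six leucine codons listed on the slide.
LEUCINE = ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"]


def codon_counts(cds):
    """Count in-frame codons; a trailing partial codon is ignored."""
    return Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))


def hexamer_counts(cds):
    """Count in-frame hexamers (adjacent codon pairs), stepping one codon at a time."""
    return Counter(cds[i:i + 6] for i in range(0, len(cds) - 5, 3))


def synonymous_fractions(counts, codons):
    """Fraction of usage of each codon within one synonymous family."""
    total = sum(counts[c] for c in codons)
    return {c: (counts[c] / total if total else 0.0) for c in codons}


if __name__ == "__main__":
    cds = "ACCTCG" + "TACTCG" + "GCCCTC"
    codons = codon_counts(cds)
    print(codons.most_common())
    print(hexamer_counts(cds))
    print(synonymous_fractions(codons, LEUCINE))
```

In practice these counts are compared between known coding and noncoding training sequences, and a candidate region is scored by how much its codon or hexamer usage resembles the coding set.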

41 Fifth Order Markov Models
  Periodic fifth order Markov model. Circles represent consecutive DNA bases, numbers indicate codon position, and the arrows indicate that the next base is generated conditionally on the previous five bases and on the codon position.
  Burge and Karlin, Current Opinion in Structural Biology 1998, 8:
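The caption describes a model in which each base depends on the previous five bases and on the codon position. A rough sketch of how such a model could be estimated and used for scoring follows; the add-one pseudocount, the function names, and the toy training sequence are assumptions made for illustration, not details taken from Burge and Karlin.

```python
import math
from collections import defaultdict


def train_periodic_markov(coding_seqs, order=5, pseudocount=1.0):
    """Estimate P(base | previous `order` bases, codon position).

    Training sequences are assumed to start at codon position 0.
    Returns a function prob(codon_pos, context, base).
    """
    counts = defaultdict(lambda: defaultdict(float))
    for seq in coding_seqs:
        for i in range(order, len(seq)):
            counts[(i % 3, seq[i - order:i])][seq[i]] += 1

    def prob(codon_pos, context, base):
        seen = counts.get((codon_pos, context), {})
        total = sum(seen.values()) + 4 * pseudocount
        return (seen.get(base, 0.0) + pseudocount) / total

    return prob


def log_likelihood(seq, prob, frame=0, order=5):
    """Log-probability of seq, assuming codon position 0 falls at index `frame`."""
    return sum(
        math.log(prob((i - frame) % 3, seq[i - order:i], seq[i]))
        for i in range(order, len(seq))
    )


if __name__ == "__main__":
    model = train_periodic_markov(["GCT" * 10])
    query = "GCT" * 4
    for frame in range(3):
        print(frame, round(log_likelihood(query, model, frame=frame), 3))
```

Scoring the same window in all three frames and comparing against a noncoding model is the usual way such a model is turned into a coding measure.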

49 Hidden Markov Models
  - Unobserved or hidden states (e.g. coding or noncoding)
  - States generated according to a first order Markov chain
  - Observable DNA bases
  - Each base generated conditionally on the identity of the corresponding hidden state
  Burge and Karlin, Current Opinion in Structural Biology 1998, 8:
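Given such a model, the usual question is which sequence of hidden states best explains the observed bases. The sketch below uses the standard Viterbi algorithm on a two-state toy model; the states (C = coding, N = noncoding) follow the slide's example, but every probability here is invented for illustration and is not taken from any gene finder.

```python
import math

# Toy two-state model: "C" (coding, GC-rich) and "N" (noncoding, AT-rich).
# All numbers below are made-up illustrative values.
STATES = ("C", "N")
START = {"C": 0.5, "N": 0.5}
TRANS = {"C": {"C": 0.9, "N": 0.1}, "N": {"C": 0.1, "N": 0.9}}
EMIT = {
    "C": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "N": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
}


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state path for obs (computed in log space)."""
    scores = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = []
    for x in obs[1:]:
        row, ptr = {}, {}
        for s in states:
            # Best predecessor state for ending in state s at this position.
            best = max(states, key=lambda r: scores[-1][r] + math.log(trans_p[r][s]))
            row[s] = scores[-1][best] + math.log(trans_p[best][s]) + math.log(emit_p[s][x])
            ptr[s] = best
        scores.append(row)
        back.append(ptr)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: scores[-1][s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    bases = "ATATATGCGCGCGC"
    print(bases)
    print("".join(viterbi(bases, STATES, START, TRANS, EMIT)))
```

Working in log space avoids numerical underflow on long sequences, which matters once the input is a genomic region rather than a toy string.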

56 GENSCAN
  - Markov model (MM): the probability of a given nucleotide occurring at position p depends on the nucleotides occupying the previous k positions
  - Generalized Hidden Markov Model (GHMM)
  - Optimize module performing signal recognition
  - Incorporates influence of C+G content
  - Considers gene models on both strands
  - Can identify multiple genes
  Burge and Karlin, JMB 268:78-94, 1997


65 Gene Finding Programs
  - FGENES - Sanger Center, UK
  - GeneMark HMM - Georgia Tech
  - Genie - UCSC
  - Genscan - Stanford and MIT
  - HMMgene - CBS, Denmark
  - Morgan - Johns Hopkins
  - MZEF - Cold Spring Harbor

71 HMR195 Test Set
  - 103 human, 82 mouse, 10 rat sequences
  - Sequences new since August 1997
  - Genomic sequences containing exactly one gene
  - No mRNA sequences, pseudogenes or alternatively spliced genes
  - The mean length of sequences is 7,096 bp
  Rogic, Mackworth and Ouellette, Genome Research 11: , 2001

76 HMR195 Test Set (cont.)
  - 43 single-exon genes; 152 multi-exon genes
  - Average number of exons per gene is 4.86
  - Mean exon length = 208 bp, mean intron length = 678 bp, mean coding length per gene = 1,015 bp (~330 aa)
  - Coding sequence 14%, intronic sequence 46% and intergenic DNA 40%
  Rogic, Mackworth and Ouellette, Genome Research 11: , 2001

77 Definitions
  - Sensitivity: the proportion of true sites (e.g., exons or donor splice sites) which are correctly predicted = TP/(TP + FN)
  - Specificity: the proportion of predicted sites which are correct = TP/(TP + FP)
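The two definitions can be checked on a small example. In this sketch an exon counts as a true positive only if its predicted (start, end) coordinates match an annotated exon exactly; that matching rule, the function names, and the coordinates are assumptions chosen for illustration.

```python
def sensitivity(tp, fn):
    """Proportion of true sites that were predicted: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tp, fp):
    """Proportion of predicted sites that are correct: TP / (TP + FP)."""
    return tp / (tp + fp)


def exon_level_counts(true_exons, predicted_exons):
    """Return (TP, FN, FP), counting an exon as correct only on an exact coordinate match."""
    true_set, pred_set = set(true_exons), set(predicted_exons)
    tp = len(true_set & pred_set)
    fn = len(true_set - pred_set)
    fp = len(pred_set - true_set)
    return tp, fn, fp


if __name__ == "__main__":
    annotated = [(100, 200), (300, 400), (500, 650)]
    predicted = [(100, 200), (300, 410), (500, 650), (700, 800)]
    tp, fn, fp = exon_level_counts(annotated, predicted)
    print("Sn =", round(sensitivity(tp, fn), 3))
    print("Sp =", round(specificity(tp, fp), 3))
```

Note that specificity as defined on the slide is what other fields call precision; it does not involve true negatives.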

82 Results of Program Comparisons
  - Genscan and HMMgene had reliable scores for exons
  - Nucleotide Sn = 0.95 for Genscan and 0.93 for HMMgene
  - Sp = 0.90 and 0.93, respectively
  - Accuracy dependent on G+C content
  Rogic, Mackworth and Ouellette, Genome Research 11: , 2001

87 GenomeScan
  - Combines exon-intron and splice signal models with similarity to known proteins
  - Used to identify genes in the human draft sequence
  - Uses GENSCAN and BLASTX
  - Procrustes and Genewise are similar but can only predict one gene per genomic sequence
  Yeh, Lim, and Burge, Genome Research 11: ,


93 Other Approaches
  - Use microarrays to identify expressed genes based on the coexpression of sets of adjacent exons as predicted by GENSCAN (Shoemaker et al., Nature 409: , 2001)
  - RT-PCR with radio-labeled primers targeted to pairs of adjacent predicted exons, followed by sequencing of the amplified product (Burge et al., in preparation)

94 Future Challenges
  - Alternative splicing
  - Gene products functioning at the RNA level
  - Nested genes
  - 5' end of genes
  - Other unusual characteristics

98 Coming attractions
  - Next section on Computational Methods
  - Unix accounts
  - Course projects

Gene Finding in Eukaryotes

Gene Finding in Eukaryotes Gene Finding in Eukaryotes Jan-Jaap Wesselink jjwesselink@cnio.es Computational and Structural Biology Group, Centro Nacional de Investigaciones Oncológicas Madrid, April 2008 Jan-Jaap Wesselink jjwesselink@cnio.es

More information

Gene finding. kuobin/

Gene finding.  kuobin/ Gene finding KUO-BIN LI, PH.D. http://www.bii.a-star.edu.sg/ kuobin/ Bioinformatics Institute 30 Medical Drive, Level 1, IMCB Building Singapore 117609 Republic of Singapore Gene finding (LSM5191) p.1

More information

Bacterial Gene Finding CMSC 423

Bacterial Gene Finding CMSC 423 Bacterial Gene Finding CMSC 423 Finding Signals in DNA We just have a long string of A, C, G, Ts. How can we find the signals encoded in it? Suppose you encountered a language you didn t know. How would

More information

Studying Alternative Splicing

Studying Alternative Splicing Studying Alternative Splicing Meelis Kull PhD student in the University of Tartu supervisor: Jaak Vilo CS Theory Days Rõuge 27 Overview Alternative splicing Its biological function Studying splicing Technology

More information

Annotation of Drosophila mojavensis fosmid 8 Priya Srikanth Bio 434W

Annotation of Drosophila mojavensis fosmid 8 Priya Srikanth Bio 434W Annotation of Drosophila mojavensis fosmid 8 Priya Srikanth Bio 434W 5.1.2007 Overview High-quality finished sequence is much more useful for research once it is annotated. Annotation is a fundamental

More information

Annotation of Chimp Chunk 2-10 Jerome M Molleston 5/4/2009

Annotation of Chimp Chunk 2-10 Jerome M Molleston 5/4/2009 Annotation of Chimp Chunk 2-10 Jerome M Molleston 5/4/2009 1 Abstract A stretch of chimpanzee DNA was annotated using tools including BLAST, BLAT, and Genscan. Analysis of Genscan predicted genes revealed

More information

Bio 111 Study Guide Chapter 17 From Gene to Protein

Bio 111 Study Guide Chapter 17 From Gene to Protein Bio 111 Study Guide Chapter 17 From Gene to Protein BEFORE CLASS: Reading: Read the introduction on p. 333, skip the beginning of Concept 17.1 from p. 334 to the bottom of the first column on p. 336, and

More information

TRANSLATION: 3 Stages to translation, can you guess what they are?

TRANSLATION: 3 Stages to translation, can you guess what they are? TRANSLATION: Translation: is the process by which a ribosome interprets a genetic message on mrna to place amino acids in a specific sequence in order to synthesize polypeptide. 3 Stages to translation,

More information

Protein Synthesis and Mutation Review

Protein Synthesis and Mutation Review Protein Synthesis and Mutation Review 1. Using the diagram of RNA below, identify at least three things different from a DNA molecule. Additionally, circle a nucleotide. 1) RNA is single stranded; DNA

More information

MODULE 3: TRANSCRIPTION PART II

MODULE 3: TRANSCRIPTION PART II MODULE 3: TRANSCRIPTION PART II Lesson Plan: Title S. CATHERINE SILVER KEY, CHIYEDZA SMALL Transcription Part II: What happens to the initial (premrna) transcript made by RNA pol II? Objectives Explain

More information

Figure 1: Final annotation map of Contig 9

Figure 1: Final annotation map of Contig 9 Introduction With rapid advances in sequencing technology, particularly with the development of second and third generation sequencing, genomes for organisms from all kingdoms and many phyla have been

More information

PROTEIN SYNTHESIS. It is known today that GENES direct the production of the proteins that determine the phonotypical characteristics of organisms.

PROTEIN SYNTHESIS. It is known today that GENES direct the production of the proteins that determine the phonotypical characteristics of organisms. PROTEIN SYNTHESIS It is known today that GENES direct the production of the proteins that determine the phonotypical characteristics of organisms.» GENES = a sequence of nucleotides in DNA that performs

More information

Bioinformatics Laboratory Exercise

Bioinformatics Laboratory Exercise Bioinformatics Laboratory Exercise Biology is in the midst of the genomics revolution, the application of robotic technology to generate huge amounts of molecular biology data. Genomics has led to an explosion

More information

Protein Synthesis

Protein Synthesis Protein Synthesis 10.6-10.16 Objectives - To explain the central dogma - To understand the steps of transcription and translation in order to explain how our genes create proteins necessary for survival.

More information

Protein sequence alignment using binary string

Protein sequence alignment using binary string Available online at www.scholarsresearchlibrary.com Scholars Research Library Der Pharmacia Lettre, 2015, 7 (5):220-225 (http://scholarsresearchlibrary.com/archive.html) ISSN 0975-5071 USA CODEN: DPLEB4

More information

HOST-PATHOGEN CO-EVOLUTION THROUGH HIV-1 WHOLE GENOME ANALYSIS

HOST-PATHOGEN CO-EVOLUTION THROUGH HIV-1 WHOLE GENOME ANALYSIS HOST-PATHOGEN CO-EVOLUTION THROUGH HIV-1 WHOLE GENOME ANALYSIS Somda&a Sinha Indian Institute of Science, Education & Research Mohali, INDIA International Visiting Research Fellow, Peter Wall Institute

More information

RNA and Protein Synthesis Guided Notes

RNA and Protein Synthesis Guided Notes RNA and Protein Synthesis Guided Notes is responsible for controlling the production of in the cell, which is essential to life! o DNARNAProteins contain several thousand, each with directions to make

More information

Chapter 12-4 DNA Mutations Notes

Chapter 12-4 DNA Mutations Notes Chapter 12-4 DNA Mutations Notes I. Mutations Introduction A. Definition: Changes in the DNA sequence that affect genetic information B. Mutagen= physical or chemical agent that interacts with DNA to cause

More information

Complete Student Notes for BIOL2202

Complete Student Notes for BIOL2202 Complete Student Notes for BIOL2202 Revisiting Translation & the Genetic Code Overview How trna molecules interpret a degenerate genetic code and select the correct amino acid trna structure: modified
