High-throughput transcriptome sequencing
- Candace Montgomery
- 6 years ago
1 High-throughput transcriptome sequencing Erik Kristiansson Department of Zoology Department of Neuroscience and Physiology University of Gothenburg, Sweden
2 Outline Genome sequencing High-throughput sequencing techniques Overview Massively parallel pyrosequencing Transcriptome sequencing Why? How? Example: The sequencing of the eelpout transcriptome Gene expression measurement using high-throughput sequencing
3 Genome sequencing Haemophilus influenzae, 1995 First sequenced free living organism 1800 genes, 1.8 million base pairs Saccharomyces cerevisiae, 1997 First sequenced eukaryote Genome consists of 6000 genes and 12 million base pairs 7 years of sequencing Homo sapiens, 2003 Genome consists of genes and 3.25 billion base pairs 13 years of sequencing
4 Genome sequencing Planned whole-genome sequencing projects for vertebrates Vertebrate species distribution
5 Genome sequencing Today more than 1000 species have been sequenced. However, the choice of species is biased. Many important taxonomic groups still lack a species with a sequenced genome! Whole-genome sequencing is expensive. A coverage of ~10 times is needed for a reliable build. Polyploidy makes genome sequencing complicated. Species with more than two sets of chromosomes are very hard to sequence (e.g. zebrafish).
6 High-throughput sequencing Second generation sequencing technology. Currently three major techniques are on the market.
Technology               Company                    Throughput                    Cost/base  Read length
Parallel pyrosequencing  454 Life Sciences / Roche  20 million bases per hour     0.01$      bases
Solexa                   Illumina                   80 million bases per hour     0.005$     35 bases
SOLiD                    Applied Biosystems         100 million bases per hour    0.002$     35 bases
Traditional sequencing                              0.05 million bases per hour   0.5$       up to 1000 bases
7 High-throughput sequencing Picture taken from Rothberg & Leamon, Nature Biotechnology 2008
8 Massively parallel pyrosequencing Advantages over Sanger sequencing: High throughput. More accurate: less than 1% error rate (Huse et al. 2007). More sensitive: high depth. Disadvantages: Limited sequence length ( bases). 454 Life Sciences pyrosequencer
Year  Technique                                 Performance (bp/day)
2004  Traditional sequencing (Sanger)           1 million
2005  Parallel pyrosequencing GS20              70 million
2007  Parallel pyrosequencing GS FLX            200 million
2008  Parallel pyrosequencing GS FLX Titanium   500 million
9 Massively parallel pyrosequencing
10 Massively parallel pyrosequencing Nucleotides are flowed sequentially (a). A signal is generated for each nucleotide incorporation (b). A CCD camera generates an image after each flow (c). The signal strength is proportional to the number of incorporated nucleotides.
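The flow principle on this slide can be illustrated with a small sketch. It assumes a hypothetical, already normalized flowgram (one signal per flow, where 1.0 corresponds to a single incorporated nucleotide) and a repeating T, A, C, G flow order; the flow order, the function name and the example values are illustrative assumptions, not taken from the slides.

```python
def decode_flowgram(signals, flow_order="TACG"):
    """Turn normalized flow signals into a base sequence.

    Each signal is rounded to the nearest integer, which is taken as the
    number of identical nucleotides incorporated during that flow.
    """
    bases = []
    for i, signal in enumerate(signals):
        nucleotide = flow_order[i % len(flow_order)]  # nucleotide flowed in this cycle
        count = int(signal + 0.5)                     # round half up; signals are non-negative
        bases.append(nucleotide * count)
    return "".join(bases)


# Hypothetical flowgram: values near 0 mean "no incorporation".
example = [1.02, 0.05, 0.97, 2.10, 0.00, 3.05, 0.10, 0.96]
print(decode_flowgram(example))
```

Because a run of identical bases is read from a single signal, longer homopolymers are the hardest case for this kind of rounding.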
11 Massively parallel pyrosequencing
12 Massively parallel pyrosequencing
13 Transcriptome sequencing Genome sequencing is expensive, even with high-throughput sequencing technology. The cost for a higher eukaryote is ~$10,000,000. In transcriptome sequencing only the transcribed, protein-coding parts of the genome are sequenced.
14 Transcriptome sequencing More than 50% of the genome is estimated to be transcribed (in some form). ~10% is estimated to be functional. But only ~1.5% (!!!) of the human genome is protein coding. Whole-genome sequencing generates a lot of data that is not of primary interest.
15 Transcriptome sequencing To measure gene expression we need to know the sequence of the genes (microarray probes, PCR primers). Only the transcriptome is needed. The complete genome is not necessary. However, traditional transcriptome sequencing (ESTs) is expensive, error prone and not deep enough. There exist ~60 million ESTs for eukaryotes. 27 million of these are for vertebrates.
16 De novo transcriptome sequencing using massively parallel pyrosequencing Genome Transcripts ~ 3000 bp Transcriptome Massively parallel pyrosequencing ~ 250 bp
17 How much data do we get? The result from one run on a Genome Sequencer FLX reads 400 bases bases The transcriptome of a higher eukaryote is up to 50 million bases. We can, theoretically, cover this transcriptome 5 times. However, the limited read length will have a negative effect!
18 How much data do we get? Gene Reads from sequencing How much data do we need to remove all gaps?
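The question of how much data is needed to close all gaps can be approximated with a simple Poisson model. The sketch below is an idealization that assumes reads land uniformly at random on the target, which real transcriptomes violate because transcripts differ in abundance; the function names are illustrative.

```python
import math


def mean_coverage(total_bases, target_size):
    """Average number of times each position is sequenced."""
    return total_bases / target_size


def expected_fraction_covered(coverage):
    """Expected fraction of positions hit by at least one read.

    Under a Poisson model the probability that a position is hit by
    zero reads is exp(-coverage).
    """
    return 1.0 - math.exp(-coverage)


for c in (1, 2, 5, 10):
    print(f"{c:>2}x coverage -> {expected_fraction_covered(c):.1%} of positions covered")
```

Even under this optimistic model some positions remain uncovered at any finite coverage, and short reads make it harder still to join the covered stretches into full transcripts.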
19 How much data do we get?
20 Sequence data processing
>E6LSDQW02HPHGI
TGACTAAGATCCATCACATCAGGCCAGGTAGGAGTCTCTTATATTAGGTATCAATACCTTCCGGGT
GGATACCTTTGAGGCATAAGCTGGACAGGCACAGAACCTCGAGGCAGAACTTCCCGACTGCTTGAT
GTGTATCAAGGTCAATCAATCTGAAAATCAGCTGCCTAAGCACCAGTTCAAAAAAAAAAAAGAATA
TTTGCTCAACTCCTCTTAGTAGCTGAGCGGGCTGGCAAGGC
>E5R7OVD09FMUGM
TGACTAACTGTAGACACACAACACATCAACACACACACACACACACACACACACACACACACACAC
TAGANACACACACACACACACACACACACACACACACACACACACTACTATAATAAATAAAGAAGA
AGAAGTAGTTAGTTAGTACTTAACGTTAACGGTACGGTACGTAGGTACGGTAACCGGTAACCGGTA
ACCCGGAACCGTACGGTACGGTCGGTACGGTACGGTACGGTACGTACGTAACCGTTAAAAACCGGT
TTAAAAAGGTAAAAAGGGTAAAAAGGGTTAAAACGGGTTAACGGGTAACGTAGTAGNA
>E6LSDQW02GGDBU
TGACTAACAAATTTTAATTACACTTAAGGTGTATATTTTCTATGCAACCCATCAATTCAAGAGGTG
TAATGTGCTGATGACTATTTGTAATCGTTATACATTCTGACCCGAAGTCAGAAAGTATTTCTCTGT
CTGTGTGTTCACAGGCAGTGTGGTTGATTACATGAAATTCAGTACATTTGCAGTCTCGTTGCCCTT
CTCACCTGCCTTTCGTCATTACCGACGGTATTGAATTTCGTTTTCCCCGTTGGGGTTCTCCGGACA
AGGAG
21 Sequences producing significant alignments: (bits) Value
Ecoligenome 519 e-148
>Ecoligenome Length =
Score = 519 bits (262), Expect = e-148
Identities = 262/262 (100%)
Strand = Plus / Minus
Query: 8 caaattttaattacacttaaggtgtatattttctatgcaacccatcaattcaagaggtgt 67
Sbjct: caaattttaattacacttaaggtgtatattttctatgcaacccatcaattcaagaggtgt
Query: 68 aatgtgctgatgactatttgtaatcgttatacattctgacccgaagtcagaaagtatttc 127
Sbjct: aatgtgctgatgactatttgtaatcgttatacattctgacccgaagtcagaaagtatttc
Query: 128 tctgtctgtgtgttcacaggcagtgtggttgattacatgaaattcagtacatttgcagtc 187
Sbjct: tctgtctgtgtgttcacaggcagtgtggttgattacatgaaattcagtacatttgcagtc
Query: 188 tcgttgcccttctcacctgcctttcgtcattaccgacggtattgaatttcgttttccccg 247
Sbjct: tcgttgcccttctcacctgcctttcgtcattaccgacggtattgaatttcgttttccccg
Query: 248 ttggggttctccggacaaggag 269
Sbjct: ttggggttctccggacaaggag
22 Sequence cleaning Removal of undesirable sequences which may disturb sequence assembly. Better safe than sorry: low-complexity regions contain very little information. Tags from 454 sequencing: A tag TGACTAA, B tag TTAGTAG. The tags are removed by pattern matching.
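Tag removal by pattern matching could look roughly like the sketch below. It uses the A and B tag sequences given on the slide, but the trimming rule (strip a leading A tag, cut the read at the last occurrence of the B tag) and the function name are simplifying assumptions; a real cleaning step would also have to tolerate sequencing errors inside the tags.

```python
A_TAG = "TGACTAA"  # tag expected at the start of a read
B_TAG = "TTAGTAG"  # tag expected near the end of a read


def strip_tags(read):
    """Remove a leading A tag and everything from the last B tag onwards."""
    seq = read.upper()
    if seq.startswith(A_TAG):
        seq = seq[len(A_TAG):]
    pos = seq.rfind(B_TAG)  # -1 when the B tag is absent
    if pos != -1:
        seq = seq[:pos]
    return seq


# Toy read: A tag + insert + B tag + trailing adaptor bases (made-up example).
print(strip_tags("TGACTAA" + "ACGTACGT" + "TTAGTAG" + "CTGAGC"))
```

Exact matching of a 7-base pattern can also hit a genuine transcript sequence by chance, which is one reason to treat this only as an illustration.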
23 Sequence cleaning Contamination: mRNA from other species; rRNA or other unwanted types of RNA. Repetitive elements: poly(A) tails; Simple Sequence Repeats (SSR); more complex repeats like SINEs, LINEs and transposons. Repetitive elements are typically contained in the untranslated regions (UTRs).
24 Sequence cleaning RepeatMasker is a tool for identification of repetitive elements: ab initio prediction of repeats; database matching. Repbase Update is a database with transposable elements, Simple Sequence Repeats and pseudogenes.
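As an illustration of removing one kind of repetitive element, the sketch below trims a trailing poly(A) tail with a regular expression. The minimum tail length of 10 and the function name are assumptions chosen for the example, not settings from the slides.

```python
import re


def trim_poly_a(seq, min_length=10):
    """Remove a trailing run of at least `min_length` A's, if present."""
    match = re.search(r"A{%d,}$" % min_length, seq.upper())
    return seq[:match.start()] if match else seq


print(trim_poly_a("ACGTTGCA" + "A" * 15))  # tail removed
print(trim_poly_a("ACGTTGCAAAA"))          # too short to count as a tail, kept
```

Reads sequenced from the opposite strand would show a leading poly(T) stretch instead, which this sketch does not handle.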
25 Assembly True transcript. Reads from sequencing. Assembled sequences: contigs and singlets. Similarity threshold: a less strict setting results in longer contigs with more errors; a stricter setting results in shorter contigs with fewer errors.
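The effect of the similarity threshold can be shown with a toy overlap merge. This is a deliberately simplified, ungapped suffix-prefix comparison with made-up reads and parameter names; it is not how CAP3 or any real assembler works internally.

```python
def best_overlap(left, right, min_overlap=20, min_identity=0.95):
    """Length of the longest suffix of `left` that matches a prefix of
    `right` with at least `min_identity` identical positions (no gaps)."""
    for length in range(min(len(left), len(right)), min_overlap - 1, -1):
        suffix, prefix = left[-length:], right[:length]
        matches = sum(a == b for a, b in zip(suffix, prefix))
        if matches / length >= min_identity:
            return length
    return 0


def merge(left, right, **kwargs):
    """Join two reads into one contig, or return None if they do not overlap."""
    length = best_overlap(left, right, **kwargs)
    return left + right[length:] if length else None


left = "ACGTACGTTGCA"
right_exact = "GTTGCATTAGGC"     # overlaps `left` perfectly over 6 bases
right_mismatch = "GTTGGATTAGGC"  # same overlap with one mismatch

print(merge(left, right_exact, min_overlap=4, min_identity=1.0))
print(merge(left, right_mismatch, min_overlap=4, min_identity=1.0))  # strict: no merge
print(merge(left, right_mismatch, min_overlap=4, min_identity=0.8))  # lenient: merged
```

With the strict setting the mismatching read stays a singlet; with the lenient setting it is merged into a longer contig that may contain an error.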
26 The Gene Indices Clustering Tools Reads -> blastclust -> Clusters -> CAP3 -> Contigs
27 Functional similarity from sequence similarity Assign information to the assembled transcripts: gene description; annotation; functional annotation (e.g. Gene Ontology, pathways, etc.); homology and interactions. GenBank, UniProt, Ensembl
28 Direction of transcription Massively parallel pyrosequencing ignores the direction of transcription. Correct direction of transcription is, however, crucial for measuring gene expression.
AATTTTTCGATCTCCCTGCAAGACGGCTCATTT
5' 3'
I I G S V S V S E G L
AATATAACATCACCTGCAAATTTTTCGATCTCCCTGCAAGACGGCTCATTTGGCTCATAAC
TTATATTGTAGTGGACGTTTAAAAAGCTAGAGGGACGTTCTGCCGAGTAAACCGAGTATTG
3' 5'
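Since a read may come from either strand, any comparison has to consider both orientations. The sketch below uses the read and the two strands shown on this slide; the variable and function names are illustrative, and the exact substring search stands in for a real alignment.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the opposite strand, written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# Sequences from the slide; the bottom strand is written 3' to 5'.
TOP_5_TO_3 = "AATATAACATCACCTGCAAATTTTTCGATCTCCCTGCAAGACGGCTCATTTGGCTCATAAC"
BOTTOM_3_TO_5 = "TTATATTGTAGTGGACGTTTAAAAAGCTAGAGGGACGTTCTGCCGAGTAAACCGAGTATTG"
READ = "AATTTTTCGATCTCCCTGCAAGACGGCTCATTT"


def strand_of(read, reference):
    """Report '+' or '-' depending on which orientation of the read matches."""
    if read in reference:
        return "+"
    if reverse_complement(read) in reference:
        return "-"
    return None


print(strand_of(READ, TOP_5_TO_3))
print(strand_of(reverse_complement(READ), TOP_5_TO_3))
```

Knowing which orientation matches does not by itself say which strand was transcribed; that still requires evidence such as a protein hit or a predicted coding frame.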
29 BLAST tests both strands and reports the best
>O42430 CP1A1_LIMLI Cytochrome P450 1A1 - Limanda limanda (Dab) Length = 521
Minus Strand HSPs:
Score = 2347 (908.7 bits), Expect = 1.7e-266, P = 1.7e-266
Identities = 440/520 (84%), Positives = 478/520 (91%), Frame = -3
Query: 1593 MVLTILPFIGPVSVSESLVAMTTLCLVYLIFKFFHTDIXXXXXXXXXXXXXXXXXNVLEV 1414
M+L +LPFIG VSVSESLVAMTT+CLVYLI KFF T+I NVLE+
Sbjct: 1 MMLMMLPFIGSVSVSESLVAMTTVCLVYLILKFFQTEIPEGLRRLPGPKPLPIIGNVLEM 60
Query: 1413 GSRPYLSLTAMSKRYGNIFQIQIGMRPVVVLSGSDTLRQALIKQGDDFAGRPDLYSFRLI 1234
GSRPYLSLTAMSKRYGN+FQIQIGMRPVVVLSGS+T+RQALIKQGDDFAGRPDLYSFR I
Sbjct: 61 GSRPYLSLTAMSKRYGNVFQIQIGMRPVVVLSGSETVRQALIKQGDDFAGRPDLYSFRFI 120
FrameFinder from the ESTate software suite: ab initio prediction of protein-coding regions. Returns the location of protein-coding regions and the predicted protein.
30 Case study: Sequencing of the transcriptome of Zoarces viviparus The BALCOFISH project No suitable model species Zoarces viviparus (eelpout) Stationary Gives birth to live young Large-scale gene expression assays in eelpout Sequencing of the liver transcriptome Design of an eelpout microarray
31 The analysis pipeline 1. Pre-processing Removes: Redundant 454 reads (ghost reads), 5'/3' sequencing vectors. 2. SeqClean Removes: Poly(A) tails, simple repeats, bacterial contamination. 3. RepeatMasker Masks/removes: Low-complexity regions (transposons, etc.), contamination (rRNA, etc.). 4. Assembly Contig assembly using megablast and CAP3.
32 Assembly results and statistics Massively parallel pyrosequencing on a GS FLX: reads with an average length of 237 bases; 90 million bases in total.
                     Contigs      Singlets    Total
Number of sequences  36,110       17,347      53,457
Number of bases      14,250,156   4,050,061   18,300,217
Average length
Average coverage
Annotated            89.2%        87.3%       88.6%
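The sequence and base counts on this slide allow the average sequence lengths to be worked out directly. The sketch below only performs that arithmetic on the reported counts; the resulting averages are derived here and are not figures quoted from the presentation.

```python
# Counts as reported on the slide.
assembly = {
    "Contigs": {"sequences": 36_110, "bases": 14_250_156},
    "Singlets": {"sequences": 17_347, "bases": 4_050_061},
}
# The total row is the sum of contigs and singlets.
assembly["Total"] = {
    key: sum(row[key] for row in assembly.values())
    for key in ("sequences", "bases")
}


def average_length(row):
    """Mean sequence length in bases."""
    return row["bases"] / row["sequences"]


for name, row in assembly.items():
    print(f"{name:<8} {average_length(row):7.1f} bases")
```

The computed totals agree with the Total column of the table, and the singlet average comes out close to the 237-base average read length, as expected for unassembled reads.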
36 Assembly results and statistics The 18 million bases cover ~40% of the total eelpout transcriptome. Matches ~8,000 genes in stickleback. Few stickleback genes are represented only by eelpout singlets.
37 Gene; Pyrosequencing (Accession, Length); GenBank (Accession, Length)
Vitellogenin ZOVI ,826 AJ ,229
Zona pellucida 2 ZOVI ,
Zona pellucida 3 ZOVI
Estrogen receptor ZOVI AY ,256
Metallothionein ZOVI X
Heat-shock protein 70 ZOVI ,
Heat-shock protein 90 ZOVI
Cytochrome P450 1A ZOVI ,
Superoxide dismutase ZOVI
Glutathione peroxidase ZOVI ,
38 Novel genes in the eelpout data? Previous studies report ~10% transcripts from regions without annotation (e.g. A. thaliana, C. elegans) Alignment against five fish genomes 19,000 transcripts aligned in three out of five 4% of these aligned outside annotated regions 717 base pairs
39 Randomized regions Transcripts inside annotated regions Transcripts outside annotated regions
40 Measuring gene expression using high-throughput sequencing High-throughput sequencing can be used to measure gene expression for species with known genomes. 1. Sequence the transcriptome 2. Count the number of times each gene appears. Advantages: Low technical noise. No cross-hybridization. Disadvantages: Many reads are needed to measure lowly expressed genes. Expensive. A fully sequenced genome is more important.
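The two steps on this slide (sequence, then count) can be sketched as follows. The input is assumed to be one gene label per read, produced by some earlier alignment step that is not shown; the gene labels, the function names and the reads-per-million scaling are illustrative assumptions.

```python
from collections import Counter


def count_reads_per_gene(read_assignments):
    """Count reads per gene; `None` marks a read that matched no gene."""
    return Counter(gene for gene in read_assignments if gene is not None)


def reads_per_million(counts):
    """Scale raw counts so that samples of different depth are comparable."""
    total = sum(counts.values())
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}


# Hypothetical assignments for six reads.
assignments = ["vtg", "vtg", "cyp1a", None, "vtg", "mt"]
counts = count_reads_per_gene(assignments)
print(counts)
print(reads_per_million(counts))
```

A gene seen only once or twice gives a very noisy estimate, which is why many reads are needed for lowly expressed genes. This sketch also does not correct for transcript length.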
41 Measuring gene expression using high-throughput sequencing The correlation between high-throughput sequencing and microarrays is between 50% and 80%. Data for all sequencing techniques is still missing. 't Hoen et al.: correlation is ~60%. 't Hoen et al.: correlation is ~70%.
42 Gene expression measurement with high-throughput sequencing Correlation for annotated transcripts was ~60%. Kristiansson et al. Characterization of the Zoarces viviparus transcriptome using massively parallel pyrosequencing.
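A correlation between the two platforms can be computed per gene from paired measurements. The slides do not state which correlation measure or data transformation was used, so the Pearson correlation on log-transformed values below is an assumption, and the numbers are invented purely to make the example runnable.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Invented example values: read counts and array intensities for five genes.
read_counts = [3, 12, 150, 40, 900]
array_signal = [80, 60, 700, 500, 4000]

log_counts = [math.log2(c + 1) for c in read_counts]  # +1 avoids log(0)
log_signal = [math.log2(s) for s in array_signal]
print(f"r = {pearson(log_counts, log_signal):.2f}")
```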
46 Summary The second generation sequencing techniques can generate vast amounts of sequence data. Illumina and SOLiD sequencing can generate more data than massively parallel pyrosequencing but with shorter reads. Massively parallel pyrosequencing can be used for de novo transcriptome sequencing. One run is enough to assemble large parts of the transcriptome of a higher eukaryote. Gene expression measurements using high-throughput sequencing have both advantages and disadvantages compared to microarrays. The correlation is around 60-70%.
genomics for systems biology / ISB2020 RNA sequencing (RNA-seq)
RNA sequencing (RNA-seq) Module Outline MO 13-Mar-2017 RNA sequencing: Introduction 1 WE 15-Mar-2017 RNA sequencing: Introduction 2 MO 20-Mar-2017 Paper: PMID 25954002: Human genomics. The human transcriptome
More informationBioinformatics Laboratory Exercise
Bioinformatics Laboratory Exercise Biology is in the midst of the genomics revolution, the application of robotic technology to generate huge amounts of molecular biology data. Genomics has led to an explosion
More informationPhylogenomics. Antonis Rokas Department of Biological Sciences Vanderbilt University.
Phylogenomics Antonis Rokas Department of Biological Sciences Vanderbilt University http://as.vanderbilt.edu/rokaslab High-Throughput DNA Sequencing Technologies 454 / Roche 450 bp 1.5 Gbp / day Illumina
More informationSpliceDB: database of canonical and non-canonical mammalian splice sites
2001 Oxford University Press Nucleic Acids Research, 2001, Vol. 29, No. 1 255 259 SpliceDB: database of canonical and non-canonical mammalian splice sites M.Burset,I.A.Seledtsov 1 and V. V. Solovyev* The
More informationSebastian Jaenicke. trnascan-se. Improved detection of trna genes in genomic sequences
Sebastian Jaenicke trnascan-se Improved detection of trna genes in genomic sequences trnascan-se Improved detection of trna genes in genomic sequences 1/15 Overview 1. trnas 2. Existing approaches 3. trnascan-se
More informationVirusDetect pipeline - virus detection with small RNA sequencing
VirusDetect pipeline - virus detection with small RNA sequencing CSC webinar 16.1.2018 Eija Korpelainen, Kimmo Mattila, Maria Lehtivaara Big thanks to Jan Kreuze and Jari Valkonen! Outline Small interfering
More informationHao D. H., Ma W. G., Sheng Y. L., Zhang J. B., Jin Y. F., Yang H. Q., Li Z. G., Wang S. S., GONG Ming*
Comparison of transcriptomes and gene expression profiles of two chilling- and drought-tolerant and intolerant Nicotiana tabacum varieties under low temperature and drought stress Hao D. H., Ma W. G.,
More informationRNA-seq Introduction
RNA-seq Introduction DNA is the same in all cells but which RNAs that is present is different in all cells There is a wide variety of different functional RNAs Which RNAs (and sometimes then translated
More informationMODULE 3: TRANSCRIPTION PART II
MODULE 3: TRANSCRIPTION PART II Lesson Plan: Title S. CATHERINE SILVER KEY, CHIYEDZA SMALL Transcription Part II: What happens to the initial (premrna) transcript made by RNA pol II? Objectives Explain
More informationAnalysis of Massively Parallel Sequencing Data Application of Illumina Sequencing to the Genetics of Human Cancers
Analysis of Massively Parallel Sequencing Data Application of Illumina Sequencing to the Genetics of Human Cancers Gordon Blackshields Senior Bioinformatician Source BioScience 1 To Cancer Genetics Studies
More informationGene finding. kuobin/
Gene finding KUO-BIN LI, PH.D. http://www.bii.a-star.edu.sg/ kuobin/ Bioinformatics Institute 30 Medical Drive, Level 1, IMCB Building Singapore 117609 Republic of Singapore Gene finding (LSM5191) p.1
More informationDETECTION OF LOW FREQUENCY CXCR4-USING HIV-1 WITH ULTRA-DEEP PYROSEQUENCING. John Archer. Faculty of Life Sciences University of Manchester
DETECTION OF LOW FREQUENCY CXCR4-USING HIV-1 WITH ULTRA-DEEP PYROSEQUENCING John Archer Faculty of Life Sciences University of Manchester HIV Dynamics and Evolution, 2008, Santa Fe, New Mexico. Overview
More informationGenerating Spontaneous Copy Number Variants (CNVs) Jennifer Freeman Assistant Professor of Toxicology School of Health Sciences Purdue University
Role of Chemical lexposure in Generating Spontaneous Copy Number Variants (CNVs) Jennifer Freeman Assistant Professor of Toxicology School of Health Sciences Purdue University CNV Discovery Reference Genetic
More informationTranscriptome Analysis
Transcriptome Analysis Data Preprocessing Sample Preparation Illumina Sequencing Demultiplexing Raw FastQ Reference Genome (fasta) Reference Annotation (GTF) Reference Genome Analysis Tophat Accepted hits
More informationAnnotation of Chimp Chunk 2-10 Jerome M Molleston 5/4/2009
Annotation of Chimp Chunk 2-10 Jerome M Molleston 5/4/2009 1 Abstract A stretch of chimpanzee DNA was annotated using tools including BLAST, BLAT, and Genscan. Analysis of Genscan predicted genes revealed
More informationNature Biotechnology: doi: /nbt.1904
Supplementary Information Comparison between assembly-based SV calls and array CGH results Genome-wide array assessment of copy number changes, such as array comparative genomic hybridization (acgh), is
More informationAnnotation of Drosophila mojavensis fosmid 8 Priya Srikanth Bio 434W
Annotation of Drosophila mojavensis fosmid 8 Priya Srikanth Bio 434W 5.1.2007 Overview High-quality finished sequence is much more useful for research once it is annotated. Annotation is a fundamental
More informationFor all of the following, you will have to use this website to determine the answers:
For all of the following, you will have to use this website to determine the answers: http://blast.ncbi.nlm.nih.gov/blast.cgi We are going to be using the programs under this heading: Answer the following
More informationRNA- seq Introduc1on. Promises and pi7alls
RNA- seq Introduc1on Promises and pi7alls DNA is the same in all cells but which RNAs that is present is different in all cells There is a wide variety of different func1onal RNAs Which RNAs (and some1mes
More informationP. Tang ( 鄧致剛 ); PJ Huang ( 黄栢榕 ) g( ); g ( ) Bioinformatics Center, Chang Gung University.
Databases and Tools for High Throughput Sequencing Analysis P. Tang ( 鄧致剛 ); PJ Huang ( 黄栢榕 ) g( ); g ( ) Bioinformatics Center, Chang Gung University. HTseq Platforms Applications on Biomedical Sciences
More informationFigure 1: Final annotation map of Contig 9
Introduction With rapid advances in sequencing technology, particularly with the development of second and third generation sequencing, genomes for organisms from all kingdoms and many phyla have been
More informationThe Blueprint of Life: DNA to Protein. What is genetics? DNA Structure 4/27/2011. Chapter 7
The Blueprint of Life: NA to Protein Chapter 7 What is genetics? The science of heredity; includes the study of genes, how they carry information, how they are replicated, how they are expressed NA Structure
More informationThe Blueprint of Life: DNA to Protein
The Blueprint of Life: NA to Protein Chapter 7 What is genetics? The science of heredity; includes the y; study of genes, how they carry information, how they are replicated, how they are expressed 1 NA
More informationAmbient temperature regulated flowering time
Ambient temperature regulated flowering time Applications of RNAseq RNA- seq course: The power of RNA-seq June 7 th, 2013; Richard Immink Overview Introduction: Biological research question/hypothesis
More informationAdvance Your Genomic Research Using Targeted Resequencing with SeqCap EZ Library
Advance Your Genomic Research Using Targeted Resequencing with SeqCap EZ Library Marilou Wijdicks International Product Manager Research For Life Science Research Only. Not for Use in Diagnostic Procedures.
More informationIdentification of mirnas in Eucalyptus globulus Plant by Computational Methods
International Journal of Pharmaceutical Science Invention ISSN (Online): 2319 6718, ISSN (Print): 2319 670X Volume 2 Issue 5 May 2013 PP.70-74 Identification of mirnas in Eucalyptus globulus Plant by Computational
More informationNEXT GENERATION SEQUENCING. R. Piazza (MD, PhD) Dept. of Medicine and Surgery, University of Milano-Bicocca
NEXT GENERATION SEQUENCING R. Piazza (MD, PhD) Dept. of Medicine and Surgery, University of Milano-Bicocca SANGER SEQUENCING 5 3 3 5 + Capillary Electrophoresis DNA NEXT GENERATION SEQUENCING SOLEXA-ILLUMINA
More informationAVENIO family of NGS oncology assays ctdna and Tumor Tissue Analysis Kits
AVENIO family of NGS oncology assays ctdna and Tumor Tissue Analysis Kits Accelerating clinical research Next-generation sequencing (NGS) has the ability to interrogate many different genes and detect
More informationIso-Seq Method Updates and Target Enrichment Without Amplification for SMRT Sequencing
Iso-Seq Method Updates and Target Enrichment Without Amplification for SMRT Sequencing PacBio Americas User Group Meeting Sample Prep Workshop June.27.2017 Tyson Clark, Ph.D. For Research Use Only. Not
More informationIntroduction retroposon
17.1 - Introduction A retrovirus is an RNA virus able to convert its sequence into DNA by reverse transcription A retroposon (retrotransposon) is a transposon that mobilizes via an RNA form; the DNA element
More informationStructural Variation and Medical Genomics
Structural Variation and Medical Genomics Andrew King Department of Biomedical Informatics July 8, 2014 You already know about small scale genetic mutations Single nucleotide polymorphism (SNPs) Deletions,
More informationChIP-seq data analysis
ChIP-seq data analysis Harri Lähdesmäki Department of Computer Science Aalto University November 24, 2017 Contents Background ChIP-seq protocol ChIP-seq data analysis Transcriptional regulation Transcriptional
More informationThe Open Bioinformatics Journal, 2014, 8, 1-5 1
Send Orders for Reprints to reprints@benthamscience.net The Open Bioinformatics Journal, 214, 8, 1-5 1 Open Access Performances of Bioinformatics Pipelines for the Identification of Pathogens in Clinical
More informationPrediction of micrornas and their targets
Prediction of micrornas and their targets Introduction Brief history mirna Biogenesis Computational Methods Mature and precursor mirna prediction mirna target gene prediction Summary micrornas? RNA can
More informationHOST-PARASITE INTERPLAY
HOST-PARASITE INTERPLAY Adriano Casulli EURLP, ISS (Rome, Italy) HOST-PARASITE INTERPLAY WP3 (parasite virulence vs human immunity) (Parasite) Task 3.1: Genotypic characterization Task 3.6: Transcriptome
More informationIDENTIFICATION OF IN SILICO MIRNAS IN FOUR PLANT SPECIES FROM FABACEAE FAMILY
Original scientific paper 10.7251/AGRENG1803122A UDC633:34+582.736.3]:577.2 IDENTIFICATION OF IN SILICO MIRNAS IN FOUR PLANT SPECIES FROM FABACEAE FAMILY Bihter AVSAR 1*, Danial ESMAEILI ALIABADI 2 1 Sabanci
More informationITS accuracy at GenBank. Conrad Schoch Barbara Robbertse
ITS accuracy at GenBank Conrad Schoch Barbara Robbertse Improving accuracy Barcode tag in GenBank Barcode submission tool Standards RefSeq Targeted Loci Well validated sequences already in GenBank Bacteria
More informationComputer Science, Biology, and Biomedical Informatics (CoSBBI) Outline. Molecular Biology of Cancer AND. Goals/Expectations. David Boone 7/1/2015
Goals/Expectations Computer Science, Biology, and Biomedical (CoSBBI) We want to excite you about the world of computer science, biology, and biomedical informatics. Experience what it is like to be a
More informationFINAL ANNOTATION REPORT: Drosophila virilis Fosmid 11 (48P14) Robert Carrasquillo Bio 4342
FINAL ANNOTATION REPORT: Drosophila virilis Fosmid 11 (48P14) Robert Carrasquillo Bio 4342 2006 TABLE OF CONTENTS I. Overview... 3 II. Genes... 4 III. Clustal Analysis... 15 IV. Repeat Analysis... 17 V.
More informationABS04. ~ Inaugural Applied Bayesian Statistics School EXPRESSION
ABS04-2004 Applied Bayesian Statistics School STATISTICS & GENE EXPRESSION GENOMICS: METHODS AND COMPUTATIONS Mike West Duke University Centro Congressi Panorama, Trento,, Italy 15th-19th 19th June 2004
More informationBreast cancer. Risk factors you cannot change include: Treatment Plan Selection. Inferring Transcriptional Module from Breast Cancer Profile Data
Breast cancer Inferring Transcriptional Module from Breast Cancer Profile Data Breast Cancer and Targeted Therapy Microarray Profile Data Inferring Transcriptional Module Methods CSC 177 Data Warehousing
More informationProtein Synthesis
Protein Synthesis 10.6-10.16 Objectives - To explain the central dogma - To understand the steps of transcription and translation in order to explain how our genes create proteins necessary for survival.
More informationTranscriptome and isoform reconstruc1on with short reads. Tangled up in reads
Transcriptome and isoform reconstruc1on with short reads Tangled up in reads Topics of this lecture Mapping- based reconstruc1on methods Case study: The domes1c dog De- novo reconstruc1on method Trinity
More informationIdentification of both copy number variation-type and constant-type core elements in a large segmental duplication region of the mouse genome
Identification of both copy number variation-type and constant-type core elements in a large segmental duplication region of the mouse genome Umemori et al. Umemori et al. BMC Genomics 2013, 14:455 Umemori
More informationHands-On Ten The BRCA1 Gene and Protein
Hands-On Ten The BRCA1 Gene and Protein Objective: To review transcription, translation, reading frames, mutations, and reading files from GenBank, and to review some of the bioinformatics tools, such
More informationGlobal Epigenetic and Transcriptional Trends among Two Rice Subspecies and Their Reciprocal Hybrids W
The Plant Cell, Vol. 22: 17 33, January 2010, www.plantcell.org ã 2010 American Society of Plant Biologists RESEARCH ARTICLES Global Epigenetic and Transcriptional Trends among Two Rice Subspecies and
More informationCircular RNAs (circrnas) act a stable mirna sponges
Circular RNAs (circrnas) act a stable mirna sponges cernas compete for mirnas Ancestal mrna (+3 UTR) Pseudogene RNA (+3 UTR homolgy region) The model holds true for all RNAs that share a mirna binding
More informationSupplemental Materials and Methods Plasmids and viruses Quantitative Reverse Transcription PCR Generation of molecular standard for quantitative PCR
Supplemental Materials and Methods Plasmids and viruses To generate pseudotyped viruses, the previously described recombinant plasmids pnl4-3-δnef-gfp or pnl4-3-δ6-drgfp and a vector expressing HIV-1 X4
More informationMolecular Biology (BIOL 4320) Exam #2 May 3, 2004
Molecular Biology (BIOL 4320) Exam #2 May 3, 2004 Name SS# This exam is worth a total of 100 points. The number of points each question is worth is shown in parentheses after the question number. Good
More informationEukaryotic small RNA Small RNAseq data analysis for mirna identification
Eukaryotic small RNA Small RNAseq data analysis for mirna identification P. Bardou, C. Gaspin, S. Maman, J. Mariette, O. Rué, M. Zytnicki INRA Sigenae Toulouse INRA MIA Toulouse GenoToul Bioinfo INRA MaIAGE
More informationRASA: Robust Alternative Splicing Analysis for Human Transcriptome Arrays
Supplementary Materials RASA: Robust Alternative Splicing Analysis for Human Transcriptome Arrays Junhee Seok 1*, Weihong Xu 2, Ronald W. Davis 2, Wenzhong Xiao 2,3* 1 School of Electrical Engineering,
More informationMicroRNA in Cancer Karen Dybkær 2013
MicroRNA in Cancer Karen Dybkær RNA Ribonucleic acid Types -Coding: messenger RNA (mrna) coding for proteins -Non-coding regulating protein formation Ribosomal RNA (rrna) Transfer RNA (trna) Small nuclear
More informationCytogenetics 101: Clinical Research and Molecular Genetic Technologies
Cytogenetics 101: Clinical Research and Molecular Genetic Technologies Topics for Today s Presentation 1 Classical vs Molecular Cytogenetics 2 What acgh? 3 What is FISH? 4 What is NGS? 5 How can these
More informationMODULE 4: SPLICING. Removal of introns from messenger RNA by splicing
Last update: 05/10/2017 MODULE 4: SPLICING Lesson Plan: Title MEG LAAKSO Removal of introns from messenger RNA by splicing Objectives Identify splice donor and acceptor sites that are best supported by
More informationGenomic structural variation
Genomic structural variation Mario Cáceres The new genomic variation DNA sequence differs across individuals much more than researchers had suspected through structural changes A huge amount of structural
More informationEST alignments suggest that [secret number]% of Arabidopsis thaliana genes are alternatively spliced
EST alignments suggest that [secret number]% of Arabidopsis thaliana genes are alternatively spliced Dan Morris Stanford University Robotics Lab Computer Science Department Stanford, CA 94305-9010 dmorris@cs.stanford.edu
More informationSelective depletion of abundant RNAs to enable transcriptome analysis of lowinput and highly-degraded RNA from FFPE breast cancer samples
DNA CLONING DNA AMPLIFICATION & PCR EPIGENETICS RNA ANALYSIS Selective depletion of abundant RNAs to enable transcriptome analysis of lowinput and highly-degraded RNA from FFPE breast cancer samples LIBRARY
More information38 Int'l Conf. Bioinformatics and Computational Biology BIOCOMP'16
38 Int'l Conf. Bioinformatics and Computational Biology BIOCOMP'16 PGAR: ASD Candidate Gene Prioritization System Using Expression Patterns Steven Cogill and Liangjiang Wang Department of Genetics and
More informationComputational Analysis of UHT Sequences Histone modifications, CAGE, RNA-Seq
Computational Analysis of UHT Sequences Histone modifications, CAGE, RNA-Seq Philipp Bucher Wednesday January 21, 2009 SIB graduate school course EPFL, Lausanne ChIP-seq against histone variants: Biological
More informationPhenomena first observed in petunia
Vectors for RNAi Phenomena first observed in petunia Attempted to overexpress chalone synthase (anthrocyanin pigment gene) in petunia. (trying to darken flower color) Caused the loss of pigment. Bill Douherty
More informationSupplementary methods:
Supplementary methods: Primers sequences used in real-time PCR analyses: β-actin F: GACCTCTATGCCAACACAGT β-actin [11] R: AGTACTTGCGCTCAGGAGGA MMP13 F: TTCTGGTCTTCTGGCACACGCTTT MMP13 R: CCAAGCTCATGGGCAGCAACAATA
More informationAnalysis and characterization of the repetitive sequences of T. aestivum chromosome 4D
Analysis and characterization of the repetitive sequences of T. aestivum chromosome 4D Romero J.R., Garbus, I., Helguera M., Tranquilli G., Paniego N., Caccamo M., Valarik M., Simkova H., Dolezel J., Echenique
More informationMATS: a Bayesian framework for flexible detection of differential alternative splicing from RNA-Seq data
Nucleic Acids Research Advance Access published February 1, 2012 Nucleic Acids Research, 2012, 1 13 doi:10.1093/nar/gkr1291 MATS: a Bayesian framework for flexible detection of differential alternative
More informationMegan Smedinghoff, Advisor: James A. Yorke, October 11, 2007
Improving the Draft Assembly of the Horse Genome Megan Smedinghoff, smeds@umd.edu Advisor: James A. Yorke, yorke@umd.edu October 11, 2007 Abstract I aim to improve the draft genome of the horse that was
More informationp53 cooperates with DNA methylation and a suicidal interferon response to maintain epigenetic silencing of repeats and noncoding RNAs
p53 cooperates with DNA methylation and a suicidal interferon response to maintain epigenetic silencing of repeats and noncoding RNAs 2013, Katerina I. Leonova et al. Kolmogorov Mikhail Noncoding DNA Mammalian
More informationSupplementary Figures and Tables
Supplementary Figures and Tables Supplementary Figure 1. Study design and sample collection. S.japonicum were harvested from C57 mice at 8 time points after infection. Total number of samples for RNA-Seq:
Not IN Our Genes - A Different Kind of Inheritance! Christopher Phiel, Ph.D. University of Colorado Denver Mini-STEM School February 4, 2014
Not IN Our Genes - A Different Kind of Inheritance! Christopher Phiel, Ph.D. University of Colorado Denver Mini-STEM School February 4, 2014 Epigenetics in Mainstream Media Epigenetics *Current definition:
Table S1. Relative abundance of AGO1/4 proteins in different organs. Table S2. Summary of smRNA datasets from various samples.
Supplementary files Table S1. Relative abundance of AGO1/4 proteins in different organs. Table S2. Summary of smRNA datasets from various samples. Table S3. Specificity of AGO1- and AGO4-preferred 24-nt
Understanding Gallibacterium-Associated Peritonitis in the Commercial Egg-Laying Industry
Understanding Gallibacterium-Associated Peritonitis in the Commercial Egg-Laying Industry Timothy J. Johnson A, Lisa K. Nolan B, and Darrell W. Trampel C A University of Minnesota, Department of Veterinary
GENOME-WIDE COMPUTATIONAL ANALYSIS OF SMALL NUCLEAR RNA GENES OF ORYZA SATIVA (INDICA AND JAPONICA)
GENOME-WIDE COMPUTATIONAL ANALYSIS OF SMALL NUCLEAR RNA GENES OF ORYZA SATIVA (INDICA AND JAPONICA) M.SHASHIKANTH, A.SNEHALATHARANI, SK. MUBARAK AND K.ULAGANATHAN Center for Plant Molecular Biology, Osmania
Alternative RNA processing: Two examples of complex eukaryotic transcription units and the effect of mutations on expression of the encoded proteins.
Alternative RNA processing: Two examples of complex eukaryotic transcription units and the effect of mutations on expression of the encoded proteins. The RNA transcribed from a complex transcription unit
Variant Classification. Author: Mike Thiesen, Golden Helix, Inc.
Variant Classification Author: Mike Thiesen, Golden Helix, Inc. Overview Sequencing pipelines are able to identify rare variants not found in catalogs such as dbSNP. As a result, variants in these datasets
Whole genome sequencing & new strain typing methods in IPC. Lyn Gilbert ACIPC conference Hobart, November 2015
Whole genome sequencing & new strain typing methods in IPC Lyn Gilbert ACIPC conference Hobart, November 2015 Why do strain typing? Evolution, population genetics, geographic distribution 2 Why strain
Supplementary Figures. Supplementary Figure 1. Treatment schematic of SIV infection and ARV and PP therapies.
Supplementary Figures Supplementary Figure 1. Treatment schematic of SIV infection and ARV and PP therapies. Supplementary Figure 2. SIV replication and CD4 + T cell count. (A) Log SIVmac239 copies/ml
How to Standardise and Assemble Raw Data into Sequences: What Does it Mean for a Laboratory to Use Such Technologies?
How to Standardise and Assemble Raw Data into Sequences: What Does it Mean for a Laboratory to Use Such Technologies? Dr Joseph Hughes 11th OIE Seminar Saskatoon - 17th June 2015 Cost per raw Megabase
Human Genome Complexity, Viruses & Genetic Variability
Human Genome Complexity, Viruses & Genetic Variability (Learning Objectives) Learn the types of DNA sequences present in the Human Genome other than genes coding for functional proteins. Review what you
Bioinformatics. Sequence Analysis: Part III. Pattern Searching and Gene Finding. Fran Lewitter, Ph.D. Head, Biocomputing Whitehead Institute
Bioinformatics Sequence Analysis: Part III. Pattern Searching and Gene Finding Fran Lewitter, Ph.D. Head, Biocomputing Whitehead Institute Course Syllabus Jan 7 Jan 14 Jan 21 Jan 28 Feb 4 Feb 11 Feb 18
A Comparison of Next Generation Sequencing Technologies for Transcriptome Assembly and Utility for RNA-Seq in a Non-Model Bird
A Comparison of Next Generation Sequencing Technologies for Transcriptome Assembly and Utility for RNA-Seq in a Non-Model Bird Findley R. Finseth*, Richard G. Harrison Department of Ecology and Evolutionary
LESSON 4.4 WORKBOOK. How viruses make us sick: Viral Replication
DEFINITIONS OF TERMS Eukaryotic: Non-bacterial cell type (bacteria are prokaryotes). LESSON 4.4 WORKBOOK How viruses make us sick: Viral Replication This lesson extends the principles we learned in Unit
Introduction to Systems Biology of Cancer Lecture 2
Introduction to Systems Biology of Cancer Lecture 2 Gustavo Stolovitzky IBM Research Icahn School of Medicine at Mt Sinai DREAM Challenges High throughput measurements: The age of omics Systems Biology
A putative MYB35 ortholog is a candidate for the sex-determining genes in Asparagus
Supplementary figures for: A putative MYB35 ortholog is a candidate for the sex-determining genes in Asparagus officinalis Daisuke Tsugama, Kohei Matsuyama, Mayui Ide, Masato Hayashi, Kaien Fujino, and
PROTEIN SYNTHESIS. It is known today that GENES direct the production of the proteins that determine the phenotypic characteristics of organisms.
PROTEIN SYNTHESIS It is known today that GENES direct the production of the proteins that determine the phenotypic characteristics of organisms.» GENES = a sequence of nucleotides in DNA that performs
Dr Rick Tearle Senior Applications Specialist, EMEA Complete Genomics Complete Genomics, Inc.
Dr Rick Tearle Senior Applications Specialist, EMEA Complete Genomics Topics Overview of Data Processing Pipeline Overview of Data Files 2 DNA Nano-Ball (DNB) Read Structure Genome : acgtacatgcattcacacatgcttagctatctctcgccag
TITLE: The Role Of Alternative Splicing In Breast Cancer Progression
AD Award Number: W81XWH-06-1-0598 TITLE: The Role Of Alternative Splicing In Breast Cancer Progression PRINCIPAL INVESTIGATOR: Klemens J. Hertel, Ph.D. CONTRACTING ORGANIZATION: University of California,
Global regulation of alternative splicing by adenosine deaminase acting on RNA (ADAR)
Global regulation of alternative splicing by adenosine deaminase acting on RNA (ADAR) O. Solomon, S. Oren, M. Safran, N. Deshet-Unger, P. Akiva, J. Jacob-Hirsch, K. Cesarkas, R. Kabesa, N. Amariglio, R.
Genetics. Instructor: Dr. Jihad Abdallah Transcription of DNA
Genetics Instructor: Dr. Jihad Abdallah Transcription of DNA 1 3.4 A 2 Expression of Genetic information DNA Double stranded In the nucleus Transcription mRNA Single stranded Translation In the cytoplasm
SUPPLEMENTARY INFORMATION
doi: 10.1038/nature08645 [Figure: physical coverage (x haploid genomes) and reads (millions) per sample, including HCC1187, HCC1395 and HCC1599, with reads classed as neither end mapped, one end mapped, chimaeras, or correct.]
Simple, rapid, and reliable RNA sequencing
Simple, rapid, and reliable RNA sequencing RNA sequencing applications RNA sequencing provides fundamental insights into how genomes are organized and regulated, giving us valuable information about the
Alper Sarikaya 1, Michael Correll 2, Jorge M. Dinis 1, David H. O'Connor 1,3, and Michael Gleicher 1
Alper Sarikaya 1, Michael Correll 2, Jorge M. Dinis 1, David H. O'Connor 1,3, and Michael Gleicher 1 1 University of Wisconsin-Madison 2 University of Washington 3 Wisconsin National Primate Center @yelperalp
CNV Detection and Interpretation in Genomic Data
CNV Detection and Interpretation in Genomic Data Benjamin W. Darbro, M.D., Ph.D. Assistant Professor of Pediatrics Director of the Shivanand R. Patil Cytogenetics and Molecular Laboratory Overview What
Discovery of a Novel Murine Type C Retrovirus by Data Mining
JOURNAL OF VIROLOGY, Mar. 2001, p. 3053-3057, Vol. 75, No. 6. 0022-538X/01/$04.00+0 DOI: 10.1128/JVI.75.6.3053-3057.2001 Copyright 2001, American Society for Microbiology. All Rights Reserved. Discovery
Workshop Cum Training on Biological Data Analysis Through Computational. Inside.. About Us
Hash Bio- 1 Bioinformatics up to Date (Bioinformatics Infrastructure Facility, Biotechnology Division) North-East Institute of Science & Technology Jorhat - 785 006, Assam Volume 8, Issue 12 December 2015
Supplementary Figure 1. High-affinity methane oxidation (HAMO) dynamics of soils with added methane at 10000 ppmv for 1 time and 10 times.
Supplementary Figure 1. High-affinity methane oxidation (HAMO) dynamics of soils with added methane at 10000 ppmv for 1 time and 10 times. After the complete consumption of 10000 ppmv methane, the measurement
GENOME-WIDE DETECTION OF ALTERNATIVE SPLICING IN EXPRESSED SEQUENCES USING PARTIAL ORDER MULTIPLE SEQUENCE ALIGNMENT GRAPHS
GENOME-WIDE DETECTION OF ALTERNATIVE SPLICING IN EXPRESSED SEQUENCES USING PARTIAL ORDER MULTIPLE SEQUENCE ALIGNMENT GRAPHS C. GRASSO, B. MODREK, Y. XING, C. LEE Department of Chemistry and Biochemistry,
Long non-coding RNAs
Long non-coding RNAs Dominic Rose Bioinformatics Group, University of Freiburg Bled, Feb. 2011 Outline De novo prediction of long non-coding RNAs (lncRNAs) Genome-wide RNA gene-finding Intrinsic properties
Contents. Introduction. Helminths. Genomics. APOLLO: gene curation software. Glossary. Further Sources
Contents Introduction Helminths Genomics APOLLO: gene curation software Glossary Further Sources Introduction Project overview The Institute for Research in Schools (IRIS) offers
RNA SEQUENCING AND DATA ANALYSIS
RNA SEQUENCING AND DATA ANALYSIS Download slides and package http://odin.mdacc.tmc.edu/~rverhaak/package.zip http://odin.mdacc.tmc.edu/~rverhaak/rna-seqlecture.zip Overview Introduction into the topic
Below, we included the point-to-point response to the comments of both reviewers.
To the Editor and Reviewers: We would like to thank the editor and reviewers for careful reading, and constructive suggestions for our manuscript. According to comments from both reviewers, we have comprehensively
Profiles of gene expression & diagnosis/prognosis of cancer. MCs in Advanced Genetics Ainoa Planas Riverola
Profiles of gene expression & diagnosis/prognosis of cancer MCs in Advanced Genetics Ainoa Planas Riverola Gene expression profiles Gene expression profiling Used in molecular biology, it measures the
High AU content: a signature of upregulated miRNA in cardiac diseases
https://helda.helsinki.fi High AU content: a signature of upregulated miRNA in cardiac diseases Gupta, Richa 2010-09-20 Gupta, R, Soni, N, Patnaik, P, Sood, I, Singh, R, Rawal, K & Rani, V 2010, 'High