RNA sequencing (RNA-seq)
Module Outline
MO 13-Mar-2017 | RNA sequencing: Introduction 1
WE 15-Mar-2017 | RNA sequencing: Introduction 2
MO 20-Mar-2017 | Paper: PMID 25954002: Human genomics. The human transcriptome across tissues and individuals.
WE 22-Mar-2017 | RNA sequencing: Models for RNA-sequencing Data
MO 27-Mar-2017 | Paper: PMID 26813401: A survey of best practices for RNA-seq data analysis
WE 29-Mar-2017 | Paper: PMID 28049689: Understanding development and stem cells using single cell-based analyses of gene expression
High-Throughput Technologies
Measuring things in parallel, for example:
- Northern blot: expression of one gene at a time.
- Microarray: expression of all known genes.
- NGS (e.g. RNA-seq): the transcriptome.
NGS (Next Generation Sequencing):
- Generates millions of sequences in parallel
- Underlies many applications, not only RNA-seq
- Dramatically reduced cost
Next Generation Sequencing
- Sequencing target (e.g. a genome)
- Library preparation (includes fragmentation)
- Single-end vs. paired-end reads
- Different technologies / platforms, which differ in:
  - Read length
  - Accuracy
  - Cost
Each fragment is sequenced in parallel, which is much more efficient than sequencing one fragment after another.
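A toy sketch of what fragmentation and single- vs. paired-end sequencing produce. It assumes the common Illumina-style layout in which read 2 comes from the opposite end of the fragment and is reported on the opposite strand. The function names, fragment lengths and read length are hypothetical, and the physical process (errors, size selection, adapters) is ignored.

```python
import random

# Complement table for DNA bases (A<->T, C<->G).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def fragment(seq: str, min_len: int, max_len: int, rng: random.Random) -> list:
    """Toy fragmentation: cut seq into consecutive pieces of random length.

    The last piece may be shorter than min_len.
    """
    pieces = []
    start = 0
    while start < len(seq):
        size = rng.randint(min_len, max_len)
        pieces.append(seq[start:start + size])
        start += size
    return pieces


def single_end_read(frag: str, read_len: int) -> str:
    """Single-end: read_len bases from one end of the fragment."""
    return frag[:read_len]


def paired_end_reads(frag: str, read_len: int) -> tuple:
    """Paired-end: read 1 from the start of the fragment, read 2 from the
    other end, reported on the opposite strand (reverse complement)."""
    return frag[:read_len], reverse_complement(frag[-read_len:])


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for a reproducible toy run
    target = "ACGTTGCAAGGCTTACGATCGGATCCATGCAAGTCCGATTACG"
    for frag in fragment(target, 12, 20, rng):
        if len(frag) >= 10:  # skip a short leftover piece
            print(frag, single_end_read(frag, 5), paired_end_reads(frag, 5))
```

Both reads of a pair come from the same fragment, which is why paired-end data is later counted per fragment rather than per read.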
Next generation sequencing: sequencing costs over time [NHGRI: http://www.genome.gov/sequencingcosts/]
What can we measure with NGS? [Soon et al. 2013]
High-Throughput Technologies
- High-throughput technologies measure things in parallel.
- Many are based on Next Generation Sequencing.
- We will focus on measuring transcription, BUT: NGS is much more versatile.
- Experimental design, data processing and data analysis are increasingly demanding.
RNA sequencing (RNA-seq): Gene expression
Gene expression [Wikipedia]
Transcription: pre-mRNA
RNA processing: mRNA
- 5' capping
- 3' cleavage and polyadenylation
- RNA splicing
Translation: protein
- The ribosome contains rRNA, which makes up most of the total RNA in a cell.
- Hence rRNA depletion for RNA-seq
- or poly(A) enrichment for RNA-seq
Gene expression [Licatalosi & Darnell, NRG 2010]
- Exons, introns and UTRs
- DNA -> pre-mRNA -> mRNA
- Poly(A) tails and poly(A) sites
RNA sequencing
RNA sequencing
Sequence cDNA to get information about the RNA content of the sample.
Rough idea:
- Total RNA of the sample
- Poly(A) selection to target mRNA (not always)
- rRNA removal (not always)
- Reverse transcription (RNA to cDNA)
- Fragmentation
- Sequencing
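The first wet-lab steps can be mimicked on strings, as a rough sketch: poly(A) selection keeps molecules ending in a run of A's, and reverse transcription writes the complementary DNA strand. In the lab, poly(A) selection typically uses oligo(dT) capture; here it is a plain string check with a hypothetical minimum tail length `min_tail`, and all function names are illustrative.

```python
# RNA base -> complementary DNA base (U pairs with A; T replaces U in DNA).
RNA_TO_CDNA = str.maketrans("ACGU", "TGCA")


def has_poly_a_tail(rna: str, min_tail: int = 10) -> bool:
    """True if the RNA (written 5'->3') ends in at least min_tail A's."""
    return rna.endswith("A" * min_tail)


def poly_a_select(rnas: list, min_tail: int = 10) -> list:
    """Toy poly(A) selection: keep only molecules with a poly(A) tail."""
    return [r for r in rnas if has_poly_a_tail(r, min_tail)]


def reverse_transcribe(rna: str) -> str:
    """First-strand cDNA: the DNA complementary to the RNA, written 5'->3'."""
    return rna.translate(RNA_TO_CDNA)[::-1]


if __name__ == "__main__":
    sample = [
        "AUGGCUUACG" + "A" * 12,  # mRNA-like: has a poly(A) tail
        "GGCUAGCUAGGCUUAGC",      # rRNA-like: no poly(A) tail
    ]
    for mrna in poly_a_select(sample):
        print(mrna, "->", reverse_transcribe(mrna))
```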
RNA sequencing steps: quality control, then the computational pipeline (left):
- Many different methods, but the steps are essentially similar:
  - Alignment
  - Disambiguation / filtering
  - Abundance quantification
[Mortazavi et al., Nature Methods 2008]
RPKM: Reads (mRNA abundance) Per Kilobase (transcript length) per Million mapped reads (library size)
FPKM: Fragments Per Kilobase per Million mapped fragments. Moving from single-end to paired-end data, the two reads of a pair come from one fragment, so fragments are counted instead of reads (paired reads also align more reliably).
TPM: Transcripts Per Million
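Written out as formulas, using our own notation (not from the slide): for gene or transcript i, C_i is the number of reads mapped to it, L_i its length in bases, and N the total number of mapped reads. FPKM is the same formula as RPKM with fragments in place of reads.

```latex
\mathrm{RPKM}_i = \frac{C_i}{\left(L_i / 10^3\right)\left(N / 10^6\right)} = \frac{10^9 \, C_i}{L_i \, N},
\qquad
\mathrm{TPM}_i = 10^6 \cdot \frac{C_i / L_i}{\sum_j C_j / L_j}
```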
RNA sequencing steps: Assign reads (fragments) to annotated exons
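A minimal sketch of assigning reads to annotated exons by interval overlap. It assumes one chromosome, ignores strand, treats each read as a single half-open interval (no spliced or multi-mapping reads) and uses a linear scan; the `Exon` class and function names are hypothetical. Real counting tools such as HTSeq or featureCounts use interval indexes and more detailed overlap rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exon:
    gene: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def genes_hit(read_start: int, read_end: int, exons: list) -> set:
    """Genes with at least one exon overlapping the half-open read interval."""
    return {e.gene for e in exons if read_start < e.end and e.start < read_end}


def count_reads(reads: list, exons: list):
    """Assign each read (start, end) to a gene if it hits exactly one gene.

    Returns (counts per gene, number of ambiguous reads, number of unassigned reads).
    """
    counts = {}
    ambiguous = 0
    unassigned = 0
    for start, end in reads:
        genes = genes_hit(start, end, exons)
        if len(genes) == 1:
            gene = next(iter(genes))
            counts[gene] = counts.get(gene, 0) + 1
        elif genes:
            ambiguous += 1  # overlaps exons of several genes
        else:
            unassigned += 1  # intronic or intergenic
    return counts, ambiguous, unassigned


if __name__ == "__main__":
    exons = [Exon("A", 0, 100), Exon("A", 200, 300), Exon("B", 250, 400)]
    reads = [(10, 60), (210, 240), (260, 290), (120, 180), (500, 550)]
    print(count_reads(reads, exons))
```

The read at (260, 290) touches exons of both A and B and is reported as ambiguous rather than counted.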
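RNA sequencing steps: Ambiguous reads (compatible with more than one gene or isoform) must be resolved. Alternative: count only reads on constitutively expressed exons, i.e. exons shared by all isoforms [gene-level only].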
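A sketch of the constitutive-exon alternative: exons present in every isoform of a gene are found by intersecting the isoforms' exon sets, and only reads lying entirely inside those exons are counted. Since such reads cannot tell isoforms apart, the result is a gene-level count. Exons are matched by identical (start, end) coordinates here, which is a simplifying assumption; the function names are hypothetical.

```python
def constitutive_exons(isoforms: dict) -> set:
    """Exons, given as (start, end) pairs, that occur in every isoform of a gene."""
    exon_sets = [set(exons) for exons in isoforms.values()]
    if not exon_sets:
        return set()
    return set.intersection(*exon_sets)


def count_on_exons(reads: list, exons: set) -> int:
    """Number of reads (start, end) lying entirely inside one of the given exons."""
    return sum(
        1 for start, end in reads
        if any(ex_start <= start and end <= ex_end for ex_start, ex_end in exons)
    )


if __name__ == "__main__":
    isoforms = {
        "t1": {(0, 100), (200, 300), (400, 500)},
        "t2": {(0, 100), (400, 500)},              # skips the middle exon
        "t3": {(0, 100), (250, 300), (400, 500)},  # shorter middle exon
    }
    shared = constitutive_exons(isoforms)
    reads = [(10, 50), (210, 250), (410, 450), (90, 120)]
    print(sorted(shared), count_on_exons(reads, shared))
```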
RNA sequencing steps: Make new gene models (e.g. from novel exons or splice junctions):
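One ingredient of new gene models is evidence for introns from spliced reads. In SAM alignments, a skipped reference region (an intron) appears as an `N` operation in the CIGAR string, and `M`, `D`, `N`, `=`, `X` advance the reference position. The sketch below extracts introns per alignment, counts supporting reads per junction, and flags well-supported junctions missing from a given annotation. The threshold `min_reads` and all function names are hypothetical; assemblers such as Cufflinks do far more (isoform assembly, filtering).

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an operation code (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance the position on the reference.
REF_CONSUMING = set("MDN=X")


def introns_from_cigar(pos: int, cigar: str) -> list:
    """Skipped reference regions ('N' operations) of one alignment.

    pos is the 1-based leftmost mapping position (SAM POS field); the returned
    (start, end) intervals are 1-based and inclusive.
    """
    introns = []
    ref = pos
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "N":
            introns.append((ref, ref + n - 1))
        if op in REF_CONSUMING:
            ref += n
    return introns


def junction_support(alignments: list) -> Counter:
    """Count how many alignments (pos, cigar) support each intron."""
    support = Counter()
    for pos, cigar in alignments:
        support.update(introns_from_cigar(pos, cigar))
    return support


def novel_junctions(support: Counter, annotated: set, min_reads: int = 2) -> dict:
    """Junctions with at least min_reads supporting reads that are not annotated."""
    return {j: n for j, n in support.items() if n >= min_reads and j not in annotated}


if __name__ == "__main__":
    alignments = [(100, "50M200N50M"), (120, "30M200N70M"), (90, "100M"),
                  (1000, "10S40M100N50M"), (1005, "35M100N65M")]
    support = junction_support(alignments)
    print(support)
    print(novel_junctions(support, annotated={(150, 349)}))
```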
RNA sequencing steps: Calculate gene abundances (here: RPKM)
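A minimal RPKM calculation following the definition from Mortazavi et al. (reads per kilobase of transcript per million mapped reads). The function name and the fallback for the library size are assumptions: if `total_mapped` is not given, the sum of the supplied counts is used, and pipelines differ on exactly which reads N includes.

```python
from typing import Dict, Optional


def rpkm(counts: Dict[str, int], lengths: Dict[str, int],
         total_mapped: Optional[int] = None) -> Dict[str, float]:
    """Reads Per Kilobase of transcript per Million mapped reads.

    counts: reads assigned to each gene; lengths: gene/transcript length in bases;
    total_mapped: library size N. If omitted, the sum of counts is used.
    """
    n = total_mapped if total_mapped is not None else sum(counts.values())
    # counts / (length in kb) / (library size in millions) = 1e9 * c / (L * N)
    return {g: c * 1e9 / (lengths[g] * n) for g, c in counts.items()}


if __name__ == "__main__":
    counts = {"A": 500, "B": 500, "C": 1000}
    lengths = {"A": 1000, "B": 2000, "C": 4000}
    print(rpkm(counts, lengths, total_mapped=10_000_000))
```

With these toy numbers, B and C get the same RPKM (25) although C has twice as many reads, because C is also twice as long.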
RNA sequencing: (differential) RNA abundance and what RNA-seq can do
RNA expression / abundance: RPKM, FPKM, TPM [Pachter 2013]
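The units differ in the order of normalization. TPM first divides counts by length and then scales within the sample, so TPM values always sum to 10^6; RPKM sums vary from sample to sample. Because RPKM_i is C_i/L_i times a per-sample constant, rescaling one sample's RPKM (or FPKM) values to sum to 10^6 gives TPM over the same set of genes. A sketch with hypothetical function names:

```python
def tpm(counts: dict, lengths: dict) -> dict:
    """Transcripts Per Million: length-normalize first, then scale to sum to 1e6."""
    rates = {g: counts[g] / lengths[g] for g in counts}  # reads per base
    total_rate = sum(rates.values())
    return {g: 1e6 * r / total_rate for g, r in rates.items()}


def rpkm_to_tpm(rpkm_values: dict) -> dict:
    """Rescale the RPKM (or FPKM) values of one sample so they sum to 1e6."""
    total = sum(rpkm_values.values())
    return {g: 1e6 * v / total for g, v in rpkm_values.items()}


if __name__ == "__main__":
    counts = {"A": 500, "B": 500, "C": 1000}
    lengths = {"A": 1000, "B": 2000, "C": 4000}
    print(tpm(counts, lengths))
    print(rpkm_to_tpm({"A": 50.0, "B": 25.0, "C": 25.0}))
```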
Differential gene expression 2.0
First things first: differential gene expression using constitutively expressed exons. Far from trivial, but easier than -> [Trapnell et al., Nature Protocols 2012]
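As a rough illustration of the quantity behind "differential" expression, the sketch below normalizes two samples to counts per million (CPM) and reports a log2 fold change per gene, with a pseudocount to avoid log(0). This is only an effect size, not a test: proper analyses (e.g. Cuffdiff in the Trapnell et al. protocol, or count-based tools such as DESeq2 and edgeR) model biological replicates and count variability and use more robust normalization than plain library size. Function names and the pseudocount of 1 are hypothetical choices.

```python
import math


def cpm(counts: dict) -> dict:
    """Counts per million: divide by library size, scale to 1e6."""
    total = sum(counts.values())
    return {g: c * 1e6 / total for g, c in counts.items()}


def log2_fold_change(counts_a: dict, counts_b: dict, pseudocount: float = 1.0) -> dict:
    """log2(B / A) per gene on CPM-normalized counts.

    The pseudocount avoids log(0) for genes with zero counts. Both samples are
    assumed to contain the same genes.
    """
    a, b = cpm(counts_a), cpm(counts_b)
    return {g: math.log2((b[g] + pseudocount) / (a[g] + pseudocount)) for g in a}


if __name__ == "__main__":
    condition_a = {"X": 100, "Y": 900, "Z": 0}
    condition_b = {"X": 300, "Y": 700, "Z": 0}
    print(log2_fold_change(condition_a, condition_b))
```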
Comparison to microarray: RNA-seq results are similar to microarray results; the correlation is better at higher expression levels. [Guo et al. 2013]
Differential isoforms with RNA-seq: RNA-seq can distinguish isoforms. [Katz et al. 2010]
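For a skipped (cassette) exon, isoform usage is often summarized as Ψ ("percent spliced in"), the fraction of transcripts that include the exon. Katz et al. estimate Ψ with a Bayesian model (MISO); the sketch below is only the naive count-ratio estimate. The optional `*_len` arguments are hypothetical effective lengths that correct for the inclusion isoform offering more read positions than the single skipping junction.

```python
def naive_psi(inclusion_reads: int, exclusion_reads: int,
              inclusion_len: float = 1.0, exclusion_len: float = 1.0) -> float:
    """Naive point estimate of Psi (percent spliced in) for a cassette exon.

    inclusion_reads: reads on the exon or on its junctions with the flanking exons;
    exclusion_reads: reads on the junction that skips the exon.
    Returns NaN when there are no informative reads.
    """
    inc = inclusion_reads / inclusion_len
    exc = exclusion_reads / exclusion_len
    if inc + exc == 0:
        return float("nan")
    return inc / (inc + exc)


if __name__ == "__main__":
    print(naive_psi(30, 10))                                      # no length correction
    print(naive_psi(30, 10, inclusion_len=2.0, exclusion_len=1.0))
```

Comparing Ψ between conditions is one way to express a differential isoform change that a gene-level count would miss.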
SNPs from gene expression data: RNA-seq can detect polymorphisms in expressed transcripts. [Zhao et al. 2014]
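A deliberately naive sketch of how a polymorphism shows up in RNA-seq: at one reference position, count the bases observed in the aligned reads and report a non-reference base if it reaches a minimum fraction. `min_depth` and `min_alt_fraction` are hypothetical thresholds. Real variant callers model sequencing errors and genotypes, and in RNA-seq further caveats apply: only expressed genes are covered, allele-specific expression skews the fractions, and A-to-I RNA editing can look like an A-to-G (or T-to-C) mismatch that is not a DNA polymorphism.

```python
from collections import Counter


def call_site(ref_base: str, observed_bases: str,
              min_depth: int = 10, min_alt_fraction: float = 0.2):
    """Report (alt_base, alt_fraction) if a non-reference base is frequent enough.

    observed_bases: the bases of all reads covering the position (a pileup column).
    Returns None if coverage is too low or no alternative base passes the threshold.
    """
    counts = Counter(b for b in observed_bases.upper() if b in "ACGT")
    depth = sum(counts.values())
    alts = [(n, b) for b, n in counts.items() if b != ref_base.upper()]
    if depth < min_depth or not alts:
        return None
    alt_count, alt_base = max(alts)  # most frequent non-reference base
    fraction = alt_count / depth
    if fraction < min_alt_fraction:
        return None
    return alt_base, fraction


if __name__ == "__main__":
    print(call_site("A", "AAAAAGGGGG"))  # heterozygous-looking site
    print(call_site("A", "AAAAAAAAAG"))  # likely a sequencing error
```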
RNA sequencing
Characterization of the entire transcriptome. Can be used for:
- Discovery of genes, exons and transcripts (their existence / expression; alternative splicing)
- Detecting changes between conditions in:
  - Gene expression
  - Isoform expression, and more
- Different types of RNAs (e.g., miRNAs, ncRNAs)