Lectures 13: High throughput sequencing: Beyond the genome. Spring 2017 March 28, 2017

Similar documents
genomics for systems biology / ISB2020 RNA sequencing (RNA-seq)

BIMM 143. RNA sequencing overview. Genome Informatics II. Barry Grant. Lecture In vivo. In vitro.

Computational Analysis of UHT Sequences Histone modifications, CAGE, RNA-Seq

RASA: Robust Alternative Splicing Analysis for Human Transcriptome Arrays

RNA SEQUENCING AND DATA ANALYSIS

Lecture 8 Understanding Transcription RNA-seq analysis. Foundations of Computational Systems Biology David K. Gifford

Analysis of Massively Parallel Sequencing Data Application of Illumina Sequencing to the Genetics of Human Cancers

Breast cancer. Risk factors you cannot change include: Treatment Plan Selection. Inferring Transcriptional Module from Breast Cancer Profile Data

DNA-seq Bioinformatics Analysis: Copy Number Variation

Accessing and Using ENCODE Data Dr. Peggy J. Farnham

P. Tang ( 鄧致剛 ); PJ Huang ( 黄栢榕 ) g( ); g ( ) Bioinformatics Center, Chang Gung University.

RNA-seq Introduction

Computer Science, Biology, and Biomedical Informatics (CoSBBI) Outline. Molecular Biology of Cancer AND. Goals/Expectations. David Boone 7/1/2015

An Analysis of MDM4 Alternative Splicing and Effects Across Cancer Cell Lines

Variant Classification. Author: Mike Thiesen, Golden Helix, Inc.

Studying Alternative Splicing

Inference of Isoforms from Short Sequence Reads

Introduction to Systems Biology of Cancer Lecture 2

Assignment 5: Integrative epigenomics analysis

Transcriptome Analysis

Introduction. Introduction

DNA Sequence Bioinformatics Analysis with the Galaxy Platform

Supplemental Figure S1. Expression of Cirbp mrna in mouse tissues and NIH3T3 cells.

RNA SEQUENCING AND DATA ANALYSIS

High-Throughput Sequencing Course

ChIP-seq hands-on. Iros Barozzi, Campus IFOM-IEO (Milan) Saverio Minucci, Gioacchino Natoli Labs

Transcript reconstruction

A Practical Guide to Integrative Genomics by RNA-seq and ChIP-seq Analysis

Epigenetics. Jenny van Dongen Vrije Universiteit (VU) Amsterdam Boulder, Friday march 10, 2017

RESEARCHER S NAME: Làszlò Tora RESEARCHER S ORGANISATION: Institut de Génétique et de Biologie Moléculaire et Cellulaire (IGBMC)

RNA- seq Introduc1on. Promises and pi7alls

ACE ImmunoID Biomarker Discovery Solutions ACE ImmunoID Platform for Tumor Immunogenomics

Trinity: Transcriptome Assembly for Genetic and Functional Analysis of Cancer [U24]

High Throughput Sequence (HTS) data analysis. Lei Zhou

Patterns of Histone Methylation and Chromatin Organization in Grapevine Leaf. Rachel Schwope EPIGEN May 24-27, 2016

Investigating rare diseases with Agilent NGS solutions

Analyse de données de séquençage haut débit

Supplemental Data. Integrating omics and alternative splicing i reveals insights i into grape response to high temperature

Effects of UBL5 knockdown on cell cycle distribution and sister chromatid cohesion

Not IN Our Genes - A Different Kind of Inheritance.! Christopher Phiel, Ph.D. University of Colorado Denver Mini-STEM School February 4, 2014

An epigenetic approach to understanding (and predicting?) environmental effects on gene expression

Histone Modifications Are Associated with Transcript Isoform Diversity in Normal and Cancer Cells

Summary of Data Dissemination Working Group. 22Jan2016

Selective depletion of abundant RNAs to enable transcriptome analysis of lowinput and highly-degraded RNA from FFPE breast cancer samples

Raymond Auerbach PhD Candidate, Yale University Gerstein and Snyder Labs August 30, 2012

Multiplex target enrichment using DNA indexing for ultra-high throughput variant detection

GeneOverlap: An R package to test and visualize

Screening for novel oncology biomarker panels using both DNA and protein microarrays. John Anson, PhD VP Biomarker Discovery

The Cancer Genome Atlas & International Cancer Genome Consortium

Table S1. Total and mapped reads produced for each ChIP-seq sample

Gene and Genome Parameters of Mammalian Liver Circadian Genes (LCGs)

Transcriptome and isoform reconstruc1on with short reads. Tangled up in reads

NEXT GENERATION SEQUENCING. R. Piazza (MD, PhD) Dept. of Medicine and Surgery, University of Milano-Bicocca

Ambient temperature regulated flowering time

Mechanisms of alternative splicing regulation

Complexity DNA. Genome RNA. Transcriptome. Protein. Proteome. Metabolites. Metabolome

Simple, rapid, and reliable RNA sequencing

SUPPLEMENTARY INFORMATION

Chip Seq Peak Calling in Galaxy

DeconRNASeq: A Statistical Framework for Deconvolution of Heterogeneous Tissue Samples Based on mrna-seq data

Profili di espressione genica

ChIP-seq data analysis

Nature Genetics: doi: /ng Supplementary Figure 1. Assessment of sample purity and quality.

Nature Genetics: doi: /ng Supplementary Figure 1. Somatic coding mutations identified by WES/WGS for 83 ATL cases.

Supplementary Figures

MODULE 4: SPLICING. Removal of introns from messenger RNA by splicing

BWA alignment to reference transcriptome and genome. Convert transcriptome mappings back to genome space

Genetics/Genomics: role of genes in diagnosis and/risk and in personalised medicine

Computational Identification and Prediction of Tissue-Specific Alternative Splicing in H. Sapiens. Eric Van Nostrand CS229 Final Project

Supplementary Figures

Methods: Biological Data

Metabolomic and Proteomics Solutions for Integrated Biology. Christine Miller Omics Market Manager ASMS 2015

Advance Your Genomic Research Using Targeted Resequencing with SeqCap EZ Library

Part-II: Statistical analysis of ChIP-seq data

Genome-wide Association Studies (GWAS) Pasieka, Science Photo Library

Multi-omics data integration colon cancer using proteogenomics approach

Iso-Seq Method Updates and Target Enrichment Without Amplification for SMRT Sequencing

Hands-On Ten The BRCA1 Gene and Protein

Morphogens: What are they and why should we care?

Supplementary methods:

AVENIO family of NGS oncology assays ctdna and Tumor Tissue Analysis Kits

Phylogenomics. Antonis Rokas Department of Biological Sciences Vanderbilt University.

EXPression ANalyzer and DisplayER

TITLE: Total RNA Sequencing Analysis of DCIS Progressing to Invasive Breast Cancer.

Transcriptional control in Eukaryotes: (chapter 13 pp276) Chromatin structure affects gene expression. Chromatin Array of nuc

Nature Biotechnology: doi: /nbt.1904

Mutation Detection and CNV Analysis for Illumina Sequencing data from HaloPlex Target Enrichment Panels using NextGENe Software for Clinical Research

Breast and ovarian cancer in Serbia: the importance of mutation detection in hereditary predisposition genes using NGS

ESCs were lysed with Trizol reagent (Life technologies) and RNA was extracted according to

Alternative splicing. Biosciences 741: Genomics Fall, 2013 Week 6

RNA- seq Introduc1on. Promises and pi7alls

Cancer Informatics Lecture

OMICS Journals are welcoming Submissions

Pre-mRNA Secondary Structure Prediction Aids Splice Site Recognition

ABS04. ~ Inaugural Applied Bayesian Statistics School EXPRESSION

ncounter Assay Automated Process Immobilize and align reporter for image collecting and barcode counting ncounter Prep Station

Data mining with Ensembl Biomart. Stéphanie Le Gras

10/19/2017. How Nutritional Genomics Affects You in Nutrition Research and Practice Joyanna Hansen, PhD, RD & Kristin Guertin, PhD, MPH

BIOMEDICAL SCIENCES GRADUATE PROGRAM SPRING 2015

Global Epigenetic and Transcriptional Trends among Two Rice Subspecies and Their Reciprocal Hybrids W

Transcription:

Lectures 13: High throughput sequencing: Beyond the genome Spring 2017 March 28, 2017

h@p://www.fejes.ca/2009/06/science- cartoons- 5- rna- seq.html

Omics Transcriptome - the set of all mrnas present in a cell Proteome proteins Metabolome/physiome - metabolites Microbiome the collecson of microbes present in an organism or other locason Interactome In physics the - on suffix has tended to signify an elementary parscle: the photon, electron, proton, meson, etc., whereas - ome in biology has the opposite intellectual funcson, of direcsng a@enson to a holissc abstracson, an eventual goal From: Ome Sweet Omics. The ScienSst 15(7), 2001

Omics Biologists have high- throughput methods for probing each - ome: Transcriptome RNA- Seq Proteome mass spectrometry, protein arrays Microbiome next generason sequencing Interactome yeast- two- hybrid Regulome ChIP- Seq Lots of data for bioinformascs people to analyze!

RNA- seq: profiling the transcriptome Technique: sequence the total RNA produced by the cell

Read mapping

Pile- ups From: The ENCODE Project ConsorSum (2011) A User's Guide to the Encyclopedia of DNA Elements (ENCODE). PLoS Biol 9(4): e1001046.

Pile- ups gene model read depth Most reads fall into coding exons or UTRs

RNA- seq: profiling the transcriptome Technique: sequence the total RNA produced by the cell What is this good for?

RNA- seq: profiling the transcriptome Genome annotason (transcript assembly) Detect alternasve splicing Obtain gene/transcript expression levels and detecson of differensal expression Allele- specific expression Small- RNA transcriptome (different protocol than regular RNA- seq)

All the uses of RNA- seq h@p://www.rna- seqblog.com/news/informason/rna- seq- blog- poll- results/

DifferenSal expression h@p://www.fejes.ca/labels/figures.html

RNA- seq protocol

Raw and Aligned Reads Raw data is a (large) set of sequences Typical file format is FASTQ @HWI-EAS255_4_FC2010Y_1_43_110_790 TTAATCTACAGAATAGATAGCTAGCATATATTT + hhhhhhhhhhhhhhhdhhhhhhhhhhhdrehdh Read idensfier Bases called Base quality codes Alignment to genome is done by efficient indexing Aligned reads in SAM format @HWI- 163 chr19 9900 10000 16M2I25M Read idensfier Where this read matched Start and end posisons Codes for match: 16 matches, 2 extra,

Cataloging the transcriptome Transcriptomics involves studying expression at SpaSal resoluson: Sssues, individuals, locason Temporal resoluson: circadian, seasonal, lifesme

Inter- Genic Reads Many reads reflect unannotated genes: opportunity to discover new genes

RPKM A Simple NormalizaSon Different numbers of counts per sample (sequencing depth) Divide counts in a region of interest (a genomic region or a gene or an exon) by all counts (reads per million reads - RPM) Genes have different lengths: divide also by length of gene Obtain RPKM (reads per kilobase of exon per million reads) Some use FPKM (fragments/kb/mr)

ChIP- seq h@p://www.fejes.ca/labels/chip- Seq.html

Comments on ChIP- seq Genome- wide mapping of transcripson factor binding sites ComputaSonal problems: Peak calling SSll need mosf finders, but makes the problem easier

Variants Apply the methodology to RNA: map RNA- binding sites in mrna that interact with specific RNA- binding proteins CLIP- Seq (cross- linking immunoprecipitason sequencing) RIP- Seq (RNA immunoprecipitason sequencing)

Other sequencing- based techniques Methyl- seq, BS- seq: methylason Chromosome conformason capture (3C- 4C- 5C- HiC): spasal organizason of chromosomes h@p://en.wikipedia.org/wiki/chromosome_conformason_capture

Other sequencing- based techniques Methyl- seq, BS- seq: methylason Chromosome conformason capture (3C- 4C- 5C): spasal organizason of chromosomes seqfold: RNA secondary structure DNAase- seq And many more! h@p://en.wikipedia.org/wiki/chromosome_conformason_capture

Read mapping All the sequencing- based techniques require read mapping as a first step. ExisSng alignment tools are not fast enough à need new algorithms!

Read mapping How is the problem of read mapping different than sequence alignment as we have considered it unsl now?