RNA-Seq Preparation Comparison Summary: Lexogen, Standard, NEB


CSF-NGS
January 22, 2014

Contents

1 Introduction
2 Experimental Details
3 Results And Discussion
3.1 ERCC spike ins
3.2 RNA alignment
3.3 Differential Expression

1 Introduction

We needed a quick but reliable protocol for preparing Illumina sequencing libraries from total RNA, to be offered as a service to CSF-NGS users. The current in-house protocol ("standard") was deemed too slow, because hands-on time is the major cost factor. Many different protocols and kits for mRNA preparation are currently available (citations).

2 Experimental Details

Mouse liver (Clontech cat. nr 63663) and kidney (Clontech cat. nr 636612) total RNA was used as source material. ERCC spike-in mix 1 or 2 (Life Technologies catalog number 445674) was added to each source tube. The combined sample was split into triplicates, and each triplicate was prepared separately with either the in-house standard protocol (), the Lexogen SENSE kit (Lexogen catalog number 1.8) or the NEB kit (NEBNext Ultra Directional RNA Library Prep Kit for Illumina, catalog number E742). The resulting samples were sequenced on a HiSeq 2000 SE with a read length of 50.

For the Lexogen preparation method the reads were 5' trimmed. Adaptors were removed with cutadapt, the rRNA reads and ERCC spike-ins were removed by alignment with bowtie, and the remaining reads were aligned to the mouse mm10 genome and transcriptome using tophat 1.4.1. The aligned reads were counted per gene with HTSeq-count, and the counts were used to estimate differential gene expression with DESeq.
3 Results And Discussion

3.1 ERCC spike ins

Because the spike-in mix was added before each sample was split into the three technical replicates per condition, the replicates let us assess the reliability of each preparation. Normalizing against the total number of reads obtained, we estimated the coefficient of variation per condition and preparation at between .1 (kidney) and .18 (liver) (Table ??). Normalizing against the ERCC-aligned reads allows a dose-response regression line to be calculated (Table ??). All experiments show a high correlation, with the NEB-prepared spike-ins showing the lowest variance.

TODO: calculate range of detection.

Table 1: summary of results

Criteria                                    Lexogen      NEB                      Standard
hands-on time (a)                           5-6h         1d                       3d
detection of differential gene expression   very good    very good                very good
variance per gene                           low          very low                 low
strandedness                                very high    high                     high
variance of gene coverage                   spiky (b)    very low                 very low
5' coverage lag                             low          detectable               detectable
rRNA depletion (c)                          exhaustive   low                      low
ease of automation                          not known    protocols available (d)  low
duplication                                 medium (b)   very low                 low
dynamic range                               good         good                     good

(a) including RNA QC, excluding library QC
(b) expected due to priming method, no influence on differential expression
(c) only one round of poly-A enrichment
(d) for Hamilton STAR robot

prep   condition   mean   sd    cv
       liver       .44    .8    .18
       kidney      .49    .3    .7
       liver       .57    .4    .6
       kidney      1.31   .8    .6
       liver       .68    .4    .6
       kidney      .77    .1    .1

[Figure 1 plot: percent ERCC/total (y) by prep (x); legend: replicate 1, 2, 3; condition liver, kidney]

Figure 1: Relative abundance of ERCC spike-ins compared to the total number of reads. Reads were aligned with bowtie against the ERCC genes and the mouse rDNA cluster, and the uniquely aligning reads were counted.
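The mean, sd and cv columns above summarize the ERCC percentage across the three technical replicates of one prep and condition. A minimal sketch of that calculation follows, assuming the sample standard deviation is used; the read counts are made up for illustration and are not taken from this report, and the function names are hypothetical.

```python
from statistics import mean, stdev

def ercc_percent(ercc_reads, total_reads):
    """Percent of all reads that align uniquely to ERCC spike-ins."""
    return 100.0 * ercc_reads / total_reads

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)

# Hypothetical triplicate: (ERCC reads, total reads) per technical replicate.
replicates = [(100_000, 20_000_000), (110_000, 20_000_000), (90_000, 20_000_000)]

percents = [ercc_percent(ercc, total) for ercc, total in replicates]
cv = coefficient_of_variation(percents)
print(f"mean={mean(percents):.2f} sd={stdev(percents):.2f} cv={cv:.2f}")
```

A low cv means the three replicates recovered a similar share of spike-in reads, which is how the table is used to compare preparations.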

prep   condition   intercept   slope   R2
       liver       1.3         .861    .864
       kidney      .86         .877    .888
       liver       .967        .95     .899
       kidney      1.1         .892    .893
       liver       .67         .931    .96
       kidney      .384        .948    .966

[Figure 2 plot: log2(rpkm) (y) against log2(expected attomoles) (x); panels per prep for liver and kidney; legend: expected, lm, loess, ND]

Figure 2: ERCC dose response. The sum of all uniquely aligning reads per ERCC gene was normalized by the length of the gene and by the total number of reads aligning uniquely to the ERCC controls, and the resulting RPKM values were plotted against the expected number of RNA molecules. Linear regression parameters (top) and scatter plot (bottom) with expected counts (blue line), regression line (red line), loess curve (green line) and undetected genes at an arbitrary RPKM value (yellow dots).
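The dose-response table reports an ordinary least-squares fit of log2(rpkm) on log2(expected attomoles). The sketch below shows one way to compute such a fit, assuming RPKM is taken relative to the ERCC-aligned reads only, as the caption describes. The spike-in amounts, lengths and read counts are hypothetical, chosen so that reads are exactly proportional to amount times length.

```python
import math

def rpkm(read_count, gene_length_bp, total_ercc_reads):
    """Reads per kilobase of transcript per million ERCC-aligned reads."""
    return read_count / (gene_length_bp / 1e3) / (total_ercc_reads / 1e6)

def linear_fit(xs, ys):
    """Ordinary least squares; returns (intercept, slope, r_squared)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = (sxy * sxy) / (sxx * syy)
    return intercept, slope, r_squared

# Hypothetical spike-ins: (expected attomoles, length in bp, unique reads).
spikes = [(1.0, 1000, 20), (4.0, 500, 40), (16.0, 2000, 640), (64.0, 1000, 1280)]

total = sum(reads for _, _, reads in spikes)
# Spike-ins with zero reads must be left out before taking logs
# (the figure plots them separately as "ND").
detected = [s for s in spikes if s[2] > 0]
xs = [math.log2(amol) for amol, _, _ in detected]
ys = [math.log2(rpkm(reads, length, total)) for _, length, reads in detected]

intercept, slope, r2 = linear_fit(xs, ys)
print(f"intercept={intercept:.3f} slope={slope:.3f} R2={r2:.3f}")
```

With perfectly proportional input the slope and R2 are 1; slopes below 1, as in the table, indicate compression of the dynamic range.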

[Figure 3 plot: log2(l/k) (y) against expected log2(l/k) (x), one panel per prep; panel annotations: y = .13 + .9 x, r2 = .691; y = .56 + 1 x, r2 = .77; y = .23 + .97 x, r2 = .81; legend "model": expected, lm]

Figure 3: ERCC fold-change response. The log2 ratios of the mean rpm per condition were plotted against the expected log2 ratios. Expected counts (red line) and regression line (green line) are indicated.

[Figure 4 plot: average coverage per million reads (y) against position % (x); panels by rank bin (0,5], (5,10], (10,20], (20,92]; legend: prep]

Figure 4: ERCC coverage across genes. For each preparation method the genes were binned by rank (top 5, 6-10, 11-20, rest), and the average coverage per million reads per bin is plotted against the length-normalized gene position.

3.2 RNA alignment

[Figure 5 plot: "Alignment Distribution"; top panel absolute counts per sample id, bottom panel percent of total per align type (Cleaned, Cut, NM, R, U, U1, U2); liver and kidney panels per prep; legend: replicate 1, 2, 3; condition liver, kidney]

Figure 5: Alignment Statistics. For the Lexogen preparation method the reads were 5' trimmed. Adaptors were removed with cutadapt, the rRNA reads and ERCC spike-ins were removed by alignment with bowtie, and the remaining reads were aligned to the mouse mm10 genome and transcriptome using tophat 1.4.1. Absolute counts (top panel) and relative percentage (bottom panel) of each alignment category are shown (Cut: small adaptor-truncated reads removed; Cleaned: reads aligning to ERCC or rRNA; U0-U3: unique alignments with 0-3 mismatches; R: reads aligning repetitively; NM: reads not aligning).

[Figure 6 plot: cumulative percent of uniquely aligned reads (y) against X plicates (x); legend: preparation; replicate 1, 2, 3]

Figure 6: Xplicates. Uniquely aligned reads were binned by the number of overlaps at each position, and the cumulative sum was calculated with increasing level of duplication.
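One reading of the Figure 6 caption is sketched below: reads sharing the same start position count as copies of each other, and the curve gives the share of reads sitting at positions with at most k copies. Treating "overlaps at each position" as identical chromosome, strand and start is an assumption, as are the function name and the toy read list.

```python
from collections import Counter

def cumulative_duplication(read_starts):
    """Map duplication level k to the cumulative percent of reads
    located at positions that hold at most k reads."""
    per_position = Counter(read_starts)      # reads per (chrom, strand, start)
    reads_by_level = Counter()
    for copies in per_position.values():
        reads_by_level[copies] += copies     # a k-fold position contributes k reads
    total = sum(reads_by_level.values())
    running, curve = 0, {}
    for level in sorted(reads_by_level):
        running += reads_by_level[level]
        curve[level] = 100.0 * running / total
    return curve

# Hypothetical uniquely aligned reads: two singletons, one 2-fold and one 4-fold position.
starts = (
    [("chr1", "+", 100)]
    + [("chr1", "+", 250)]
    + [("chr2", "-", 40)] * 2
    + [("chr3", "+", 7)] * 4
)
curve = cumulative_duplication(starts)
print(curve)  # {1: 25.0, 2: 50.0, 4: 100.0}
```

A curve that rises late means a large share of the reads comes from heavily duplicated positions.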

[Figure 7 plot: normalized mean coverage (y) against bin (x); panels: unstranded, same, opposite; legend: condition kidney, liver; replicate 1, 2, 3]

Figure 7: Coverage across cDNA. Mean coverage across all length-normalized genes (cDNA).
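Averaging coverage over genes of different lengths requires mapping each gene onto a common set of bins first. The sketch below shows one way this could be done: slice each per-base coverage vector into equal bins, scale each gene to a mean of 1, then average bin by bin. The scaling step and all names are assumptions; the report does not state how the normalization was performed.

```python
def binned_coverage(per_base_coverage, n_bins=100):
    """Average per-base coverage within n_bins equal slices of the transcript."""
    length = len(per_base_coverage)
    bins = []
    for b in range(n_bins):
        start = b * length // n_bins
        end = max((b + 1) * length // n_bins, start + 1)  # never an empty slice
        window = per_base_coverage[start:end]
        bins.append(sum(window) / len(window))
    return bins

def scale_to_unit_mean(profile):
    """Divide by the profile mean so every gene contributes equally.
    Profiles without any coverage are returned unchanged."""
    total = sum(profile)
    if total == 0:
        return list(profile)
    avg = total / len(profile)
    return [value / avg for value in profile]

def mean_profile(profiles):
    """Bin-wise mean over all genes."""
    return [sum(column) / len(column) for column in zip(*profiles)]

# Hypothetical per-base coverage of two transcripts of different length.
genes = [[0, 0, 2, 2], [4, 4, 4, 4, 4, 4]]
profiles = [scale_to_unit_mean(binned_coverage(g, n_bins=2)) for g in genes]
average = mean_profile(profiles)
print(average)  # [0.5, 1.5]
```

In this toy case the first gene has no coverage in its 5' half, so the averaged profile rises toward the 3' end, the kind of pattern the 5' coverage lag row in Table 1 refers to.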

3.3 Differential Expression

[Figure 8 plot: "Scatter Plot Matrix"; lower-triangle correlation values: .95, .93, .95]

Figure 8: Scatter plot matrix of log2 fold changes per preparation. The log2 fold changes of the liver/kidney comparison for each preparation were plotted against each other (upper triangle), and the Spearman rank correlation was calculated (lower triangle).
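The Spearman rank correlation used in Figure 8 is the Pearson correlation of the ranks, with tied values sharing their average rank. A minimal, dependency-free sketch follows; the per-gene log2 fold changes and the dictionary keys are hypothetical.

```python
from itertools import combinations

def average_ranks(values):
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1           # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def spearman(xs, ys):
    return pearson(average_ranks(xs), average_ranks(ys))

# Hypothetical liver/kidney log2 fold changes for the same five genes per prep.
log2fc = {
    "lexogen":  [6.1, -5.0, 0.2, 2.1, -1.4],
    "neb":      [5.9, -5.3, 0.1, 1.8, -1.1],
    "standard": [5.5, -4.8, -0.3, 2.4, -0.9],
}
for a, b in combinations(log2fc, 2):
    print(a, b, round(spearman(log2fc[a], log2fc[b]), 3))
```

Because only ranks enter the calculation, the coefficient is insensitive to one preparation compressing or stretching the fold changes relative to another.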

[Figure 9 plot: dispersion (y) against mean (x), log scales; panels l and k per prep]

Figure 9: The variance of each condition was estimated with DESeq estimateDispersions; the model fit is indicated in red.

[Figure 10 plot: four Venn diagram panels, titled by cutoff: "adj.p <.1 ; abs(log2fc) > 1", "adj.p <.1 ; abs(log2fc) > 5", "adj.p <.1 ; abs(log2fc) > 5", "adj.p <.1 ; abs(log2fc) > 1"]

Figure 10: Venn diagrams of significantly differentially expressed genes under different cutoffs.
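Each Venn panel follows from two steps: select the genes passing an adjusted p-value and absolute log2 fold-change cutoff in each preparation, then count the genes in every exclusive overlap region. The sketch below illustrates both steps. The cutoff values, gene identifiers and results are hypothetical, since the exact thresholds are not legible in the panel titles.

```python
from collections import Counter

def significant_genes(results, max_padj, min_abs_log2fc):
    """Genes with adjusted p-value below max_padj and |log2FC| above the cutoff."""
    return {
        gene
        for gene, (padj, log2fc) in results.items()
        if padj < max_padj and abs(log2fc) > min_abs_log2fc
    }

def venn_regions(sets):
    """Count genes per exclusive Venn region, keyed by the sorted tuple of
    set names that contain the gene."""
    regions = Counter()
    for gene in set().union(*sets.values()):
        members = tuple(sorted(name for name, genes in sets.items() if gene in genes))
        regions[members] += 1
    return dict(regions)

# Hypothetical per-prep results: gene -> (adjusted p-value, log2 fold change).
results = {
    "lexogen":  {"gene_a": (0.001, 6.2), "gene_b": (0.002, -5.1),
                 "gene_c": (0.8, 0.1),   "gene_d": (0.03, 2.0)},
    "neb":      {"gene_a": (0.0005, 6.0), "gene_b": (0.001, -5.3),
                 "gene_c": (0.9, 0.0),    "gene_d": (0.2, 1.5)},
    "standard": {"gene_a": (0.002, 5.8), "gene_b": (0.04, -0.8),
                 "gene_c": (0.7, 0.2),   "gene_d": (0.01, 1.9)},
}

# Illustrative cutoffs only.
selected = {prep: significant_genes(res, max_padj=0.05, min_abs_log2fc=1.0)
            for prep, res in results.items()}
regions = venn_regions(selected)
print(regions)
```

Tightening either cutoff shrinks every set, so comparing panels shows whether the agreement between preparations holds for the strongest effects as well as for the borderline ones.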