RASA: Robust Alternative Splicing Analysis for Human Transcriptome Arrays

Similar documents
genomics for systems biology / ISB2020 RNA sequencing (RNA-seq)

Computational Identification and Prediction of Tissue-Specific Alternative Splicing in H. Sapiens. Eric Van Nostrand CS229 Final Project

Ambient temperature regulated flowering time

Tutorial: RNA-Seq Analysis Part II: Non-Specific Matches and Expression Measures

Inference of Isoforms from Short Sequence Reads

Introduction. Introduction

Computational Analysis of UHT Sequences Histone modifications, CAGE, RNA-Seq

MODULE 4: SPLICING. Removal of introns from messenger RNA by splicing

Lecture 8 Understanding Transcription RNA-seq analysis. Foundations of Computational Systems Biology David K. Gifford

Supplemental Data. Integrating omics and alternative splicing i reveals insights i into grape response to high temperature

Supplemental Figure S1. Expression of Cirbp mrna in mouse tissues and NIH3T3 cells.

Global regulation of alternative splicing by adenosine deaminase acting on RNA (ADAR)

Effects of UBL5 knockdown on cell cycle distribution and sister chromatid cohesion

A Quick-Start Guide for rseqdiff

High Throughput Sequence (HTS) data analysis. Lei Zhou

Supplementary Material for IPred - Integrating Ab Initio and Evidence Based Gene Predictions to Improve Prediction Accuracy

Transcriptome Analysis

Analysis of Massively Parallel Sequencing Data Application of Illumina Sequencing to the Genetics of Human Cancers

Supplementary Figures and Tables

Supplementary Figures

Simple, rapid, and reliable RNA sequencing

Supplemental Methods RNA sequencing experiment

RNA SEQUENCING AND DATA ANALYSIS

SUPPLEMENTARY INFORMATION. Intron retention is a widespread mechanism of tumor suppressor inactivation.

Alternative splicing. Biosciences 741: Genomics Fall, 2013 Week 6

Nature Genetics: doi: /ng Supplementary Figure 1. Assessment of sample purity and quality.

BIMM 143. RNA sequencing overview. Genome Informatics II. Barry Grant. Lecture In vivo. In vitro.

RNA SEQUENCING AND DATA ANALYSIS

High-Throughput Sequencing Course

Methods: Biological Data

38 Int'l Conf. Bioinformatics and Computational Biology BIOCOMP'16

Comparison of open chromatin regions between dentate granule cells and other tissues and neural cell types.

Abstract. Optimization strategy of Copy Number Variant calling using Multiplicom solutions APPLICATION NOTE. Introduction

Mechanisms of alternative splicing regulation

Supplement to SCnorm: robust normalization of single-cell RNA-seq data

Supplementary information for: Human micrornas co-silence in well-separated groups and have different essentialities

Analyse de données de séquençage haut débit

Lectures 13: High throughput sequencing: Beyond the genome. Spring 2017 March 28, 2017

RNA-seq Introduction

Lentiviral Delivery of Combinatorial mirna Expression Constructs Provides Efficient Target Gene Repression.

Supplementary Figure S1. Gene expression analysis of epidermal marker genes and TP63.

MATS: a Bayesian framework for flexible detection of differential alternative splicing from RNA-Seq data

MODULE 3: TRANSCRIPTION PART II

PREPARED FOR: U.S. Army Medical Research and Materiel Command Fort Detrick, MD

Analysis and design of RNA sequencing experiments for identifying isoform regulation

Single SNP/Gene Analysis. Typical Results of GWAS Analysis (Single SNP Approach) Typical Results of GWAS Analysis (Single SNP Approach)

Histone Modifications Are Associated with Transcript Isoform Diversity in Normal and Cancer Cells

Nature Genetics: doi: /ng.3731

Patnaik SK, et al. MicroRNAs to accurately histotype NSCLC biopsies

Predominant contribution of cis-regulatory divergence in the evolution of mouse alternative splicing

TITLE: The Role Of Alternative Splicing In Breast Cancer Progression

A Practical Guide to Integrative Genomics by RNA-seq and ChIP-seq Analysis

Computational Modeling and Inference of Alternative Splicing Regulation.

Integrated Proteomic and Genomic Analysis of Gastric Cancer

iclip Predicts the Dual Splicing Effects of TIA-RNA Interactions

Supplementary information. Supplementary figure 1. Flow chart of study design

Breast cancer. Risk factors you cannot change include: Treatment Plan Selection. Inferring Transcriptional Module from Breast Cancer Profile Data

Nature Biotechnology: doi: /nbt.1904

Variant Classification. Author: Mike Thiesen, Golden Helix, Inc.

Daehwan Kim September 2018

a) List of KMTs targeted in the shrna screen. The official symbol, KMT designation,

Supplementary Figure 1. Spitzoid Melanoma with PPFIBP1-MET fusion. (a) Histopathology (4x) shows a domed papule with melanocytes extending into the

Nature Immunology: doi: /ni Supplementary Figure 1. Transcriptional program of the TE and MP CD8 + T cell subsets.

Cancer outlier differential gene expression detection

GeneOverlap: An R package to test and visualize

Nature Methods: doi: /nmeth.3115

Introduction to Systems Biology of Cancer Lecture 2

Supplementary Table S1. List of PTPRK-RSPO3 gene fusions in TCGA's colon cancer cohort. Chr. # of Gene 2. Chr. # of Gene 1

Supplemental Materials. for. Conservation of an RNA Regulatory Map between Drosophila and. Mammals

Nature Structural & Molecular Biology: doi: /nsmb Supplementary Figure 1

Soft Agar Assay. For each cell pool, 100,000 cells were resuspended in 0.35% (w/v)

Package diggitdata. April 11, 2019

Marta Puerto Plasencia. microrna sponges

SpliceDB: database of canonical and non-canonical mammalian splice sites

Profiles of gene expression & diagnosis/prognosis of cancer. MCs in Advanced Genetics Ainoa Planas Riverola

Results. Abstract. Introduc4on. Conclusions. Methods. Funding

Nature Structural & Molecular Biology: doi: /nsmb Supplementary Figure 1

Supplementary Methods

ChIP-seq data analysis

Supplementary Materials for

SUPPLEMENTARY FIGURES: Supplementary Figure 1

Widespread alternative and aberrant splicing revealed by lariat sequencing

Gene Ontology 2 Function/Pathway Enrichment. Biol4559 Thurs, April 12, 2018 Bill Pearson Pinn 6-057

Efficacy of the Extended Principal Orthogonal Decomposition Method on DNA Microarray Data in Cancer Detection

Broad H3K4me3 is associated with increased transcription elongation and enhancer activity at tumor suppressor genes

iplex genotyping IDH1 and IDH2 assays utilized the following primer sets (forward and reverse primers along with extension primers).

Raymond Auerbach PhD Candidate, Yale University Gerstein and Snyder Labs August 30, 2012

Chapter 5: Global Analysis of the Effect of Densin. Knockout on Gene Transcription in the Brain

A Statistical Framework for Classification of Tumor Type from microrna Data

SUPPLEMENTARY INFORMATION

Below, we included the point-to-point response to the comments of both reviewers.

RECENT ADVANCES IN THE MOLECULAR DIAGNOSIS OF BREAST CANCER

DNA Sequence Bioinformatics Analysis with the Galaxy Platform

Long non-coding RNAs

omiras: MicroRNA regulation of gene expression

RECAP (1)! In eukaryotes, large primary transcripts are processed to smaller, mature mrnas.! What was first evidence for this precursorproduct

GENOME-WIDE DETECTION OF ALTERNATIVE SPLICING IN EXPRESSED SEQUENCES USING PARTIAL ORDER MULTIPLE SEQUENCE ALIGNMENT GRAPHS

Supplemental Information For: The genetics of splicing in neuroblastoma

RNA preparation from extracted paraffin cores:

Pseudogenes transcribed in breast invasive carcinoma show subtype-specific expression and cerna potential

Transcription:

Supplementary Materials RASA: Robust Alternative Splicing Analysis for Human Transcriptome Arrays Junhee Seok 1*, Weihong Xu 2, Ronald W. Davis 2, Wenzhong Xiao 2,3* 1 School of Electrical Engineering, Korea University, Seoul 136-701, Korea. 2 Stanford Genome Technology Center, Palo Alto, CA 94304, USA 3 Massachusetts General Hospital and Shriners Hospital for Children, Boston, MA 02114, USA. * Corresponding Author Emails JS: jseok14@korea.ac.kr WXiao: wenzhong.xiao@mgh.harvard.edu

Supplementary Methods Verification of HTA results by mrna-seq data: High-throughput RNA sequencing data of human liver and muscle tissues was obtained from GSE12946 1. Sequencing reads were mapped over the exonic regions, allowing up to two nucleotide mismatches by SeqMap 2. An exon expression index was calculated as the number of reads per kilobase per million reads (RPKM) using only uniquely mapped reads 3. A gene expression index was computed in a similar way using reads aggregated from all exons that belong to the gene. To verify HTA analyses, one robust set of alternatively spliced exons and another robust set of non-spliced exons were determined from the mrna-seq data, using p-values from statistical tests for relative over-expression of exons (p), the number of mapped reads (n), and fold changes of exon expression normalized to gene expression (f). The over-expression of exons relative to gene expression was tested by Fisher s exact tests 4. For a simple comparison between two conditions (C1 and C2), an exon is determined to be strictly over-expressed in C1 if p < 0.001, n 20 and f > 2, or strictly not over-expressed in C1 if p > 0.5, n in the C2 20, and the exon has the larger normalized expression in C2 than in C1. For both conditions, four sets (strictly overexpressed in C1, strictly not over-expressed in C1, strictly over-expressed in C2, and strictly not over-expressed in C2) were determined, and exons that were not in any of four sets were classified to be not determinable. If an exon detected to be over-expressed in one condition by a microarray analysis is also determined to be strictly over-expressed in the same condition in mrna-seq data, it is counted as a true positive (TP). If it is determined to be strictly not overexpressed in the same condition, it is counted as a false positive (FP). The verification rate is defined as (# of TPs)/(# of TPs + # of FPs). Several mrna-seq tools for alternative splicing analysis, such as Cufflinks 5, Casper 6, and DiffSplice 7, are based on the estimation of isoform-level expression. An objective for RASA is to identify specific exons as alternative splicing candidates that can be verified using RT-PCR. Since it is not easy to confidently extract exon-level alternative splicing events from the isoformlevel differential expression, we employed a conservative method to check the alternative

splicing of each exon based on the number of mapped reads, expression fold changes, and statistical tests using Fisher s exact tests, which have been used in many previous works 1,4,8,9. ASPIRE and ASI method: The proposed method was compared with the analysis of splicing by isoform reciprocity (ASPIRE) algorithm 10,11, and the accumulated splicing index (ASI) 12. For the alternative splicing score of an exon, ASPIRE calculates an inclusion ratio using both inclusion and exclusion junctions of the exon. ASI calculates the sum of splicing indices of an exon and all of its junctions as the score. Both ASPIRE and ASI detect top several exons ranked by their scores as confident candidates of alternative splicing. While ASI was tested over all exons, ASPIRE was tested over a subset of exons whose exclusion junctions are available because ASPIRE requires exclusion junctions in its calculation. RT-PCR validation: Primer pairs were designed using Primer3 package, and manually reviewed to ensure differentiation between PCR products of different transcript isoforms. RT-PCR experiments were performed following a standard protocol on the 7900HT Fast Real-Time PCR System (Life Technology, Inc). Cycle numbers (Ct) to reach a pre-determined threshold in exponential increment phase were recorded. For a candidate exon, RT-PCR experiments were repeated three times, and the average Ct was used to estimate the expression of the candidate exon. Gene expression was estimated from the expression of adjacent constitutive exons of the candidate alternative exon. To verify alternative splicing, the odd ratio of splicing index was computed as the ratio between the fold change of exon expression and the fold change of gene expression. Additional test data of human T-cell and monocyte samples: RASA was tested using human T- cell and monocyte samples. 10 biological replicates of each cell type were hybridized on a custom-designed HTA 13. These samples were obtained from 10 healthy individuals as a part of the Glue Grant study, Inflammation and the Host Response to Injury (www.gluegrant.org). By sequencing the pool of the 10 replicates for each cell type using an Illumina Hi-Seq machine, 90 million and 98 million reads were obtained for the T-cell and monocyte samples, respectively. The mrna-seq data were processed by Cufflink 5 with the default option to consider ambiguous read mapping, and the expression indices of genes and exons were calculated accordingly. The

detected splicing events by RASA from the HTA data were compare with the mrna-seq data from the same samples. Supplementary References 1 Wang, E. T. et al. Alternative isoform regulation in human tissue transcriptomes. Nature 456, 470-476 (2008). 2 Jiang, H. & Wong, W. H. SeqMap: mapping massive amount of oligonucleotides to the genome. Bioinformatics 24, 2395-2396 (2008). 3 Mortazavi, A., Williams, B. A., McCue, K., Schaeffer, L. & Wold, B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5, 621-628 (2008). 4 Bullard, J. H., Purdom, E., Hansen, K. D. & Dudoit, S. Evaluation of statistical methods for normalization and differential expression in mrna-seq experiments. BMC Bioinformatics 11, 94 (2010). 5 Trapnell, C. et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol 28, 511-515 (2010). 6 Rossell, D., Stephan-Otto Attolini, C., Kroiss, M. & Stocker, A. Quantifying Alternative Splicing from Paired-End Rna-Sequencing Data. The annals of applied statistics 8, 309-330 (2014). 7 Hu, Y. et al. DiffSplice: the genome-wide detection of differential splicing events with RNA-seq. Nucleic Acids Res 41, e39 (2013). 8 Brooks, A. N. et al. Conservation of an RNA regulatory map between Drosophila and mammals. Genome Res 21, 193-202 (2011). 9 Katz, Y., Wang, E. T., Airoldi, E. M. & Burge, C. B. Analysis and design of RNA sequencing experiments for identifying isoform regulation. Nat Methods 7, 1009-1015 (2010). 10 Licatalosi, D. D. et al. HITS-CLIP yields genome-wide insights into brain alternative RNA processing. Nature 456, 464-469 (2008). 11 Ule, J. et al. CLIP identifies Nova-regulated RNA networks in the brain. Science 302, 1212-1215 (2003).

12 Clark, T. A., Sugnet, C. W. & Ares, M., Jr. Genomewide analysis of mrna processing in yeast using splicing-specific microarrays. Science 296, 907-910 (2002). 13 Xu, W. et al. Human transcriptome array for high-throughput clinical studies. Proc Natl Acad Sci U S A 108, 3707-3712 (2011).

Figure S1. Diagram of the overall procedure of RASA. Orange boxes are preprocessing steps of the microarray data. Blue boxes are major outputs of the program. Purple boxes are the major steps of the proposed algorithm for alternative splicing analysis.

A B C Figure S2. Performance benchmarks on the custom-designed HTA data. Verification rates are also shown with microarray detection parameters of (A) MIDAS p-values, (B) MADS p- values, and (C) exon expression fold changes relative to gene expression fold changes. The methods calculating gene expression with all exons, annotated constitutive exons, and putatively constitutive exons are noted with black, red and green lines respectively. The detection results with and without junction supports are represented with bold and dashed lines respectively. The RASA results are presented with green and solid lines.

Figure S3. Performance changes over a range of parameters. The performance of RASA over the changes of (A) sensitivity parameter d and (B) quantile threshold q. The bold black lines represent the detection accuracy with the default parameters (d=2 and q=0.05). The dashed lines represent the boundary of 10% changes from the accuracy with default parameters.

A B Figure S4. Performance comparison with all-junction approaches on the custome-designed HTA data. Performance comparison of (A) ASPIRE and (B) ASI with the proposed method using the custom-designed HTA data. Red lines represent validation rates of the proposed method of a best-junction approach, and black lines represent the other methods of all-junction approaches. The x-axis shows fold change criterions used for the proposed method.

Figure S5. Performance comparison with MADS+. Comparison between the approach of RASA that requires strong signals from exons (red), and the approach of MADS+ that do not requiring strong exon signals. The detections by RASA and MADS+ were performed over the same set of exons.

Figure S6. Performance comparison with T-cell and monocyte samples. Comparisons of the performance of RASA (red) and conventional method (black) to detect alternatively spliced exons between human T-cells and monocytes. Verification rates are shown along with (A) MIDAS p-values and (B) MADS p-values. RASA used putative constitutive exon selection for gene expression calculation and both exon and junction signals for the detection of alternative splicing as proposed. The conventional method used all exons for gene expression calculation and only exon signals for the detection of alternative splicing.

Figure S7. Alternative Splicing in NIN gene. An exon (chr14: 50,292,960-50,295,098) of Ninein (NIN) gene is shown as a confident candidate of alternative splicing between human T- cells and monocytes (red arrow in the gene annotation; red dots in the plots). The plots show the expression of exons of NIN from the HTA and mrna-seq data (black circles). Red lines represent the averages of fold changes of exons.