GeneOverlap: An R package to test and visualize

Similar documents
metaseq: Meta-analysis of RNA-seq count data

MethylMix An R package for identifying DNA methylation driven genes

splicer: An R package for classification of alternative splicing and prediction of coding potential from RNA-seq data

Introduction to antiprofiles

AIMS: Absolute Assignment of Breast Cancer Intrinsic Molecular Subtype

Using Messina. Mark Pinese. October 13, Introduction The problem Example: Designing a colon cancer screening test...

Supplementary Material for IPred - Integrating Ab Initio and Evidence Based Gene Predictions to Improve Prediction Accuracy

ChIP-seq analysis. J. van Helden, M. Defrance, C. Herrmann, D. Puthier, N. Servant, M. Thomas-Chollier, O.Sand

Computational Analysis of UHT Sequences Histone modifications, CAGE, RNA-Seq

Accessing and Using ENCODE Data Dr. Peggy J. Farnham

Supervised analysis of MS images using Cardinal

RASA: Robust Alternative Splicing Analysis for Human Transcriptome Arrays

genomics for systems biology / ISB2020 RNA sequencing (RNA-seq)

Broad H3K4me3 is associated with increased transcription elongation and enhancer activity at tumor suppressor genes

Analysis of Massively Parallel Sequencing Data Application of Illumina Sequencing to the Genetics of Human Cancers

ChromHMM Tutorial. Jason Ernst Assistant Professor University of California, Los Angeles

DNA Sequence Bioinformatics Analysis with the Galaxy Platform

Discovery of Novel Human Gene Regulatory Modules from Gene Co-expression and

Using the DART package: Denoising Algorithm based on Relevance network Topology

ChIP-seq hands-on. Iros Barozzi, Campus IFOM-IEO (Milan) Saverio Minucci, Gioacchino Natoli Labs

User Guide. Association analysis. Input

Supplemental Figure S1. Tertiles of FKBP5 promoter methylation and internal regulatory region

Patterns of Histone Methylation and Chromatin Organization in Grapevine Leaf. Rachel Schwope EPIGEN May 24-27, 2016

Chip Seq Peak Calling in Galaxy

Daehwan Kim September 2018

Nature Methods: doi: /nmeth.3115

MODULE 4: SPLICING. Removal of introns from messenger RNA by splicing

September 20, 2017 SIMULATED DATA

Table S1. Total and mapped reads produced for each ChIP-seq sample

Supplementary Figures

BIMM 143. RNA sequencing overview. Genome Informatics II. Barry Grant. Lecture In vivo. In vitro.

How To Use MiRSEA. Junwei Han. July 1, Overview 1. 2 Get the pathway-mirna correlation profile(pmset) and a weighting matrix 2

Not IN Our Genes - A Different Kind of Inheritance.! Christopher Phiel, Ph.D. University of Colorado Denver Mini-STEM School February 4, 2014

Processing, integrating and analysing chromatin immunoprecipitation followed by sequencing (ChIP-seq) data

How To Use SubpathwayGMir

Infer mirna-mrna interactions using paired expression data from a single sample

Functional annotation of farm animal genomes: ChIP-seq.

Supplemental Methods RNA sequencing experiment

Nature Immunology: doi: /ni Supplementary Figure 1. Characteristics of SEs in T reg and T conv cells.

Computational aspects of ChIP-seq. John Marioni Research Group Leader European Bioinformatics Institute European Molecular Biology Laboratory

The Epigenome Tools 2: ChIP-Seq and Data Analysis

Nature Genetics: doi: /ng Supplementary Figure 1. Assessment of sample purity and quality.

Comparison of open chromatin regions between dentate granule cells and other tissues and neural cell types.

ChIP-seq data analysis

Raymond Auerbach PhD Candidate, Yale University Gerstein and Snyder Labs August 30, 2012

Histone Modifications Are Associated with Transcript Isoform Diversity in Normal and Cancer Cells

ESCs were lysed with Trizol reagent (Life technologies) and RNA was extracted according to

Assignment 5: Integrative epigenomics analysis

Methods: Biological Data

Introduction to Systems Biology of Cancer Lecture 2

Tutorial: RNA-Seq Analysis Part II: Non-Specific Matches and Expression Measures

Use Case 9: Coordinated Changes of Epigenomic Marks Across Tissue Types. Epigenome Informatics Workshop Bioinformatics Research Laboratory

An Analysis of MDM4 Alternative Splicing and Effects Across Cancer Cell Lines

Nature Structural & Molecular Biology: doi: /nsmb Supplementary Figure 1

Package splicer. R topics documented: August 19, Version Date 2014/04/29

Variant Classification. Author: Mike Thiesen, Golden Helix, Inc.

Peak-calling for ChIP-seq and ATAC-seq

A Practical Guide to Integrative Genomics by RNA-seq and ChIP-seq Analysis

High Throughput Sequence (HTS) data analysis. Lei Zhou

Quality assessment of TCGA Agilent gene expression data for ovary cancer

RNA-seq Introduction

Results. Abstract. Introduc4on. Conclusions. Methods. Funding

A fatty liver study on Mus musculus

EPIGENOMICS PROFILING SERVICES

Inference of Isoforms from Short Sequence Reads

Allelic reprogramming of the histone modification H3K4me3 in early mammalian development

Matching the Cisplatin Heatmap

Heintzman, ND, Stuart, RK, Hon, G, Fu, Y, Ching, CW, Hawkins, RD, Barrera, LO, Van Calcar, S, Qu, C, Ching, KA, Wang, W, Weng, Z, Green, RD,

7SK ChIRP-seq is specifically RNA dependent and conserved between mice and humans.

SCIENCE CHINA Life Sciences

Lectures 13: High throughput sequencing: Beyond the genome. Spring 2017 March 28, 2017

Transcriptome Analysis

Gene Expression Analysis Web Forum. Jonathan Gerstenhaber Field Application Specialist

A Quick-Start Guide for rseqdiff

Integrated analysis of sequencing data

Analyse de données de séquençage haut débit

Yue Wei 1, Rui Chen 2, Carlos E. Bueso-Ramos 3, Hui Yang 1, and Guillermo Garcia-Manero 1

Lecture 8 Understanding Transcription RNA-seq analysis. Foundations of Computational Systems Biology David K. Gifford

Ambient temperature regulated flowering time

epigenomix Epigenetic and gene transcription data normalization and integration with mixture models

Obstacles and challenges in the analysis of microrna sequencing data

a) List of KMTs targeted in the shrna screen. The official symbol, KMT designation,

Nature Structural & Molecular Biology: doi: /nsmb.2419

Mirsynergy: detect synergistic mirna regulatory modules by overlapping neighbourhood expansion

Selective depletion of abundant RNAs to enable transcriptome analysis of lowinput and highly-degraded RNA from FFPE breast cancer samples

RNA SEQUENCING AND DATA ANALYSIS

38 Int'l Conf. Bioinformatics and Computational Biology BIOCOMP'16

Module 3: Pathway and Drug Development

Package TFEA.ChIP. R topics documented: January 28, 2018

Transcript reconstruction

Supplemental Data. Integrating omics and alternative splicing i reveals insights i into grape response to high temperature

P. Tang ( 鄧致剛 ); PJ Huang ( 黄栢榕 ) g( ); g ( ) Bioinformatics Center, Chang Gung University.

Gene Ontology 2 Function/Pathway Enrichment. Biol4559 Thurs, April 12, 2018 Bill Pearson Pinn 6-057

A fatty liver study on Mus musculus

Genome-Wide Localization of Protein-DNA Binding and Histone Modification by a Bayesian Change-Point Method with ChIP-seq Data

Supplementary Figure 1 IL-27 IL

cis-regulatory enrichment analysis in human, mouse and fly

Breast cancer. Risk factors you cannot change include: Treatment Plan Selection. Inferring Transcriptional Module from Breast Cancer Profile Data

Applying Metab. Raphael Aggio. October 30, This document describes how to use the function included in the R package Metab.

Supplementary Figure S1. Gene expression analysis of epidermal marker genes and TP63.

Transcription:

GeneOverlap: An R package to test and visualize gene overlaps Li Shen Contact: li.shen@mssm.edu or shenli.sam@gmail.com Icahn School of Medicine at Mount Sinai New York, New York http://shenlab-sinai.github.io/shenlab-sinai/ October 30, 2017 Contents 1 Data preparation........................... 1 2 Testing overlap between two gene lists.............. 2 3 Visualizing all pairwise overlaps.................. 5 4 Data source and processing.................... 10 5 SessionInfo.............................. 11 1 Data preparation Overlapping gene lists can reveal biological meanings and may lead to novel hypotheses. For example, histone modification is an important cellular mechanism that can pack and re-pack chromatin. By making the chromatin structure more dense or loose, the gene expression can be turned on or off. Tri-methylation on lysine 4 of histone H3 (H3K4me3) is associated with gene activation and its genome-wide enrichment can be mapped by using ChIP-seq experiments. Because of its activating role, if we overlap the genes that are bound by H3K4me3 with the genes that are highly expressed, we should expect a positive association. Similary, we can perform such kind of overlapping between the gene lists of different histone modifications with that of various expression groups and establish each histone modification s role in gene regulation. Mathematically, the problem can be stated as: given a whole set I of IDs and two sets A I and B I, and S = A B, what is the significance of seeing S? This problem can be formulated as a hypergeometric distribution or a contigency table (which can be solved by Fisher s exact test; see GeneOverlap documentation). The task is so commonly seen across a biological study that it is worth being made into routines. Therefore, GeneOverlap is created to manipulate the data, perform the test and visualize the result of overlaps.

The GeneOverlap package is composed of two classes: GeneOverlap and GeneOverlapMatrix. The GeneOverlap class performs the testing and serves as building blocks to the GeneOverlapMatrix class. First, let s load the package > library(geneoverlap) To use the GeneOverlap class, create two character vectors that represent gene names. An easy way to do this is to use read.table("filename.txt") to read them from text files. As a convenience, a couple of gene lists have been compiled into the GeneOverlap data. Use > data(geneoverlap) to load them. Now, let s see what are contained in the data: there are three objects. The hesc.chipseq.list and hesc.rnaseq.list objects contain gene lists from ChIP-seq and RNA-seq experiments. The gs.rnaseq variable contains the number of genes in the genomic background. Refer to Section 4 for details about how they were created. Let s see how many genes are there in the gene lists. > sapply(hesc.chipseq.list, length) H3K4me3 H3K36me3 13448 297 3853 4533 > sapply(hesc.rnaseq.list, length) Exp High Exp Medium > gs.rnaseq [1] 21196 Exp Low 6540 6011 8684 In GeneOverlap, we refer to a collection of gene lists as a gene set that is represented as a named list. Here we can see that the ChIP-seq gene set contains four gene lists of different histone marks: H3K4me3,, and H3K36me3; the RNA-seq gene set contains three gene lists of different expression levels: High, Medium and Low. Two histone marks are associated with gene activation: H3K4me3 and H3K36me3 while the other two are associated with gene repression: and. 2 Testing overlap between two gene lists We want to explore whether the activation mark - H3K4me3 is associated with genes that are highly expressed. First, let s construct a GeneOverlap object > go.obj <- newgeneoverlap(hesc.chipseq.list$h3k4me3, + hesc.rnaseq.list$"exp High", + genome.size=gs.rnaseq) > go.obj GeneOverlap object: lista size=13448 listb size=6540 2

Intersection size=5916 Overlap testing has not been performed yet. As we can see, the GeneOverlap constructor has already done some basic statistics for us, such as the number of intersections. To test the statistical significance of association, we do > go.obj <- testgeneoverlap(go.obj) > go.obj GeneOverlap object: lista size=13448 listb size=6540 Intersection size=5916 Overlapping p-value=0e+00 Jaccard Index=0.4 The GeneOverlap class formulates the problem as testing whether two variables are independent, which can be represented as a contingency table, and then uses Fisher s exact test to find the statistical significance. Here the P-value is zero, which means the overlap is highly significant. The Fisher s exact test also gives an odds ratio which represents the strength of association. If an odds ratio is equal to or less than 1, there is no association between the two lists. If the odds ratio is much larger than 1, then the association is strong. The class also calculates the Jaccard index which measures the similarity between two lists. The Jaccard index varies between 0 and 1, with 0 meaning there is no similarity between the two and 1 meaning the two are identical. To show some more details, use the print function > print(go.obj) Detailed information about this GeneOverlap object: lista size=13448, e.g. ENSG00000187634 ENSG00000188976 ENSG00000187961 listb size=6540, e.g. ENSG00000000003 ENSG00000000419 ENSG00000000460 Intersection size=5916, e.g. ENSG00000188976 ENSG00000187961 ENSG00000187608 Union size=14072, e.g. ENSG00000187634 ENSG00000188976 ENSG00000187961 Genome size=21196 # Contingency Table: nota ina notb 7124 7532 inb 624 5916 Overlapping p-value=0e+00 Odds ratio=9.0 Overlap tested using Fisher's exact test (alternative=greater) Jaccard Index=0.4 which shows the odds ratio to be 9.0 and the Jaccard index to be 0.4. The print function also displays the contingency table which illustrates how the problem is formulated. Further, we want to see if H3K4me3 is associated with genes that are lowly expressed. We do > go.obj <- newgeneoverlap(hesc.chipseq.list$h3k4me3, + hesc.rnaseq.list$"exp Low", + genome.size=gs.rnaseq) > go.obj <- testgeneoverlap(go.obj) > print(go.obj) 3

Detailed information about this GeneOverlap object: lista size=13448, e.g. ENSG00000187634 ENSG00000188976 ENSG00000187961 listb size=8684, e.g. ENSG00000000005 ENSG00000000938 ENSG00000000971 Intersection size=2583, e.g. ENSG00000187583 ENSG00000187642 ENSG00000215014 Union size=19549, e.g. ENSG00000187634 ENSG00000188976 ENSG00000187961 Genome size=21196 # Contingency Table: nota ina notb 1647 10865 inb 6101 2583 Overlapping p-value=1 Odds ratio=0.1 Overlap tested using Fisher's exact test (alternative=greater) Jaccard Index=0.1 In contrast to the highly expressed genes, the P-value is now 1 with odds ratio 0.1. Therefore, H3K4me3 is not associated with gene suppression. Once a GeneOverlap object is created, several accessors can be used to extract its slots. For example > head(getintersection(go.obj)) [1] "ENSG00000187583" "ENSG00000187642" "ENSG00000215014" "ENSG00000142609" [5] "ENSG00000203301" "ENSG00000215912" > getoddsratio(go.obj) [1] 0.06418943 > getcontbl(go.obj) nota ina notb 1647 10865 inb 6101 2583 > getgenomesize(go.obj) [1] 21196 It is also possible to change slots "lista", "listb" and "genome.size" after object creation. For example > setlista(go.obj) <- hesc.chipseq.list$ > setlistb(go.obj) <- hesc.rnaseq.list$"exp Medium" > go.obj GeneOverlap object: lista size=3853 listb size=6011 Intersection size=1549 Overlap testing has not been performed yet. After any of the above slots is changed, the object is put into untested status. So we need to re-test it 4

> go.obj <- testgeneoverlap(go.obj) > go.obj GeneOverlap object: lista size=3853 listb size=6011 Intersection size=1549 Overlapping p-value=2.8e-69 Jaccard Index=0.2 We can also change the genome size to see how the p-value changes with it > v.gs <- c(12e3, 14e3, 16e3, 18e3, 20e3) > setnames(sapply(v.gs, function(g) { + setgenomesize(go.obj) <- g + go.obj <- testgeneoverlap(go.obj) + getpval(go.obj) + }), v.gs) 12000 14000 16000 18000 20000 1.000000e+00 9.999746e-01 6.039536e-05 9.006016e-24 5.589067e-51 As expected, the larger the genome size, the more significant the P-value. That is because, as the "universe" grows, it becomes less and less likely for two gene lists to overlap by chance. 3 Visualizing all pairwise overlaps When two gene sets each with one or more lists need to be compared, it would be rather inefficient to compare them manually. A matrix can be constructed, where the rows represent the lists from one set while the columns represent the lists from the other set. Each element of the matrix is a GeneOverlap object. To visualize this overlapping information altogether, a heatmap can be used. To illustrate, we want to compare all the gene lists from ChIP-seq with that from RNA-seq > gom.obj <- newgom(hesc.chipseq.list, hesc.rnaseq.list, + gs.rnaseq) > drawheatmap(gom.obj) 5

Color Key Odds Ratio 2 6 10 Value 0e+00 0e+00 N.S. N.S. 7e 13 N.S. N.S. 3e 69 1e 09 0e+00 N.S. N.S. H3K4me3 H3K36me3 Exp High Exp Medium Exp Low N.S.: Not Significant; : Ignored That is neat. The newgom constructor creates a new GeneOverlapMatrix object using two named lists. Then all we need to do is to use the drawheatmap function to show it. In the heatmap, the colorkey represents the odds ratios and the significant p-values are superimposed on the grids. To retrieve information from a GeneOverlapMatrix object, two important accessors called getmatrix and getnestedlist can be used. The getmatrix accessor gets information such as p-values as a matrix, for example > getmatrix(gom.obj, name="pval") H3K4me3 Exp High Exp Medium Exp Low 0.0000000 0.000000e+00 1.000000e+00 0.9996654 7.071538e-13 9.999497e-01 1.0000000 2.797009e-69 1.061420e-09 H3K36me3 0.0000000 9.999998e-01 1.000000e+00 or the odds ratios > getmatrix(gom.obj, "odds.ratio") 6

Exp High Exp Medium Exp Low H3K4me3 8.9672775 3.8589147 0.06418943 0.6366248 2.3460391 0.62254173 0.3338513 1.9408115 1.24113618 H3K36me3 10.9219512 0.8265542 0.02145576 The getnestedlist accessor can get gene lists for each comparison as a nested list: the outer list represents the columns and the inner list represents the rows > inter.nl <- getnestedlist(gom.obj, name="intersection") > str(inter.nl) List of 3 $ Exp High :List of 4..$ H3K4me3 : chr [1:5916] "ENSG00000188976" "ENSG00000187961" "ENSG00000187608" "ENSG00000188157".....$ : chr [1:66] "ENSG00000157184" "ENSG00000117133" "ENSG00000162694" "ENSG00000178104".....$ : chr [1:574] "ENSG00000188976" "ENSG00000188157" "ENSG00000162576" "ENSG00000228594".....$ H3K36me3: chr [1:3290] "ENSG00000188976" "ENSG00000188157" "ENSG00000160087" "ENSG00000127054"... $ Exp Medium:List of 4..$ H3K4me3 : chr [1:4985] "ENSG00000187634" "ENSG00000188290" "ENSG00000131591" "ENSG00000186891".....$ : chr [1:142] "ENSG00000143786" "ENSG00000197472" "ENSG00000135747" "ENSG00000196418".....$ : chr [1:1549] "ENSG00000187634" "ENSG00000188290" "ENSG00000131591" "ENSG00000186891".....$ H3K36me3: chr [1:1151] "ENSG00000157933" "ENSG00000116198" "ENSG00000171680" "ENSG00000162413"... $ Exp Low :List of 4..$ H3K4me3 : chr [1:2583] "ENSG00000187583" "ENSG00000187642" "ENSG00000215014" "ENSG00000142609".....$ : chr [1:90] "ENSG00000215912" "ENSG00000232423" "ENSG00000204501" "ENSG00000157358".....$ : chr [1:1745] "ENSG00000187583" "ENSG00000237330" "ENSG00000197921" "ENSG00000142611".....$ H3K36me3: chr [1:101] "ENSG00000171819" "ENSG00000116663" "ENSG00000187144" "ENSG00000215902"... Another important accessor is the method "[" that allows one to retrieve GeneOverlap objects in a matrix-like fashion. For example, > go.k4.high <- gom.obj[1, 1] > go.k4.high GeneOverlap object: lista size=13448 listb size=6540 Intersection size=5916 Overlapping p-value=0e+00 Jaccard Index=0.4 gets the GeneOverlap object that represents the comparison between H3K4me3 and highly expressed genes. It is also possible to get GeneOverlap objects using labels, such as > gom.obj["", "Exp Medium"] GeneOverlap object: lista size=297 listb size=6011 Intersection size=142 Overlapping p-value=7.1e-13 Jaccard Index=0.0 7

GeneOverlapMatrix can also perform self-comparison on one gene set. For example, if we want to know how the ChIP-seq gene lists associate with each other, we can do > gom.self <- newgom(hesc.chipseq.list, + genome.size=gs.rnaseq) > drawheatmap(gom.self) Color Key Odds Ratio 2 6 10 Value 5e 17 0e+00 0e+00 H3K4me3 N.S. 3e 28 N.S. H3K36me3 Only the upper triangular matrix is used. N.S.: Not Significant; : Ignored It is also possible to change the number of colors and the colors used for the heatmap. For example > drawheatmap(gom.self, ncolused=5, grid.col="blues", note.col="black") 8

Color Key Odds Ratio 2 6 10 Value 5e 17 0e+00 0e+00 H3K4me3 N.S. 3e 28 N.S. H3K36me3 N.S.: Not Significant; : Ignored or to show the Jaccard index values instead > drawheatmap(gom.self, what="jaccard", ncolused=3, grid.col="oranges", + note.col="black") 9

Color Key Jaccard Index 0 0.1 0.25 Value 5e 17 0e+00 0e+00 H3K4me3 N.S. 3e 28 N.S. H3K36me3 N.S.: Not Significant; : Ignored 4 Data source and processing The experimental data used here were downloaded from the ENCODE [ENCODE Consortium, ] project s website. Both ChIP-seq and RNA-seq samples were derived from the human embryonic stem cells (hesc). The raw read files were aligned to the reference genome using Bowtie [Langmead et al., 2009] and Tophat [Trapnell et al., 2009]. Cufflinks [Trapnell et al., 2010] was used to estimate the gene expression levels from the RNA-seq data. Only protein coding genes were retained for further analysis. The genes were filtered by FPKM status and only the genes with "OK" status were kept. This left us with 21,196 coding genes whose FPKM values were reliabled estimated. The genes were then separated into three groups: high (FPKM>10), medium (FPKM>1 and <=10) and low (FPKM<=1). The duplicated gene IDs within each group were removed before testing. 10

For ChIP-seq, one replicate from IP treatment and one replicate from input control were used. Peak calling was performed using MACS2 (v2.0.10) with default parameter values. The peak lists then went through region annotation using the diffreps package [Shen et al., 2013]. After that, the genes that had peaks on genebody and promoter regions were extracted. The genes were further filtered using the RNA-seq gene list obtained above. 5 SessionInfo > sessioninfo() R Under development (unstable) (2017-10-20 r73567) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 16.04.3 LTS Matrix products: default BLAS: /home/biocbuild/bbs-3.7-bioc/r/lib/librblas.so LAPACK: /home/biocbuild/bbs-3.7-bioc/r/lib/librlapack.so locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grdevices utils datasets methods base other attached packages: [1] GeneOverlap_1.15.0 loaded via a namespace (and not attached): [1] Rcpp_0.12.13 gtools_3.5.0 digest_0.6.12 rprojroot_1.2 [5] bitops_1.0-6 backports_1.1.1 magrittr_1.5 evaluate_0.10.1 [9] KernSmooth_2.23-15 stringi_1.1.5 gplots_3.0.1 gdata_2.18.0 [13] rmarkdown_1.6 BiocStyle_2.7.0 RColorBrewer_1.1-2 tools_3.5.0 [17] stringr_1.2.0 yaml_2.1.14 compiler_3.5.0 catools_1.17.1 [21] htmltools_0.3.6 knitr_1.17 References [ENCODE Consortium, ] ENCODE Consortium. The encyclopedia of dna elements (http://encodeproject.org/encode/). [Langmead et al., 2009] Langmead, B., Trapnell, C., Pop, M., and Salzberg, S. (2009). Ultrafast and memory-efficient alignment of short dna sequences to the human genome. Genome Biology, 10(3):R25. 11

[Shen et al., 2013] Shen, L., Shao, N.-Y., Liu, X., Maze, I., Feng, J., and Nestler, E. J. (2013). diffreps: Detecting differential chromatin modification sites from chip-seq data with biological replicates. PLoS ONE, 8(6):e65598. [Trapnell et al., 2009] Trapnell, C., Pachter, L., and Salzberg, S. L. (2009). Tophat: discovering splice junctions with rna-seq. Bioinformatics, 25(9):1105 1111. [Trapnell et al., 2010] Trapnell, C., Williams, B. A., Pertea, G., Mortazavi, A., Kwan, G., van Baren, M. J., Salzberg, S. L., Wold, B. J., and Pachter, L. (2010). Transcript assembly and quantification by rna-seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotech, 28(5):511 515. 10.1038/nbt.1621. 12