for the article titled An atlas of human long non- coding RNAs with accurate 5 ends TABLE OF CONTENTS

Size: px
Start display at page:

Download "for the article titled An atlas of human long non- coding RNAs with accurate 5 ends TABLE OF CONTENTS"

Transcription

1 SUPPLEMENTARY Supplementary INFORMATION Information for article titled doi:.38/nature237 An atlas of human long non- coding RNAs with accurate 5 ends by Division of Genome Technologies, RIKEN, Yokohama, Japan (27) TABLE OF CONTENTS A Supplementary Notes... 2 Rationale and Implementation of TIEScore Benchmarking Performance of TIEScore... 3 TIEScore Cutoffs and TSS Validation Rates... 6 Definition of Genes, Coding Potential and Comparison with GENCODEv Directionality, Exosome Sensitivity and Transcript Properties Sequence Features at LncRNA TSS Supported by Different Types... B Online Resources... 2 Assembly, Expression Atlas and or Resources Overview of FANTOM CAT Browser Use Case : Download a List of Novel e- lncrnas... Use Case 2: Explore LncRNAs with Conserved Exons and Implicated in Use Case 3: Explore LncRNAs Enriched in Classical Monocytes Use Case : Explore Cell Types Associated with Crohn s Diseases... 9 C Supplementary Tables... 2 CAGE Library Information RNA- seq Library Information FANTOM CAT Gene Information... 2 Directionality, Exosome Sensitivity and Transcript Properties FANTOM CAT Genes in LncRNAdb Conservation of TIR and Exon Transposons at TIR Orthologous Transcription Expression Levels and Specificity in Primary Cell Facets... 2 Sample Ontology Information... 2 Gene Association with Cell Types Trait Information Gene Association with Traits... 2 Curation of Cell type and Trait Pairs Genes Involved in Cell Type and Trait pairs linked lncrna and mrna Pairs Gene- based Functional Evidence Grouping of Samples for Differential Expression Differential Expression Results... 2 D References

2 RESEARCH SUPPLEMENTARY INFORMATION A Supplementary Notes Rationale and Implementation of TIEScore a C = CAGE cluster expression level (cpm) CAGE criterion E = exon number total exon length (kb) Distance criterion exon Exon criterion exon D = distance from CAGE cluster to transcript (nt) Transformation based on linear regression with TSS validation rate by Roadmap as in b Weighting based on benchmark performance (ROC-AUC) against gold standards as in c TIEScore as an empirical estimation of TSS validation rate by TIEScore = w e f e (E) + w d f d (D) + w c f c (C) w e =. f e (E) = 5.5 log 2 E w d =.55 f d (D) =.9 D w c =.35 f c (C) = 6.63 log 2 C b Transformation c Weighting linear scale square root scale log 2 scale increasing performance CAGE cluster TSS validation rate by (%) r 2 =.3 r 2 =.69 f e (E) = 5.5 log 2 E r 2 =.96 Exon criterion (E) TIEScore performance (ROC-AUC) Performance CAGE cluster TSS validation rate by (%) transcript TSS validation rate by (%) Exon criterion (E) in various scales f d (D) =.9 D r 2 =.89 r 2 =.98 r 2 = Distance criterion (D) in various scales f c (C) = 6.63 log 2 C r 2 =.7 r 2 =.89 r 2 = CAGE criterion (C) in various scales actual TSS permutated TSS Distance criterion (D) CAGE criterion (C) weight of f c (C) weight of f d (D) weight of f e (E) w e =. w d =.55 w c = rank of combinations (n=86) of relative weights, sorted by TIEScore performance (ROC-AUC) Exon criterion (E) Distance criterion (D) CAGE criterion (C) Supplementary Fig. Rationale and Implementation of TIEScore. a, For each pair of transcript and CAGE cluster, TIEScore is calculated as sum of values from exon (E), distance (D) and CAGE (C) criteria, which were transformed by fe (E), fd (D) and fc (C) as in b and weighted by we, wd and wc as in c, respectively. The resulting TIEScore can be interpreted as an empirical estimation of TSS validation rate by. b, Transformation of TIEScore criteria values. We investigated correlation between TIEScore criteria values and TSS validation rate by DNaseI hypersensitivity sites (). The criteria values were transformed into various scales (columns) and plotted against TSS validation rate by. A TSS is validated if it overlaps a. Within each bin of values at X- axis, percentage of validated TSS was calculated (i.e. TSS validation rate). Same number of positions was randomly sampled from unannotated genomic regions (permutated TSS). Linear regression (red line, 99.99% confidence intervals) was performed on actual TSS and r 2 was indicated. The plot with highest r 2 within each TIEScore criteria was highlighted in pink and corresponding linear regression functions (i.e. fe (E), fd (D) and fc (C)) were used to transform TIEScore criteria values. c, Weighting of TIEScore criteria values. TIEScore is defined as weighted sum of transformed TIEScore criteria values. 2

3 SUPPLEMENTARY INFORMATION RESEARCH. TIEScore Rationale: Transformation and Weighting of Criteria Values Transcription Initiation Evidence Score (TIEScore) is a custom metric that evaluates properties of a pair of CAGE cluster and transcript model to quantify likelihood that corresponding CAGE transcription start site (TSS) being genuine (Supplementary Fig. a). We reasoned lowly abundant transcripts are likely to be insufficiently covered by RNA-seq reads and thus ir transcript models are more likely to be incomplete and truncated into shorter and less spliced models. This is supported by observation that product of exon number and total exon length (kb) of transcript models (i.e. Exon criterion, E) are highly correlated with percentage of TSS validated by (i.e. validation rate, Supplementary Fig. b). Also, lowly abundant CAGE clusters that are distant from transcript 5 ends may represent degradation products of longer transcripts, previously referred to as exon painting 2. This is supported by observation that expression levels of CAGE clusters and ir distances to closest transcript model (i.e. CAGE criterion, C and Distance criterion, D) are correlated with validation rate (Supplementary Fig. b-c). We refore devised TIEScore (Supplementary Fig. a), a metric integrating se criteria to address inaccurately identified 5 ends of transcripts and build transcript models with well-supported transcription initiation evidence. First, we explored linearity of validation rate across se criteria values using various scales based on linear regression (Supplementary Fig. b). For each criterion, scale that yields highest r 2 was chosen (Supplementary Fig. b) and its value was n transformed into validation rate based on corresponding linear regression functions (i.e. f e (E), f d (D) and f c (C)) as in Supplementary Fig. b. TIEScore is defined as weighted sum of se transformed values. To optimize weights of se criteria, we evaluated performance of TIEScore for discrete combinations of weights (n=86 combinations, i.e. three weights at grid of. with sums equal to, Supplementary Fig. c). The performance of TIEScore was measured in terms of area under receiver operating characteristic (ROC) curve (ROC-AUC) 3 in 7 matched CAGE and RNA-seq libraries (see details in Supplementary Note 2) and combination of weights which yielded highest AUC-ROC was chosen (i.e. w e, w d and w c ). TIEScore is thus calculated as: w e f e (E) + w d f d (D) + w c f c (C), which can be interpreted as an empirical estimation of TSS validation rate by..2 Implementation of TIEScore in Meta-assembly of FANTOM CAT TIEScore was evaluated for each pair of CAGE clusters and transcript models within kb. Each transcript model was assigned to CAGE cluster yielding highest TIEScore and its 5 end was adjusted to most prominent TSS of CAGE cluster. Transcripts with no CAGE peaks within kb, and CAGE peaks without transcripts within kb, were refore discarded. This associated each retained transcript with a CAGE cluster, and each retained CAGE cluster with one or more transcripts. A CAGE cluster is n assigned highest TIEScore yielded from its associated transcripts. TIEScore was first applied to each of five transcript model collections separately (with,897 CAGE libraries, Supplementary Table ) and n merged into a non-redundant transcript set (referred to as raw FANTOM CAGE associated transcriptome (CAT)). Specifically, transcript models from GENCODEv9 were used as initial reference to sequentially overlay onto m transcripts from or four collections, in sequence of Human BodyMap 2., mitranscriptome 5, ENCODE 6 and FANTOM5 RNA-seq assembly (this study). In each overlay, query transcript models that share ) 5 ends within ±5bp, 2) exact same splicing junctions, 3) 3 ends within ±2,bp, with reference transcript models were discarded as redundant. The non-redundant query transcript models were n added as new reference set for next round of overlay. For each CAGE cluster, maximum TIEScore after all four rounds of overlays was taken as its TIEScore. The TIEScore of raw FANTOM CAT was benchmarked against gold standards (with N=5, see details in Supplementary Note 2). The TIEScore cutoffs were chosen as in Supplementary Note

4 RESEARCH SUPPLEMENTARY INFORMATION 2 Benchmarking Performance of TIEScore gold standard TSS regions with various degrees of chromhmm state support gold standard non-tss regions relaxed strict > >2 >3 > >5 >6 >7 >8 >9 > 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,..75 all CAGE clusters 5, fraction of regions covered by DNase-seq a relative position from peak of CAGE clusters (bp) recall RNA-seq read counts cutoff = 27. CAGE clusters with less than.86 read counts, rescued by TIEScore 89 37, log2 RNA-seq read count cutoff,66 cutoff = cutoff = FDR = log2 CAGE read count cutoff log2 RNA-seq read count cutoff. cutoff = FDR = TIEscore cutoff F-measure mean = , TIEscore cutoff.5.75 recall FDR = F-measure mean =.2. 2, TIEScore cutoff = 52.9 PR-AUC mean =.92 F-measure mean =.9 F-measure PR-AUC mean = recall. d false discovery rate c TSS identified at false discovery rate of.5 by three classifiers precision PR-AUC mean =.98 e CAGE read count RNA-seq read count,333 TIEScore.75 b log2cage read count cutoff CAGE read counts cutoff =.86 Supplementary Fig. 2 Benchmarking Performance of TIEScore. Seventy samples with matched CAGE and RNA-seq libraries were used for TIEScore benchmarking (Supplementary table ). a, coverage of gold standard TSS regions. All CAGE clusters (st column) refer to all CAGE clusters in raw FANTOM CAT. Gold standard non-tss regions (2nd column) and TSS regions (3rd column) were defined using chromatin states among 27 epigenome datasets from Roadmap Epigenomics Consortium7 and CAGE clusters identified in FANTOM58. Gold standard TSS regions were defined at various degrees of chromatin state support (from relaxed to strict, 3rd to last column). Y-axis: fraction of corresponding regions covered by DNaseseq peaks. Line and shaded area: signal summarized per window of 2bp, solid line and shaded area represent median and quartiles of 27 epigenome datasets at corresponding window. In b, c and d, benchmarking of performance of TIEScore, CAGE read count and RNA-seq read count was repeated with sets of gold standard TSS regions defined at various levels of stringency as in a with same color scale. b, Precision and recall curve (PR curve). The area under PR curve (PR-AUC)3 is used as a measure of classifier performance in identifying genuine TSS. PR-AUC3 of TIEScore is significantly higher than that of or 2 classifiers across various stringencies of gold standard TSS definition (P<.5, paired Student s t-test). In c and d, vertical dashed lines: classifier cutoffs with false discovery rate (FDR) of.5 at gold standard TSS>5 chromatin state support. c, F-measure versus classifier cutoffs. F-measure3 was calculated as harmonic mean of precision and recall and is used as a measure of classifier accuracy in identifying genuine TSS. At FDR of.5 (vertical dashed lines), TIEScore outperforms (higher F-measure values) or two classifiers. d, FDR versus classifier cutoffs. Intersections of curve and horizontal dashed lines refer to classifier cutoffs for achieving FDR of.5 (vertical dashed lines). e, Overlap of TSS identified by three classifiers at cutoffs for achieving FDR of.5. W W W. N A T U R E. C O M / N A T U R E

5 SUPPLEMENTARY INFORMATION RESEARCH 2. Definition of gold standard TSS and non-tss regions To assess how well TIEScore identifies genuine TSS compared to using CAGE or RNA-seq derived information alone, we first defined sets of gold standard TSS and non-tss regions using epigenome datasets from Roadmap Epigenomics Consortium 7 ( Firstly, chromatin states TssA (Active TSS), Enh (Enhancers), TssBiv (Bivalent/Poised TSS) and EnhBiv (Bivalent Enhancer) were referred to as pro-tss-states ; and chromatin states Tx (Strong transcription), Het (Heterochromatin) and Quies (Quiescent/Low) were referred to as non-tss-states. A CAGE cluster was defined as a gold standard TSS region when its prominent TSS overlaps a pro-tss-state in N of 27 epigenomic datasets 7 and overlaps no non- TSS-states, where N reflects stringency of gold standard definition. Conversely, a CAGE cluster was defined as a gold standard non-tss region when its prominent TSS overlaps a non-tss state in at least 2 of 27 epigenome datasets 7 and overlaps no pro-tss-states. These gold standard regions were n used to assess TIEScore performance in distinguishing TSS from non-tss regions (Supplementary Fig. 2a). 2.2 Performance of TIEScore versus CAGE or RNA-seq data alone Using 7 samples with matched CAGE and RNA-seq libraries (Supplementary Table 2), performance of TIEScore in identifying genuine TSS was compared against CAGE or RNA-seq data alone, in identification of genuine TSS (Supplementary Fig. 2). First, TIEScore was applied to a subset of CAGE clusters with at least 3 reads (sum among 7 samples) and FANTOM5 RNA-seq assembly. A set of universal TSS regions (n=,,) was n generated across three sets of TSS (i.e. combined, and CAGE or RNA-seq alone) by merging transcripts 5 ends in FANTOM5 RNA-seq assembly within ±nt into RNA-seq TSS regions and n by merging se RNA-seq TSS regions with CAGE clusters within ±nt. Gold standard TSS and non-tss regions for se universal TSS regions were defined using midpoint of se universal TSS regions instead of prominent TSS in CAGE clusters. Ten sets of gold standard TSS regions were defined at various stringencies as described above, with N= to, at step of. For each of universal TSS regions, its support (i.e. score) from each of three datasets was n calculated as, ) combined: maximum TIEScore of all associated CAGE clusters; 2) CAGE only: maximum number of CAGE reads of all associated CAGE clusters; 3) RNA-seq only: maximum number of RNA-seq reads of all associated RNA-seq TSS regions. (Note: each RNA-seq TSS region is represented by sum of RNA-seq reads of its associated transcripts as estimated in Sailfish 2 ). The score performance for each of three TSS datasets was assessed using R packages ROCR 3 and PRROC 3. Specifically, sensitivity, specificity, precision, recall, FDR and F-measure at various score cutoffs were calculated using ROCR 3, and ROC-AUC and PR-AUC were calculated using PRROC 3. In terms of precision, recall and F-measure (Supplementary Fig. 2b-c), TIEScore substantially outperformed both CAGE only and RNA-seq only based approaches. In Supplementary Fig. 2d, cutoffs values of various classifiers at FDR of.5 were indicated. In Supplementary Fig. 2e, RNA-seq read count identified appreciably fewer TSS than or two classifiers, due to its high cutoffs (i.e reads) to achieve FDR of.5. Number of TSS identified based on TIEScore and CAGE read count is comparable. TIEScore rescued 35,83 TSS discarded by CAGE read count classifier cutoff at reads, implying its ability to identify lowly expressed TSS with acceptable FDR by synergizing information from CAGE and RNA-seq data. 5

6 RESEARCH SUPPLEMENTARY INFORMATION 3 TIEScore Cutoffs and TSS Validation Rates a c sensitivity cumulative percentage of TSS (%) % specificity=.83 robust TIEScore cutoff 53.% 8.6% sensitivity=.82 ROC-AUC= specificity b d accuracy or false discovery rate TSS validation rate by (%) permissive (35.3) FDR=.23 robust (5.) FDR=.77 stringent (6.) FDR=.26 accuracy FDR TIEScore actual TSS permutated TSS Pearson s r.9 e TSS validation rate by RAMPAGE (%) validated by RAMPAGE within X bp X = Pearson s r mean = TIEScore TIEScore TIEScore Supplementary Fig. 3 TIEScore Cutoffs and TSS Validation Rates. a. Robust cutoff of TIEScore. The robust TIEScore cutoff (TIEScore=5.) was based on optimal cutoff determined from a ROC curve, with FDR=.77. b, Permissive and stringent cuttofs of TIEScore. c, Cumulative percentage of TSS. % refers all TSS in raw FANTOM CAT. d, FANTOM CAT TSS validation rate by. Percentages of TSS at various TIEScore (bin=.) overlaps with were calculated. Permutated TSS refers to randomly sampled genomic positions as control. e, TSS validation by RAMPAGE. The percentages of TSS at various TIEScore (bin=.) that could be validated by RAMPAGE at various ranges were calculated and loess-smood curves were plotted. In c-e, vertical dashed lines represent three TIEScore cutoffs as in b. 3. Definition of TIEScore cutoffs Based on a ROC curve (Supplementary Fig. 3a), we derived an optimal TIEScore cutoff (TIEScore=5., referred to as robust cutoff), with ROC-AUC=.89 and FDR=.77. We also defined two additional values (Supplementary Fig. 3b), permissive (TIEScore=35.3) and stringent (TIEScore=6.) cutoffs, based on FDR chosen as three times more relaxed (FDR=.23) and strict (FDR=.26) than robust FDR. 3.2 Properties at various TIEScore cutoffs At robust cutoff, 6.9% of all TSS were retained (Supplementary Fig. 3c). We observed high correlations between TIEScore and TSS validation rate by (Pearson s r=.9, Methods, Supplementary Fig. 3d) and by RAMPAGE (mean Pearson s r=.95, Methods, Supplementary Fig. 3e), suggesting that TIEScore quantitatively identifies genuine TSS. 6

7 SUPPLEMENTARY INFORMATION RESEARCH Definition of Genes, Coding Potential and Comparison with GENCODEv a support in classes promoter enhancer dyadic no Gene classes uncertain coding Number of genes in classes lncrnas mrnas ors Levels of support in FANTOM CAT CAT genes, above stringent cutoff CAT genes, between robust and stringent cutoffs CAT genes, between permissive and robust cutoffs b c Stringent cutoff 3,52 genes Robust cutoff 59, genes Permissive cutoff 2,25 genes 5 75 percentage of genes (%) 5 75 percentage of genes (%) structural RNAs small RNAs short ncrnas sense overlap RNAs pseudogenes lncrna, sense intronic lncrna, intergenic lncrna, divergent lncrna, antisense protein coding mrnas uncertain coding structural RNAs small RNAs short ncrnas sense overlap RNAs pseudogenes lncrna, sense intronic lncrna, intergenic lncrna, divergent lncrna, antisense protein coding mrnas uncertain coding structural RNAs small RNAs short ncrnas sense overlap RNAs pseudogenes lncrna, sense intronic lncrna, intergenic lncrna, divergent lncrna, antisense 5,, 5, 2,, 3, 35, number of genes (count) 5,, 5, 2,, 3, 35, number of genes (count) d f number of GENCODEv genes (count) 5,, 5, 2, total bp bp 5 bp bp bp density.5.5 below permissive cutoff lncrnas mrnas pseudogenes GENCODEv gene classes e number of novel FANTOM CAT genes (count), 2, 3,, 5, lncrnas mrnas ors FANTOM CAT gene classes GENCODEv transcripts with 5 end revised by FANTOM CAT lncrna protein coding pseudogene 26,578 % 5,926 % 7,66 % 6,9 6% 28,27 88% 8, 6% 2,82 8% 93,788 6% 7,376 2% 7,6 27% 5,33 3%,867 28% 5,9 9% 29,73 2% 3,56 2% 3,33 %,963 %,97 % mean 35.9 mean 95. mean 8.7 protein coding mrnas 5 75 percentage of genes (%) 5,, 5, 2,, 3, 35, number of genes (count) relative modified 5 end position (bp) Supplementary Fig. Definition of FANTOM CAT Genes and Comparison with GENCODEv. In a-c, Left column: Percentage of genes within each gene class supported by different types of Roadmap regulatory regions 7, based on overlapping of ir strongest TSS with Roadmap 7. At all cutoffs, majority of protein coding mrnas are supported by promoter. Right column: Number of genes within each gene class. a, b and c, refer to FANTOM CAT genes at stringent, robust and permissive TIEScore cutoff respectively. d, GENCODEv genes in FANTOM CAT. The number of GENCODEv lncrna, mrna and pseudogene genes supported at various levels of FANTOM CAT was plotted. e, FANTOM CAT genes novel to GENCODEv. A FANTOM CAT gene is novel to GENCODEv if none of its transcripts is compatible with a GENCODEv transcript. f, Revision of 5 ends of GENCODEv transcripts. Upper: Number of GENCODEv transcripts revised by various extents. Total refers to total number of GENCODEv transcripts. X bp refers to number of GENCODEv transcripts revised by at least X bp. Lower: Distributions of 5 end of GENCODEv lncrna, mrna and pseudogene transcripts relative to ir compatible FANTOM CAT transcripts (i.e. relative modified 5 end position) were plotted. The absolute mean within each class was indicated.. Definition FANTOM CAT genes FANTOM CAT genes were defined based on clustering of transcript models in raw FANTOM CAT (after reducing complexity) using a custom perl script. Specifically, non-gencodev9 transcripts with exon boundaries within ±5bp to that of a GENCODEv9 transcript were first assigned to corresponding GENECODEv9 genes. Single exon non-gencodev9 transcripts within ±5bp of a GENCODEv9 exon were also assigned to corresponding GENECODEv9 gene. The non-gencodev9 transcripts spanning multiple GENECODEv9 genes were defined as chimeric transcripts and removed (as likely assembly artifacts). Remaining unassigned non-gencodev9 transcripts with exon boundaries within ±5bp were n recursively clustered, and all se resulting transcript clusters were referred to as novel genes outside GENCODEv9. Applying various TIEScore cutoffs to raw FANTOM CAT, we defined 2,25 permissive, 59, robust and 3,52 stringent genes (Supplementary Fig. a-c). Gene classes of 7

8 RESEARCH SUPPLEMENTARY INFORMATION FANTOM CAT genes assigned to GENCODEv9 protein coding genes, pseudogenes and small RNA genes were directly inherited from ir biotypes. Or gene classes in FANTOM CAT were defined as follows..2 Definition of gene classes Coding potential of all transcripts in raw FANTOM CAT was evaluated using Coding Potential Assessment Tool (CPAT, version ) with default parameters on hg9. Transcripts with CPAT score <.36 and no open reading frames (ORF) 3nt (based on getorf 6 ) are defined as non-coding. A FANTOM CAT gene is defined as non-coding if all its transcripts are non-coding, or its GENCODEv9 biotype is annotated as non-coding 7. A gene is defined as coding when at least 5% of its transcripts are coding and re is at least one transcript with an ORF 3nt. Orwise gene is classified as coding uncertain. Non-coding genes generating at least one transcript with total exonic length 2nt are defined as lncrna genes, and those <2nt are defined as short ncrna. LncRNAs are n classified in ascending order as follows: divergent lncrnas, sense intronic lncrnas, antisense lncrnas and intergenic lncrnas. Divergent lncrnas: genes with its strongest CAGE cluster within ±2kb on opposite strand of any CAGE clusters of GENCODEv9 protein coding genes or pseudogenes. Sense intronic lncrna: lncrna genes ) initiating within intron of anor FANTOM CAT gene, 2) with at least 5% of ir genic region overlapping with genic region of any anor genes, 3) with its strongest CAGE cluster not overlapping exons of or genes, and ) containing CAGE reads, or orwise defined as or sense overlap RNA. Antisense lncrnas: genes with 5% of ir genic region overlapping with genic region of GENCODEv9 protein coding genes or pseudogenes on opposite strand. Intergenic lncrnas: remaining lncrna genes that could not be assigned to any of above categories..3 Comparison between FANTOM CAT and GENCODEv A GENCODEv gene is supported by FANTOM CAT if at least for one of its transcripts is compatible with a FANTOM CAT transcript. A GENCODEv transcript is defined as compatible with FANTOM CAT transcripts if all of its exon boundaries are within ±5bp to that of a FANTOM CAT transcript, and vice versa. A FANTOM CAT gene is defined as novel if none of its transcripts is compatible to GENCODEv transcripts. About 33.86% (5,3 of,8) of GENCODEv lncrna genes (grey stack, Supplementary Fig. d) are below permissive TIEScore cutoff. At robust cutoff, FANTOM CAT covered 8,269 of 2,32 protein-coding genes in GENCODEv (Supplementary Fig. d). The robust FANTOM CAT covered 6,99 of,8 lncrna genes in GENCODEv (Supplementary Fig. d) and added 9,723 novel lncrna genes outside of GENCODEv (Supplementary Fig. e). In addition, 5 ends of 6,9 lncrna transcripts in GENCODEv were revised using FANTOM CAT models, which resulted in ir improved accuracy by a mean of 35.9bp (Supplementary Fig. f).. Annotation of open reading frames in FANTOM CAT. As some previously annotated putative lncrnas transcripts have been shown to associate with ribosomes and thus may encode for short peptides 8, we furr annotated coding potential of open reading frames, but were not used for defining coding status of a FANTOM CAT gene. Coordinates of ORFs on all FANTOM CAT transcripts (n=86,58) were extracted using getorf 6, with minimum 3nt requiring both start and stop codons. Only top 5 longest ORFs per transcript were retained, which yielded 3,6,858 non-redundant ORFs. Coordinates of ORFs were converted from transcript level to genomic level. A pre-computed 6-way whole genome alignment 9 of hg9 was downloaded from UCSC Genome Browser 2. Only 27 species were retained in alignment based on its intersection with 58-mammal phylogeny used in PhyloCSF 2, including Alpaca, Armadillo, Baboon, Bushbaby, Cat, Chimp, Cow, Dog, Dolphin, Elephant, Gorilla, Guinea pig, Hedgehog, Horse, Human, Marmoset, Megabat, Microbat, Mouse, Orangutan, Pika, Rabbit, Rat, Rhesus, Shrew, Squirrel and Tenrec. Interspecies alignments of ORF were extracted using mafsinregion from UCSC Genome Browser 2. Coding potential of se ORFs were assessed using PhyloCSF 2 and RNAcode 22 using default settings, based on phylogenetic information from interspecies alignments. We furr identified potentially translated small ORFs (sorf) based on ribosome profiling data in sorfs.org 23. An ORF is defined as coding if its score in PhyloCSF 2, P<. in RNAcode 22 or it matches an sorf with good floss-score as defined in sorfs.org 23. The annotations of all ORFs in FANTOM CAT are available at Note: Of 27,99 lncrnas genes in robust FANTOM CAT, 7,722 of m were annotated as lncrnas in GENCODEv9 2 and remaining 2,97 non-gencodev9 lncrna genes where all ir transcripts lacked coding potential (CPAT score <.36 and no ORF 3nt). Of se 2,97 non-gencodev9 lncrna genes, only,87 had putative ORFs with additional evidence of coding potential (in eir PhyloCSF 2, RNAcode 22 or sorfs.org 23 ). We provide se annotations on web resource ( yet we still consider se as genuine lncrnas for furr analysis. 8

9 5 Directionality, Exosome Sensitivity and Transcript Properties SUPPLEMENTARY INFORMATION RESEARCH 5 Directionality, Exosome Sensitivity and Transcript Properties a a directionality = (read ss read as ) / (read ss + read as ) directionality = (read ss read as ) / (read ss + read as ) sense CAGE sense TSS CAGE (read TSS antisense ss ) (read antisense ss ) CAGE TSS CAGE (read as TSS ) (read as ) CAGE cluster miscellaneous CAGE cluster miscellaneous in question in question region for read counting region for read counting distance from sense TSS (bp) distance from sense TSS (bp) b b c c d d e percentage of genes of genes exosome sensitivity genomic span span (log2 (log2 kb) kb) splicing index mrna divergent p-lncrna intergenic p-lncrna e-lncrna mrna divergent p-lncrna intergenic p-lncrna e-lncrna directionality (strand-bias (strand-bias of CAGE of CAGE signal) signal) Supplementary Fig. 55 Directionality, Exosome Sensitivity and Transcript Properties. a, a, Definition of of directionality. Miscellaneous refers to to any antisense TSS (if any). Directionality of a CAGE cluster is is defined as: as: (readss readas)/(readss+readas), where readss and readas are number of CAGE read counts (across,897 FANTOM5 samples) within 8 8 to +2nt to +2nt of of its its prominent TSS on on sense and antisense strand, respectively. b percentages of of genes within each each category for for each each bin bin of of directionality. c, c, Exosome sensitivity is measured as relative fraction of of CAGE signal observed after after exosome knockdown in in HeLa-S3 cells cells. d,. d, genomic span, defined as length between 5 5 most to to 3 3 most base of of a a gene. gene. e, e, splicing index, index, calculated as as average of of sum of of ratio of of junction spanning reads to to spliced reads of of each of of its its introns across across 7 7 RNA-seq RNA-seq libraries. libraries. The The box box plots plots show show median, quartiles and Tukey whiskers of measurements for for genes genes at at each each bin bin of of directionality. 5. Rationale Transcription initiation is intrinsically bidirectional 26 (Supplementary Fig. 5a). Functionally distinct RNA species were previously categorized by ir transcriptional directionality (strand-bias of transcription initiation, referred to as directionality) and by exosome-sensitivity (sensitivity to ribonucleolytic RNA exosome complex). For each lncrna category we examined relationship between se features and length and splicing of observed transcripts (Supplementary Fig. 5b-e). 5.2 Calculation of directionality, splicing index, genomic span and exosome sensitivity Directionality of a given CAGE cluster is measured as bias of CAGE signal on sense versus antisense strand (relative to TSS), as described with modifications as follows. A zero value implies perfectly balanced CAGE signal on both strands, and values of + and implies strongly biased CAGE signal towards sense or antisense strand, respectively. Directionality of a CAGE cluster is defined as follows: (read SS read AS )/(read SS +read AS ), where read SS and read AS are number of read counts from all FANTOM5 CAGE datasets (n=,897, Supplementary Table ) within 8 to +2nt of its prominent TSS on sense and antisense strand, respectively (Supplementary Fig. 5a). Directionality of a gene is defined by directionality of its strongest CAGE cluster. Splicing index of a gene is defined as average of sum of ratio of junction spanning reads to spliced reads of each of its introns across 7 RNA-seq libraries (libraries mentioned in Methods). Genomic span of a gene is defined as length between its 5 most to 3 most genomic location. Exosome sensitivity of a CAGE cluster is measured as relative fraction of CAGE signal observed after exosome knockdown in HeLa-S3 cells as previously described. 5.2 Directionality p-lncrnas Consistent with previous findings, transcription initiation regions (TIRs) that are predominantly transcribed from sense strand (directionality +, in all four categories, Supplementary Fig. 5b) produce less exosome-sensitive (Supplementary Fig. 5c), generally longer (Supplementary Fig. 5d) and more spliced and RNAs (Supplementary Fig. 5e). Therefore, directionality of TIR somewhat reflects properties of its produced RNAs. As expected, TIRs of mrnas predominantly generate sense transcripts (directionality +, red) and are only mildly exosome sensitive, 9 9

10 RESEARCH SUPPLEMENTARY INFORMATION whereas p-lncrnas generated from divergent antisense transcription from se mrna promoters are PROMPT 27 like (directionality, purple), largely exosome sensitive, short and rarely spliced. In contrast intergenic p-lncrna TIRs are mostly biased towards generating sense transcripts (directionality >, blue). In addition, ~% of intergenic p- lncrnas predominantly generating sense transcripts (directionality +, blue), such as MALAT 28, are relatively resistant to exosome degradation. 5.3 Directionality e-lncrnas About 76% of e-lncrnas arose from balanced bidirectional TIRs (directionality.8 and +.8, green), while about 22% were generated from unidirectional TIRs (directionality >+.8, green). These unidirectional TIR derived e- lncrnas are generally less exosome sensitive, longer, more spliced (directionality >+.8, green). This study refore expand our previous catalog of bidirectionally transcribed enhancer regions 8 by including unidirectional e-lncrnas. These contain previously identified functional e-lncrna such as CCAT, an unidirectionally transcribed from a superenhancer, which promotes long-range chromatin looping and regulates MYC transcription 29.

11 SUPPLEMENTARY INFORMATION RESEARCH 6 Sequence Features at LncRNA TSS Supported by Different Types a lncrna, sense intronc lncrna, intergenic lncrna, divergent lncrna, antisense random controls n=6 n=,26 n=26 n=2, n=,8 n=6, n=,7 n=,33 n=522 n=936 n=5,827 n=92 n=26 n=,376 n=22 n=,8 n=, n=, dyadic enhancer promoter no dyadic enhancer promoter no dyadic enhancer promoter no dyadic enhancer promoter no unannot. regions whole genome 2,5 2,5 2,5 2,5 2,5 2,5 2,5 2, ,5 2,5 2,5 2,5 2,5 2,5 2,5 2,5 2,5 2,5 2,5 2,5 2,5 2, ,5 2, ,5 2,5 2,5 2,5 2,5 2,5 2,5 2,5 TATA box (fraction) INR motif (fraction) ,5 2,5 2,5 2, SS (fraction) PAS (fraction) CpG island (fraction) RS score (mean) H3Kme3 (fraction)..75 H3Kme (fraction) H3K27ac (fraction)..75 DNase (fraction) b relative position from TSS (bp) c relative position from TSS (bp) relative position from TSS (bp) Supplementary Fig. 6 Sequence Features at LncRNA TSS Supported by Different Types. Black dashed lines: whole genome background. Solid lines: TSS of lncrna genes and control regions randomly sampled from whole (whole genome) or unannotated portion of genome (unannot. regions). Colors of lines correspond to type of support as indicated on top. a, Epigenomic features surrounding TSS. Distributions of Roadmap DNase-seq and ChIP-seq (Methods) were plotted. b, Genomic features surrounding TSS. Distributions of rejected substitution (RS) score 3, CpG island, polyadenylation site signal (PAS) and 5 splicing site (5 SS), were plotted. c, Core promoter motifs. Distributions of TATA box and initiator (INR) motif, were plotted.. 6. Genuineness TSS of lncrna gene classes supported by different types Despite relatively weaker selective constraints of e-lncrnas and intergenic p-lncrnas TIR as measured using RS scores 3, we observed an enrichment of sequence features associated with regulated transcription initiation, including TATA-box, INR and CpG island (absent for e-lncrnas) as well as signals conducive to generating long transcripts including 5 SS enrichment and PAS depletion. These sequence features suggest that at least a subset of se TIR have undergone selection for both transcription initiation and elongation. For remaining lncrnas lacking support (grey), we still observed noticeable enrichment of 5 SS, INR and TATA-box motifs and depletion of PAS, suggesting that a subset of se TSS is likely to be genuine despite lack of support.

12 RESEARCH SUPPLEMENTARY INFORMATION B Online Resources All resources mentioned below are available at Assembly, Expression Atlas and or Resources. Assembly: Cutoffs and identifiers We provide The FANTOM CAT assembly at cutoffs (as *.gtf), Raw: without TIEScore cutoff, FDR=.285, contains % of all TSS Permissive: TIEScore 35.3, FDR=.23, contains 88.6% of all TSS Robust: [optimal, recommend to use] TIEScore 5., FDR=.77, contains 6.9% of all TSS Stringent: TIEScore 6., FDR=.26, contains 9.% of all TSS At each of TIEScore cutoffs, assembly consists of 3 types of basic items (as *.bed), ) genes, 2) transcripts and 3) CAGE clusters. Each of se items has a unique identifier and associates to items of or types as specified in ID mapping tables (as *.txt): ONE transcript associates with ONLY ONE CAGE cluster; ONE transcript associates with ONLY ONE gene; ONE CAGE cluster can associate with multiple transcripts; ONE CAGE cluster can refore associate with multiple genes; Gene identifiers (mainly inherited from GENCODEv9): ENSG*.*: inherited from GENCODEv9, e.g. ENSG577.9; CATG*.*: novel genes from FANTOM CAT, e.g. CATG557.; Transcript identifiers (mainly inherited from GENCODEv9): ENST*.*: inherited from GENCODEv9 7, e.g. ENST57292.; ENCT*.*: novel transcripts from ENCODE 6 transcript models, e.g. ENCT868.; HBMT*.*: novel transcripts from Human BodyMap v2. transcript models, e.g. HBMT386.; MICT *.*: novel transcripts from mitranscriptome 5 transcript models, e.g. MICT969.; FTMT*.*: novel transcripts from FANTOM5 RNA-seq transcript models, e.g. FTMT96.; CAGE Cluster identifiers (ALL inherited from FANTOM5 DPI CAGE Clusters ): chr*:*..*,*: as chromosome:start..end,strand, inherited from FANTOM5, e.g. chr: ,+;.2 Expression Atlas At each of TIEScore cutoffs, we provide following expression tables (,897 FANTOM5 CAGE libraries): tag count per CAGE cluster: CAGE tags count within flanking region (±5nt) of CAGE cluster; tag count per gene: sum of CAGE tags of CAGE clusters associated with gene; rle cpm per CAGE cluster: rle normalized (by edger 3 ) cpm (count per millions) CAGE cluster; rle cpm per gene: sum of CAGE tags of CAGE clusters associated with gene;.3 Or Resources We also provide following resources for FANTOM CAT as robust TIEScore cutoff: Differential expression: differential expression analysis (edger 3 ) of manually curated series; Sample ontology association: gene-based association with FANTOM5 sample ontology terms; Trait association: Gene-based association with traits in GWASdb 32 and PICS 33 ; Expression specificity: Gene-based expression levels and specificity in primary cell facets; Co-expression: Co-expression of lncrna mrna pairs linked by GTEx 3 -associated SNPs; Selective constraints: RS score and GERP elements 3 at TIR and exon; Orthologous TSS activities: Orthologous TSS activities in mouse, rat, dog and chicken; ORF Annotations: coding potential of all ORF based on phylocsf 2, RNACode 22 and sorf.org 23 ; Transposable elements: TIR association with transposable elements; Directionality: Transcriptional directionality of TIR based on,897 FANTOM5 CAGE libraries; Exosome Sensitivity: Exosome sensitivity based on exosome knockdown in HeLa cells ; Splicing Index: Gene-based splicing index in 7 RNA-seq libraries; 2

13 SUPPLEMENTARY INFORMATION RESEARCH 2 Overview of FANTOM CAT Browser Supplementary Fig. 7 Landing Page of FANTOM CAT Browser. The FANTOM CAT Browser is hosted at This web application allows users to browse FANTOM CAT genes, view ir genomic loci through ZENBU 35, filter by ir annotations, intersect m with ir associated sample ontologies or traits, and download relevant data. Alternatively, users can also browse by trait or sample ontologies, and intersect with ir associated genes. Click on circles below and start browsing Genes, Traits and Ontologies. Gene List Page Gene Individual Page Supplementary Fig. 8 Browsing Genes from Gene List and Individual Page. On Gene List Page ( users can select for genes based on filters on left if list. Click on GeneID on list to browse an individual gene. The Gene Individual Page displays basic information of gene and its associated sample ontologies and traits. User can navigate between genes, sample ontologies and traits through links. Alternatively, users can start by browsing list of sample ontologies and traits and reach ir associated genes. 3

14 RESEARCH SUPPLEMENTARY INFORMATION 3 Use Case : Download a List of Novel e- lncrnas 2 3 Gene List Page Supplementary Fig. 9 Download a List of Novel e-lncrnas. Steps:. Go to Gene List Page by clicking Genes in Landing Page; 2. Check novel in filter Annotations to filter for genes that are not annotated in GENCODEv9; 3. Check e-lncrna in filter Category, and this should return 7,232 genes;. Choose data columns to be displayed (and thus downloaded) and click Export visible data as csv ;

15 SUPPLEMENTARY INFORMATION RESEARCH Use Case 2: Explore LncRNAs with Conserved Exons and Implicated in Gene Individual Page Gene List Page Supplementary Fig. Explore LncRNAs with Conserved Exons and Implicated in. Steps:. Go to Gene List Page by clicking Genes in Landing Page; 2. Check yes in filter Exon Conservation to filter for genes with conserved exons; 3. Check yes in filter mrna Coexpression, and this should return,673 genes;. Choose data columns to Exon conservation and mrna Coexpression be displayed; 5. Click on header of Exon conservation column to sort genes by level of conservation at ir exons; 6. Click on one of genes on top (i.e. highly conserved exon), e.g. ENSG96.; 7. It should reach individual page for gene ENSG96.; 8. Examining summary for gene, it notes ENSG96. is coexpressed with -linked ENSG26822.; 9. Examining conservation summary of gene, it notes exon of ENSG96. is conserved;. Click on extended to view ENSG96. loci;. Continue next page: in extended ZENBU view, it shows information at locus; 5

16 RESEARCH SUPPLEMENTARY INFORMATION ZENBU extended xten view ZENBU extended xten view view ZENBU extended xten SupplementaryFig. Fig. Visualizean anlncrna LncRNAImplicated Implicatedinin ininzenbu. ZENBU. Supplementary Supplementary Fig. Visualize Visualize an LncRNA Implicated in in ZENBU. Steps: Steps: Steps:. Continuing from end of Supplementary Fig., you should see ZENBU extended view for ENSG96.;.. Continuing from of of Supplementary Fig.,, you should seesee ZENBU extended view for ENSG96.; Continuing from end Supplementary Fig. should ZENBU extended view for ENSG96.; 2. Click settings and inend feature highlight search inputyou RP-973N3. (gene name of ENSG96.) to highlight 2.2. Click settings and in in feature highlight search input RP-973N3. (gene name of ENSG96.) to highlight Click settings and feature highlight search input RP-973N3. (gene name of ENSG96.) to highlight locus of RP-973N3.; locus ofofrp-973n3.; locus RP-973N3.; 3. Examine red highlighted gene and transcript of RP-973N3.. To cancel highlight, repeat step 2 and delete search 3.3. Examine red gene and transcript of RP-973N3.. To To cancel highlight, repeat stepstep 2 and delete search Examine redhighlighted highlighted gene and transcript of RP-973N3.. cancel highlight, repeat 2 and delete search string in setting dialogue box; string in setting dialogue box; string in setting dialogue box;. Click on track [SNP] linked lncrna-mrna pairs, SIGNIFICANT ONLY, -to-gene, which shows connections.. Click [SNP] linked lncrna-mrna pairs, SIGNIFICANT ONLY, -to-gene, which shows connections Clickonontrack track [SNP] linked pairs, SIGNIFICANT ONLY, -to-gene, between SNPs to targetlncrna-mrna coexpressed mrna, highlight regions flanking which SNPs shows (i.e. leftconnections ends of between SNPs to target coexpressed mrna, highlight regions flanking SNPs (i.e. left ends of between and to targetglass coexpressed mrna, green lines) clicksnps magnifying icon to zoom in; highlight regions flanking SNPs (i.e. left ends of green lines) click magnifying glass icon to zoom in; in; green lines)and click magnifying to SNP, zoom 5. In zoom inand view, click on track [SNP]glass GTExicon v6p, Pe-5, coding genes, and highlight SNPs, and 5.5. InIn zoom in inview, click onon track [SNP] GTEx SNP, v6p, Pe-5, coding genes, and highlight SNPs, and and zoom view, click track [SNP] GTEx SNP, v6p, Pe-5, coding genes, highlight SNPs, bring up display panel (at bottom) showing active tissues of highlighted and SNPs; bring up display panel (at bottom) showing active tissues of highlighted SNPs; bring up display panel (at (i.e. bottom) showing and active tissuesmrna of (i.e. highlighted SNPs; 6. To zoom out to view lncrna RP-973N3.) linked PLEKHG3), click on one of connections in 6.6. ToTozoom outouttotoview lncrna (i.e. RP-973N3.) andand linked mrna (i.e.(i.e. PLEKHG3), click on one of in in view lncrna (i.e. RP-973N3.) linked mrna PLEKHG3), of connections connections zoom track [SNP] linked lncrna-mrna pairs, SIGNIFICANT ONLY, -to-gene, n click click on one range to zoom out; track [SNP] linked lncrna-mrna pairs, SIGNIFICANT ONLY, -to-gene, n click range to zoom out; track [SNP] linked lncrna-mrna SIGNIFICANT -to-gene, click to zoom out; 7. Aview contains coexpressed lncrna mrnapairs, pair linked by ONLY, SNP, you can rearrangen tracks by range dragging title 7.7. Abars contains pairpair linked by SNP, youyou can rearrange ONLY, tracks by dragging title Aview view contains coexpressed coexpressed lncrna mrna linked by SNP, can rearrange tracks by dragging title as desired, e.g. move up lncrna mrna track [SNP] linked lncrna-mrna pairs, SIGNIFICANT -to-gene ; bars upup track [SNP] linked lncrna-mrna pairs, SIGNIFICANT ONLY, -to-gene ; barsasasdesired, desired,e.g. e.g.move move track [SNP] linked lncrna-mrna pairs, SIGNIFICANT ONLY, -to-gene ; 6 W W W. N A T U R E. C O M / N A T U R E 6 6

17 SUPPLEMENTARY INFORMATION RESEARCH 5 Use Case 3: Explore LncRNAs Enriched in Classical Monocytes U in Classical MM onocytes Use se CCase ase 33: : EExplore xplore LLncRNAs ncrnas EEnriched nriched in Classical onocytes Ontology List Page Ontology List Page Ontology List Page Supplementary Fig. 2 Explore LncRNAs Enriched in Classical Monocytes. Supplementary Fig. 2 Explore LncRNAs Enriched in Classical Monocytes. Supplementary Fig. 2 Explore LncRNAs Enriched in Classical Monocytes. Steps: Steps:. Go to Ontology List Page by clicking Ontologies in Landing Page;. Go to Ontology List Page by clicking Ontologies in Landing Page; 2. Type classical monocytes and press filter ; Steps: 2. Type classical monocytes and press filter ; 3. Go Click on CL:86 to enter ontology individual page for classical monocytes;. Ontology List Page by clicking Ontologies Landing Page;monocytes; 3. Clicktoon CL:86 to enter ontology individualin page for classical. Check lncrna classes in filter Gene Class to filter for lncrna genes; 2. Type classical monocytes and press filter ;. Check lncrna classes in filter Gene Class to filter for lncrna genes; 5. Click on header of Fold column to sort genes by level of enrichment in classical monocytes; 3. Click on header CL:86 enter ontology for classical monocytes; 5. of Fold tocolumn to sort genesindividual by level ofpage enrichment in classical monocytes; 6. Click on genes on top (i.e. most enriched in classical monocytes), i.e. CATG78.;. Check lncrna classes in filter Gene Class to filter for lncrna genes; 6. Click on genes on top (i.e. most enriched in classical monocytes), i.e. CATG78.; 7. Examining summary summary for gene, gene, notes CATG78. isinassociated with classical monocytes; 5. Examining Click on header of Fold for column to sort genes by level of enrichment classicalwith monocytes; 7. it itnotes CATG78. is associated classical monocytes; 8. Hover over enrichment plot to examine expression of CATG78. in individual samples of classical monocytes; 6. Click on genes on top (i.e. most enriched in classical monocytes), i.e. CATG78.; 8. Hover over enrichment plot to examine expression of CATG78. in individual samples of classical monocytes; 9. Hover over box plot to examine dynamic expression of CATG78. in classical monocytes stimulation; 7. Examining summary for gene, it notes CATG78. is associated with classical monocytes; 9. Hover over box plot to examine dynamic expression of CATG78. in classical monocytes stimulation;. Click Click on on expression view CATG Hover over enrichment to examine expression loci; ofloci; CATG78. in individual samples of classical monocytes;. expression totoplot view CATG78. Continue next page: expression ZENBU view,it expression itshows shows information at locus;monocytes stimulation;. 9. Continue Hover over page: box plot to examine dynamic ofexpression CATG78. in classical. next ininexpression ZENBU view, expression information at locus;. Click on expression to view CATG78. loci; W W W. N A T U R E. C O M / N A T U R E. Continue next page: in expression ZENBU view, it shows expression information at locus; 7 7 7

18 RESEARCH SUPPLEMENTARY INFORMATION 3 2 ZENBU expression view Supplementary Fig. 3 ZENBU Expression View of an LncRNA Enriched in Classical Monocytes. Steps:. Continuing from end of Supplementary Fig. 2, you should see ZENBU expression view for CATG78.; 2. Click on CATG78. in track [Sample Ontology Enrichment] Robust, gene (fold difference) and bring up experiment panel (at bottom) showing fold enrichment of CATG78. in classical monocytes; 3. Click on CATG78. in track [Expression Profile] Robust, gene-based expression (n=829 samples) and bring up experiment panel (at bottom) to show expression of CATG78. in,829 FANTOM5 samples;. Click on track [Epigenome] Roadmap, (n= samples) to shows tissues of roadmap are active within displayed loci; 8

19 SUPPLEMENTARY INFORMATION RESEARCH 66 Use Use Case Case : : Explore Explore Cell Cell Types Types Associated Associated with with Crohn s Crohn s Diseases Diseases Trait List Page Supplementary Fig. Explore Cell Types Associated with Crohn s Diseases. Steps:. Go to Trait List Page by clicking Ontologies in Landing Page; 2. Type crohn and click filter ; 3. Click on DOID:8778 to enter trait individual page for Crohn s Disease GWAS;. Reached trait individual page for Crohn s Disease GWAS; 5. Browse cell-types that are associated Crohn s Disease in Ontology Enrichment table, and click on filter to filter for genes that are associated with Crohn s Disease and enriched in Leukocytes, this should return 23 genes; 6. Examine composition of filtered genes, and go to individual gene page for gene based information; 9

Supplementary Figures

Supplementary Figures Supplementary Figures Supplementary Figure 1. Heatmap of GO terms for differentially expressed genes. The terms were hierarchically clustered using the GO term enrichment beta. Darker red, higher positive

More information

Accessing and Using ENCODE Data Dr. Peggy J. Farnham

Accessing and Using ENCODE Data Dr. Peggy J. Farnham 1 William M Keck Professor of Biochemistry Keck School of Medicine University of Southern California How many human genes are encoded in our 3x10 9 bp? C. elegans (worm) 959 cells and 1x10 8 bp 20,000

More information

Supplementary Figure S1. Gene expression analysis of epidermal marker genes and TP63.

Supplementary Figure S1. Gene expression analysis of epidermal marker genes and TP63. Supplementary Figure Legends Supplementary Figure S1. Gene expression analysis of epidermal marker genes and TP63. A. Screenshot of the UCSC genome browser from normalized RNAPII and RNA-seq ChIP-seq data

More information

7SK ChIRP-seq is specifically RNA dependent and conserved between mice and humans.

7SK ChIRP-seq is specifically RNA dependent and conserved between mice and humans. Supplementary Figure 1 7SK ChIRP-seq is specifically RNA dependent and conserved between mice and humans. Regions targeted by the Even and Odd ChIRP probes mapped to a secondary structure model 56 of the

More information

Nature Genetics: doi: /ng Supplementary Figure 1

Nature Genetics: doi: /ng Supplementary Figure 1 Supplementary Figure 1 Expression deviation of the genes mapped to gene-wise recurrent mutations in the TCGA breast cancer cohort (top) and the TCGA lung cancer cohort (bottom). For each gene (each pair

More information

Nature Structural & Molecular Biology: doi: /nsmb.2419

Nature Structural & Molecular Biology: doi: /nsmb.2419 Supplementary Figure 1 Mapped sequence reads and nucleosome occupancies. (a) Distribution of sequencing reads on the mouse reference genome for chromosome 14 as an example. The number of reads in a 1 Mb

More information

Variant Classification. Author: Mike Thiesen, Golden Helix, Inc.

Variant Classification. Author: Mike Thiesen, Golden Helix, Inc. Variant Classification Author: Mike Thiesen, Golden Helix, Inc. Overview Sequencing pipelines are able to identify rare variants not found in catalogs such as dbsnp. As a result, variants in these datasets

More information

MODULE 3: TRANSCRIPTION PART II

MODULE 3: TRANSCRIPTION PART II MODULE 3: TRANSCRIPTION PART II Lesson Plan: Title S. CATHERINE SILVER KEY, CHIYEDZA SMALL Transcription Part II: What happens to the initial (premrna) transcript made by RNA pol II? Objectives Explain

More information

Coding probability Ratio of sequences with ORF <= Size Size. Expression Levels by Class

Coding probability Ratio of sequences with ORF <= Size Size. Expression Levels by Class Supplementary Figure S Characterization of unannotated transcripts. (A) CPAT coding probability scores and (B) cummulative distribution of ORF length are plotted for each category of protein-coding genes,

More information

MODULE 4: SPLICING. Removal of introns from messenger RNA by splicing

MODULE 4: SPLICING. Removal of introns from messenger RNA by splicing Last update: 05/10/2017 MODULE 4: SPLICING Lesson Plan: Title MEG LAAKSO Removal of introns from messenger RNA by splicing Objectives Identify splice donor and acceptor sites that are best supported by

More information

Comparison of open chromatin regions between dentate granule cells and other tissues and neural cell types.

Comparison of open chromatin regions between dentate granule cells and other tissues and neural cell types. Supplementary Figure 1 Comparison of open chromatin regions between dentate granule cells and other tissues and neural cell types. (a) Pearson correlation heatmap among open chromatin profiles of different

More information

User Guide. Association analysis. Input

User Guide. Association analysis. Input User Guide TFEA.ChIP is a tool to estimate transcription factor enrichment in a set of differentially expressed genes using data from ChIP-Seq experiments performed in different tissues and conditions.

More information

Supplementary Figure 1. Efficiency of Mll4 deletion and its effect on T cell populations in the periphery. Nature Immunology: doi: /ni.

Supplementary Figure 1. Efficiency of Mll4 deletion and its effect on T cell populations in the periphery. Nature Immunology: doi: /ni. Supplementary Figure 1 Efficiency of Mll4 deletion and its effect on T cell populations in the periphery. Expression of Mll4 floxed alleles (16-19) in naive CD4 + T cells isolated from lymph nodes and

More information

Relationship between genomic features and distributions of RS1 and RS3 rearrangements in breast cancer genomes.

Relationship between genomic features and distributions of RS1 and RS3 rearrangements in breast cancer genomes. Supplementary Figure 1 Relationship between genomic features and distributions of RS1 and RS3 rearrangements in breast cancer genomes. (a,b) Values of coefficients associated with genomic features, separately

More information

Nature Genetics: doi: /ng Supplementary Figure 1. Assessment of sample purity and quality.

Nature Genetics: doi: /ng Supplementary Figure 1. Assessment of sample purity and quality. Supplementary Figure 1 Assessment of sample purity and quality. (a) Hematoxylin and eosin staining of formaldehyde-fixed, paraffin-embedded sections from a human testis biopsy collected concurrently with

More information

Data mining with Ensembl Biomart. Stéphanie Le Gras

Data mining with Ensembl Biomart. Stéphanie Le Gras Data mining with Ensembl Biomart Stéphanie Le Gras (slegras@igbmc.fr) Guidelines Genome data Genome browsers Getting access to genomic data: Ensembl/BioMart 2 Genome Sequencing Example: Human genome 2000:

More information

Nature Structural & Molecular Biology: doi: /nsmb Supplementary Figure 1

Nature Structural & Molecular Biology: doi: /nsmb Supplementary Figure 1 Supplementary Figure 1 Frequency of alternative-cassette-exon engagement with the ribosome is consistent across data from multiple human cell types and from mouse stem cells. Box plots showing AS frequency

More information

Discovery of Novel Human Gene Regulatory Modules from Gene Co-expression and

Discovery of Novel Human Gene Regulatory Modules from Gene Co-expression and Discovery of Novel Human Gene Regulatory Modules from Gene Co-expression and Promoter Motif Analysis Shisong Ma 1,2*, Michael Snyder 3, and Savithramma P Dinesh-Kumar 2* 1 School of Life Sciences, University

More information

Nature Methods: doi: /nmeth.3115

Nature Methods: doi: /nmeth.3115 Supplementary Figure 1 Analysis of DNA methylation in a cancer cohort based on Infinium 450K data. RnBeads was used to rediscover a clinically distinct subgroup of glioblastoma patients characterized by

More information

Supplementary information for: Human micrornas co-silence in well-separated groups and have different essentialities

Supplementary information for: Human micrornas co-silence in well-separated groups and have different essentialities Supplementary information for: Human micrornas co-silence in well-separated groups and have different essentialities Gábor Boross,2, Katalin Orosz,2 and Illés J. Farkas 2, Department of Biological Physics,

More information

Histone Modifications Are Associated with Transcript Isoform Diversity in Normal and Cancer Cells

Histone Modifications Are Associated with Transcript Isoform Diversity in Normal and Cancer Cells Histone Modifications Are Associated with Transcript Isoform Diversity in Normal and Cancer Cells Ondrej Podlaha 1, Subhajyoti De 2,3,4, Mithat Gonen 5, Franziska Michor 1 * 1 Department of Biostatistics

More information

Broad H3K4me3 is associated with increased transcription elongation and enhancer activity at tumor suppressor genes

Broad H3K4me3 is associated with increased transcription elongation and enhancer activity at tumor suppressor genes Broad H3K4me3 is associated with increased transcription elongation and enhancer activity at tumor suppressor genes Kaifu Chen 1,2,3,4,5,10, Zhong Chen 6,10, Dayong Wu 6, Lili Zhang 7, Xueqiu Lin 1,2,8,

More information

Exercises: Differential Methylation

Exercises: Differential Methylation Exercises: Differential Methylation Version 2018-04 Exercises: Differential Methylation 2 Licence This manual is 2014-18, Simon Andrews. This manual is distributed under the creative commons Attribution-Non-Commercial-Share

More information

Computational aspects of ChIP-seq. John Marioni Research Group Leader European Bioinformatics Institute European Molecular Biology Laboratory

Computational aspects of ChIP-seq. John Marioni Research Group Leader European Bioinformatics Institute European Molecular Biology Laboratory Computational aspects of ChIP-seq John Marioni Research Group Leader European Bioinformatics Institute European Molecular Biology Laboratory ChIP-seq Using highthroughput sequencing to investigate DNA

More information

Lecture 8 Understanding Transcription RNA-seq analysis. Foundations of Computational Systems Biology David K. Gifford

Lecture 8 Understanding Transcription RNA-seq analysis. Foundations of Computational Systems Biology David K. Gifford Lecture 8 Understanding Transcription RNA-seq analysis Foundations of Computational Systems Biology David K. Gifford 1 Lecture 8 RNA-seq Analysis RNA-seq principles How can we characterize mrna isoform

More information

Hands-On Ten The BRCA1 Gene and Protein

Hands-On Ten The BRCA1 Gene and Protein Hands-On Ten The BRCA1 Gene and Protein Objective: To review transcription, translation, reading frames, mutations, and reading files from GenBank, and to review some of the bioinformatics tools, such

More information

Supplemental Figure S1. Expression of Cirbp mrna in mouse tissues and NIH3T3 cells.

Supplemental Figure S1. Expression of Cirbp mrna in mouse tissues and NIH3T3 cells. SUPPLEMENTAL FIGURE AND TABLE LEGENDS Supplemental Figure S1. Expression of Cirbp mrna in mouse tissues and NIH3T3 cells. A) Cirbp mrna expression levels in various mouse tissues collected around the clock

More information

Computational Analysis of UHT Sequences Histone modifications, CAGE, RNA-Seq

Computational Analysis of UHT Sequences Histone modifications, CAGE, RNA-Seq Computational Analysis of UHT Sequences Histone modifications, CAGE, RNA-Seq Philipp Bucher Wednesday January 21, 2009 SIB graduate school course EPFL, Lausanne ChIP-seq against histone variants: Biological

More information

RASA: Robust Alternative Splicing Analysis for Human Transcriptome Arrays

RASA: Robust Alternative Splicing Analysis for Human Transcriptome Arrays Supplementary Materials RASA: Robust Alternative Splicing Analysis for Human Transcriptome Arrays Junhee Seok 1*, Weihong Xu 2, Ronald W. Davis 2, Wenzhong Xiao 2,3* 1 School of Electrical Engineering,

More information

Tutorial: RNA-Seq Analysis Part II: Non-Specific Matches and Expression Measures

Tutorial: RNA-Seq Analysis Part II: Non-Specific Matches and Expression Measures : RNA-Seq Analysis Part II: Non-Specific Matches and Expression Measures March 15, 2013 CLC bio Finlandsgade 10-12 8200 Aarhus N Denmark Telephone: +45 70 22 55 09 Fax: +45 70 22 55 19 www.clcbio.com support@clcbio.com

More information

Nature Structural & Molecular Biology: doi: /nsmb Supplementary Figure 1

Nature Structural & Molecular Biology: doi: /nsmb Supplementary Figure 1 Supplementary Figure 1 U1 inhibition causes a shift of RNA-seq reads from exons to introns. (a) Evidence for the high purity of 4-shU-labeled RNAs used for RNA-seq. HeLa cells transfected with control

More information

SUPPLEMENTAL INFORMATION

SUPPLEMENTAL INFORMATION SUPPLEMENTAL INFORMATION GO term analysis of differentially methylated SUMIs. GO term analysis of the 458 SUMIs with the largest differential methylation between human and chimp shows that they are more

More information

Cancer Informatics Lecture

Cancer Informatics Lecture Cancer Informatics Lecture Mayo-UIUC Computational Genomics Course June 22, 2018 Krishna Rani Kalari Ph.D. Associate Professor 2017 MFMER 3702274-1 Outline The Cancer Genome Atlas (TCGA) Genomic Data Commons

More information

High Throughput Sequence (HTS) data analysis. Lei Zhou

High Throughput Sequence (HTS) data analysis. Lei Zhou High Throughput Sequence (HTS) data analysis Lei Zhou (leizhou@ufl.edu) High Throughput Sequence (HTS) data analysis 1. Representation of HTS data. 2. Visualization of HTS data. 3. Discovering genomic

More information

The Epigenome Tools 2: ChIP-Seq and Data Analysis

The Epigenome Tools 2: ChIP-Seq and Data Analysis The Epigenome Tools 2: ChIP-Seq and Data Analysis Chongzhi Zang zang@virginia.edu http://zanglab.com PHS5705: Public Health Genomics March 20, 2017 1 Outline Epigenome: basics review ChIP-seq overview

More information

MIR retrotransposon sequences provide insulators to the human genome

MIR retrotransposon sequences provide insulators to the human genome Supplementary Information: MIR retrotransposon sequences provide insulators to the human genome Jianrong Wang, Cristina Vicente-García, Davide Seruggia, Eduardo Moltó, Ana Fernandez- Miñán, Ana Neto, Elbert

More information

ChromHMM Tutorial. Jason Ernst Assistant Professor University of California, Los Angeles

ChromHMM Tutorial. Jason Ernst Assistant Professor University of California, Los Angeles ChromHMM Tutorial Jason Ernst Assistant Professor University of California, Los Angeles Talk Outline Chromatin states analysis and ChromHMM Accessing chromatin state annotations for ENCODE2 and Roadmap

More information

a) List of KMTs targeted in the shrna screen. The official symbol, KMT designation,

a) List of KMTs targeted in the shrna screen. The official symbol, KMT designation, Supplementary Information Supplementary Figures Supplementary Figure 1. a) List of KMTs targeted in the shrna screen. The official symbol, KMT designation, gene ID and specifities are provided. Those highlighted

More information

STAT1 regulates microrna transcription in interferon γ stimulated HeLa cells

STAT1 regulates microrna transcription in interferon γ stimulated HeLa cells CAMDA 2009 October 5, 2009 STAT1 regulates microrna transcription in interferon γ stimulated HeLa cells Guohua Wang 1, Yadong Wang 1, Denan Zhang 1, Mingxiang Teng 1,2, Lang Li 2, and Yunlong Liu 2 Harbin

More information

RNA-Seq Preparation Comparision Summary: Lexogen, Standard, NEB

RNA-Seq Preparation Comparision Summary: Lexogen, Standard, NEB RNA-Seq Preparation Comparision Summary: Lexogen, Standard, NEB CSF-NGS January 22, 214 Contents 1 Introduction 1 2 Experimental Details 1 3 Results And Discussion 1 3.1 ERCC spike ins............................................

More information

Mutation Detection and CNV Analysis for Illumina Sequencing data from HaloPlex Target Enrichment Panels using NextGENe Software for Clinical Research

Mutation Detection and CNV Analysis for Illumina Sequencing data from HaloPlex Target Enrichment Panels using NextGENe Software for Clinical Research Mutation Detection and CNV Analysis for Illumina Sequencing data from HaloPlex Target Enrichment Panels using NextGENe Software for Clinical Research Application Note Authors John McGuigan, Megan Manion,

More information

Computational Identification and Prediction of Tissue-Specific Alternative Splicing in H. Sapiens. Eric Van Nostrand CS229 Final Project

Computational Identification and Prediction of Tissue-Specific Alternative Splicing in H. Sapiens. Eric Van Nostrand CS229 Final Project Computational Identification and Prediction of Tissue-Specific Alternative Splicing in H. Sapiens. Eric Van Nostrand CS229 Final Project Introduction RNA splicing is a critical step in eukaryotic gene

More information

Nature Genetics: doi: /ng Supplementary Figure 1. SEER data for male and female cancer incidence from

Nature Genetics: doi: /ng Supplementary Figure 1. SEER data for male and female cancer incidence from Supplementary Figure 1 SEER data for male and female cancer incidence from 1975 2013. (a,b) Incidence rates of oral cavity and pharynx cancer (a) and leukemia (b) are plotted, grouped by males (blue),

More information

ChIP-seq data analysis

ChIP-seq data analysis ChIP-seq data analysis Harri Lähdesmäki Department of Computer Science Aalto University November 24, 2017 Contents Background ChIP-seq protocol ChIP-seq data analysis Transcriptional regulation Transcriptional

More information

Supplementary Information. Supplementary Figures

Supplementary Information. Supplementary Figures Supplementary Information Supplementary Figures.8 57 essential gene density 2 1.5 LTR insert frequency diversity DEL.5 DUP.5 INV.5 TRA 1 2 3 4 5 1 2 3 4 1 2 Supplementary Figure 1. Locations and minor

More information

Assignment 5: Integrative epigenomics analysis

Assignment 5: Integrative epigenomics analysis Assignment 5: Integrative epigenomics analysis Due date: Friday, 2/24 10am. Note: no late assignments will be accepted. Introduction CpG islands (CGIs) are important regulatory regions in the genome. What

More information

Supplemental Data. Integrating omics and alternative splicing i reveals insights i into grape response to high temperature

Supplemental Data. Integrating omics and alternative splicing i reveals insights i into grape response to high temperature Supplemental Data Integrating omics and alternative splicing i reveals insights i into grape response to high temperature Jianfu Jiang 1, Xinna Liu 1, Guotian Liu, Chonghuih Liu*, Shaohuah Li*, and Lijun

More information

genomics for systems biology / ISB2020 RNA sequencing (RNA-seq)

genomics for systems biology / ISB2020 RNA sequencing (RNA-seq) RNA sequencing (RNA-seq) Module Outline MO 13-Mar-2017 RNA sequencing: Introduction 1 WE 15-Mar-2017 RNA sequencing: Introduction 2 MO 20-Mar-2017 Paper: PMID 25954002: Human genomics. The human transcriptome

More information

Nature Immunology: doi: /ni Supplementary Figure 1. Characteristics of SEs in T reg and T conv cells.

Nature Immunology: doi: /ni Supplementary Figure 1. Characteristics of SEs in T reg and T conv cells. Supplementary Figure 1 Characteristics of SEs in T reg and T conv cells. (a) Patterns of indicated transcription factor-binding at SEs and surrounding regions in T reg and T conv cells. Average normalized

More information

EPIGENOMICS PROFILING SERVICES

EPIGENOMICS PROFILING SERVICES EPIGENOMICS PROFILING SERVICES Chromatin analysis DNA methylation analysis RNA-seq analysis Diagenode helps you uncover the mysteries of epigenetics PAGE 3 Integrative epigenomics analysis DNA methylation

More information

Long non-coding RNAs

Long non-coding RNAs Long non-coding RNAs Dominic Rose Bioinformatics Group, University of Freiburg Bled, Feb. 2011 Outline De novo prediction of long non-coding RNAs (lncrnas) Genome-wide RNA gene-finding Intrinsic properties

More information

Nature Structural & Molecular Biology: doi: /nsmb Supplementary Figure 1

Nature Structural & Molecular Biology: doi: /nsmb Supplementary Figure 1 Supplementary Figure 1 Effect of HSP90 inhibition on expression of endogenous retroviruses. (a) Inducible shrna-mediated Hsp90 silencing in mouse ESCs. Immunoblots of total cell extract expressing the

More information

Transcript reconstruction

Transcript reconstruction Transcript reconstruction Summary I Data types, file formats and utilities Annotation: Genomic regions Genes Peaks bedtools Alignment: Map reads BAM/SAM Samtools Aggregation: Summary files Wig (UCSC) TDF

More information

EXPression ANalyzer and DisplayER

EXPression ANalyzer and DisplayER EXPression ANalyzer and DisplayER Tom Hait Aviv Steiner Igor Ulitsky Chaim Linhart Amos Tanay Seagull Shavit Rani Elkon Adi Maron-Katz Dorit Sagir Eyal David Roded Sharan Israel Steinfeld Yossi Shiloh

More information

Peak-calling for ChIP-seq and ATAC-seq

Peak-calling for ChIP-seq and ATAC-seq Peak-calling for ChIP-seq and ATAC-seq Shamith Samarajiwa CRUK Autumn School in Bioinformatics 2017 University of Cambridge Overview Peak-calling: identify enriched (signal) regions in ChIP-seq or ATAC-seq

More information

Annotation of Drosophila mojavensis fosmid 8 Priya Srikanth Bio 434W

Annotation of Drosophila mojavensis fosmid 8 Priya Srikanth Bio 434W Annotation of Drosophila mojavensis fosmid 8 Priya Srikanth Bio 434W 5.1.2007 Overview High-quality finished sequence is much more useful for research once it is annotated. Annotation is a fundamental

More information

Unsupervised Identification of Isotope-Labeled Peptides

Unsupervised Identification of Isotope-Labeled Peptides Unsupervised Identification of Isotope-Labeled Peptides Joshua E Goldford 13 and Igor GL Libourel 124 1 Biotechnology institute, University of Minnesota, Saint Paul, MN 55108 2 Department of Plant Biology,

More information

OncoPPi Portal A Cancer Protein Interaction Network to Inform Therapeutic Strategies

OncoPPi Portal A Cancer Protein Interaction Network to Inform Therapeutic Strategies OncoPPi Portal A Cancer Protein Interaction Network to Inform Therapeutic Strategies 2017 Contents Datasets... 2 Protein-protein interaction dataset... 2 Set of known PPIs... 3 Domain-domain interactions...

More information

Supplemental Figure 1. Small RNA size distribution from different soybean tissues.

Supplemental Figure 1. Small RNA size distribution from different soybean tissues. Supplemental Figure 1. Small RNA size distribution from different soybean tissues. The size of small RNAs was plotted versus frequency (percentage) among total sequences (A, C, E and G) or distinct sequences

More information

RNA-seq Introduction

RNA-seq Introduction RNA-seq Introduction DNA is the same in all cells but which RNAs that is present is different in all cells There is a wide variety of different functional RNAs Which RNAs (and sometimes then translated

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION doi:10.1038/nature10866 a b 1 2 3 4 5 6 7 Match No Match 1 2 3 4 5 6 7 Turcan et al. Supplementary Fig.1 Concepts mapping H3K27 targets in EF CBX8 targets in EF H3K27 targets in ES SUZ12 targets in ES

More information

Introduction. Introduction

Introduction. Introduction Introduction We are leveraging genome sequencing data from The Cancer Genome Atlas (TCGA) to more accurately define mutated and stable genes and dysregulated metabolic pathways in solid tumors. These efforts

More information

Evolution of the unspliced transcriptome

Evolution of the unspliced transcriptome Engelhardt and Stadler BMC Evolutionary Biology (2015) 15:166 DOI 10.1186/s12862-015-0437-7 RESEARCH ARTICLE Open Access Evolution of the unspliced transcriptome Jan Engelhardt 1* and Peter F. Stadler

More information

Nature Neuroscience: doi: /nn Supplementary Figure 1

Nature Neuroscience: doi: /nn Supplementary Figure 1 Supplementary Figure 1 Illustration of the working of network-based SVM to confidently predict a new (and now confirmed) ASD gene. Gene CTNND2 s brain network neighborhood that enabled its prediction by

More information

fl/+ KRas;Atg5 fl/+ KRas;Atg5 fl/fl KRas;Atg5 fl/fl KRas;Atg5 Supplementary Figure 1. Gene set enrichment analyses. (a) (b)

fl/+ KRas;Atg5 fl/+ KRas;Atg5 fl/fl KRas;Atg5 fl/fl KRas;Atg5 Supplementary Figure 1. Gene set enrichment analyses. (a) (b) KRas;At KRas;At KRas;At KRas;At a b Supplementary Figure 1. Gene set enrichment analyses. (a) GO gene sets (MSigDB v3. c5) enriched in KRas;Atg5 fl/+ as compared to KRas;Atg5 fl/fl tumors using gene set

More information

Chromatin marks identify critical cell-types for fine-mapping complex trait variants

Chromatin marks identify critical cell-types for fine-mapping complex trait variants Chromatin marks identify critical cell-types for fine-mapping complex trait variants Gosia Trynka 1-4 *, Cynthia Sandor 1-4 *, Buhm Han 1-4, Han Xu 5, Barbara E Stranger 1,4#, X Shirley Liu 5, and Soumya

More information

Supplemental Figure S1. Tertiles of FKBP5 promoter methylation and internal regulatory region

Supplemental Figure S1. Tertiles of FKBP5 promoter methylation and internal regulatory region Supplemental Figure S1. Tertiles of FKBP5 promoter methylation and internal regulatory region methylation in relation to PSS and fetal coupling. A, PSS values for participants whose placentas showed low,

More information

Epigenetics. Jenny van Dongen Vrije Universiteit (VU) Amsterdam Boulder, Friday march 10, 2017

Epigenetics. Jenny van Dongen Vrije Universiteit (VU) Amsterdam Boulder, Friday march 10, 2017 Epigenetics Jenny van Dongen Vrije Universiteit (VU) Amsterdam j.van.dongen@vu.nl Boulder, Friday march 10, 2017 Epigenetics Epigenetics= The study of molecular mechanisms that influence the activity of

More information

The 16th KJC Bioinformatics Symposium Integrative analysis identifies potential DNA methylation biomarkers for pan-cancer diagnosis and prognosis

The 16th KJC Bioinformatics Symposium Integrative analysis identifies potential DNA methylation biomarkers for pan-cancer diagnosis and prognosis The 16th KJC Bioinformatics Symposium Integrative analysis identifies potential DNA methylation biomarkers for pan-cancer diagnosis and prognosis Tieliu Shi tlshi@bio.ecnu.edu.cn The Center for bioinformatics

More information

Analysis of Massively Parallel Sequencing Data Application of Illumina Sequencing to the Genetics of Human Cancers

Analysis of Massively Parallel Sequencing Data Application of Illumina Sequencing to the Genetics of Human Cancers Analysis of Massively Parallel Sequencing Data Application of Illumina Sequencing to the Genetics of Human Cancers Gordon Blackshields Senior Bioinformatician Source BioScience 1 To Cancer Genetics Studies

More information

Annotation of Chimp Chunk 2-10 Jerome M Molleston 5/4/2009

Annotation of Chimp Chunk 2-10 Jerome M Molleston 5/4/2009 Annotation of Chimp Chunk 2-10 Jerome M Molleston 5/4/2009 1 Abstract A stretch of chimpanzee DNA was annotated using tools including BLAST, BLAT, and Genscan. Analysis of Genscan predicted genes revealed

More information

Bioinformatics Laboratory Exercise

Bioinformatics Laboratory Exercise Bioinformatics Laboratory Exercise Biology is in the midst of the genomics revolution, the application of robotic technology to generate huge amounts of molecular biology data. Genomics has led to an explosion

More information

Nature Immunology: doi: /ni Supplementary Figure 1. Transcriptional program of the TE and MP CD8 + T cell subsets.

Nature Immunology: doi: /ni Supplementary Figure 1. Transcriptional program of the TE and MP CD8 + T cell subsets. Supplementary Figure 1 Transcriptional program of the TE and MP CD8 + T cell subsets. (a) Comparison of gene expression of TE and MP CD8 + T cell subsets by microarray. Genes that are 1.5-fold upregulated

More information

Patterns of Histone Methylation and Chromatin Organization in Grapevine Leaf. Rachel Schwope EPIGEN May 24-27, 2016

Patterns of Histone Methylation and Chromatin Organization in Grapevine Leaf. Rachel Schwope EPIGEN May 24-27, 2016 Patterns of Histone Methylation and Chromatin Organization in Grapevine Leaf Rachel Schwope EPIGEN May 24-27, 2016 What does H3K4 methylation do? Plant of interest: Vitis vinifera Culturally important

More information

Section D. Identification of serotype-specific amino acid positions in DENV NS1. Objective

Section D. Identification of serotype-specific amino acid positions in DENV NS1. Objective Section D. Identification of serotype-specific amino acid positions in DENV NS1 Objective Upon completion of this exercise, you will be able to use the Virus Pathogen Resource (ViPR; http://www.viprbrc.org/)

More information

Genetics and Genomics in Medicine Chapter 6 Questions

Genetics and Genomics in Medicine Chapter 6 Questions Genetics and Genomics in Medicine Chapter 6 Questions Multiple Choice Questions Question 6.1 With respect to the interconversion between open and condensed chromatin shown below: Which of the directions

More information

ChIP-seq analysis. J. van Helden, M. Defrance, C. Herrmann, D. Puthier, N. Servant, M. Thomas-Chollier, O.Sand

ChIP-seq analysis. J. van Helden, M. Defrance, C. Herrmann, D. Puthier, N. Servant, M. Thomas-Chollier, O.Sand ChIP-seq analysis J. van Helden, M. Defrance, C. Herrmann, D. Puthier, N. Servant, M. Thomas-Chollier, O.Sand Tuesday : quick introduction to ChIP-seq and peak-calling (Presentation + Practical session)

More information

Nature Neuroscience: doi: /nn Supplementary Figure 1. Behavioral training.

Nature Neuroscience: doi: /nn Supplementary Figure 1. Behavioral training. Supplementary Figure 1 Behavioral training. a, Mazes used for behavioral training. Asterisks indicate reward location. Only some example mazes are shown (for example, right choice and not left choice maze

More information

Expanded View Figures

Expanded View Figures Solip Park & Ben Lehner Epistasis is cancer type specific Molecular Systems Biology Expanded View Figures A B G C D E F H Figure EV1. Epistatic interactions detected in a pan-cancer analysis and saturation

More information

Supplement to SCnorm: robust normalization of single-cell RNA-seq data

Supplement to SCnorm: robust normalization of single-cell RNA-seq data Supplement to SCnorm: robust normalization of single-cell RNA-seq data Supplementary Note 1: SCnorm does not require spike-ins, since we find that the performance of spike-ins in scrna-seq is often compromised,

More information

Supplemental Figure 1. Genes showing ectopic H3K9 dimethylation in this study are DNA hypermethylated in Lister et al. study.

Supplemental Figure 1. Genes showing ectopic H3K9 dimethylation in this study are DNA hypermethylated in Lister et al. study. mc mc mc mc SUP mc mc Supplemental Figure. Genes showing ectopic HK9 dimethylation in this study are DNA hypermethylated in Lister et al. study. Representative views of genes that gain HK9m marks in their

More information

To begin using the Nutrients feature, visibility of the Modules must be turned on by a MICROS Account Manager.

To begin using the Nutrients feature, visibility of the Modules must be turned on by a MICROS Account Manager. Nutrients A feature has been introduced that will manage Nutrient information for Items and Recipes in myinventory. This feature will benefit Organizations that are required to disclose Nutritional information

More information

Gene-microRNA network module analysis for ovarian cancer

Gene-microRNA network module analysis for ovarian cancer Gene-microRNA network module analysis for ovarian cancer Shuqin Zhang School of Mathematical Sciences Fudan University Oct. 4, 2016 Outline Introduction Materials and Methods Results Conclusions Introduction

More information

38 Int'l Conf. Bioinformatics and Computational Biology BIOCOMP'16

38 Int'l Conf. Bioinformatics and Computational Biology BIOCOMP'16 38 Int'l Conf. Bioinformatics and Computational Biology BIOCOMP'16 PGAR: ASD Candidate Gene Prioritization System Using Expression Patterns Steven Cogill and Liangjiang Wang Department of Genetics and

More information

User Guide. Protein Clpper. Statistical scoring of protease cleavage sites. 1. Introduction Protein Clpper Analysis Procedure...

User Guide. Protein Clpper. Statistical scoring of protease cleavage sites. 1. Introduction Protein Clpper Analysis Procedure... User Guide Protein Clpper Statistical scoring of protease cleavage sites Content 1. Introduction... 2 2. Protein Clpper Analysis Procedure... 3 3. Input and Output Files... 9 4. Contact Information...

More information

MATS: a Bayesian framework for flexible detection of differential alternative splicing from RNA-Seq data

MATS: a Bayesian framework for flexible detection of differential alternative splicing from RNA-Seq data Nucleic Acids Research Advance Access published February 1, 2012 Nucleic Acids Research, 2012, 1 13 doi:10.1093/nar/gkr1291 MATS: a Bayesian framework for flexible detection of differential alternative

More information

A Quick-Start Guide for rseqdiff

A Quick-Start Guide for rseqdiff A Quick-Start Guide for rseqdiff Yang Shi (email: shyboy@umich.edu) and Hui Jiang (email: jianghui@umich.edu) 09/05/2013 Introduction rseqdiff is an R package that can detect differential gene and isoform

More information

Tutorial. ChIP Sequencing. Sample to Insight. September 15, 2016

Tutorial. ChIP Sequencing. Sample to Insight. September 15, 2016 ChIP Sequencing September 15, 2016 Sample to Insight CLC bio, a QIAGEN Company Silkeborgvej 2 Prismet 8000 Aarhus C Denmark Telephone: +45 70 22 32 44 www.clcbio.com support-clcbio@qiagen.com ChIP Sequencing

More information

Heintzman, ND, Stuart, RK, Hon, G, Fu, Y, Ching, CW, Hawkins, RD, Barrera, LO, Van Calcar, S, Qu, C, Ching, KA, Wang, W, Weng, Z, Green, RD,

Heintzman, ND, Stuart, RK, Hon, G, Fu, Y, Ching, CW, Hawkins, RD, Barrera, LO, Van Calcar, S, Qu, C, Ching, KA, Wang, W, Weng, Z, Green, RD, Heintzman, ND, Stuart, RK, Hon, G, Fu, Y, Ching, CW, Hawkins, RD, Barrera, LO, Van Calcar, S, Qu, C, Ching, KA, Wang, W, Weng, Z, Green, RD, Crawford, GE, Ren, B (2007) Distinct and predictive chromatin

More information

Titrations in Cytobank

Titrations in Cytobank The Premier Platform for Single Cell Analysis (1) Titrations in Cytobank To analyze data from a titration in Cytobank, the first step is to upload your own FCS files or clone an experiment you already

More information

Supplementary Figure 1: Attenuation of association signals after conditioning for the lead SNP. a) attenuation of association signal at the 9p22.

Supplementary Figure 1: Attenuation of association signals after conditioning for the lead SNP. a) attenuation of association signal at the 9p22. Supplementary Figure 1: Attenuation of association signals after conditioning for the lead SNP. a) attenuation of association signal at the 9p22.32 PCOS locus after conditioning for the lead SNP rs10993397;

More information

Probability-Based Protein Identification for Post-Translational Modifications and Amino Acid Variants Using Peptide Mass Fingerprint Data

Probability-Based Protein Identification for Post-Translational Modifications and Amino Acid Variants Using Peptide Mass Fingerprint Data Probability-Based Protein Identification for Post-Translational Modifications and Amino Acid Variants Using Peptide Mass Fingerprint Data Tong WW, McComb ME, Perlman DH, Huang H, O Connor PB, Costello

More information

cis-regulatory enrichment analysis in human, mouse and fly

cis-regulatory enrichment analysis in human, mouse and fly cis-regulatory enrichment analysis in human, mouse and fly Zeynep Kalender Atak, PhD Laboratory of Computational Biology VIB-KU Leuven Center for Brain & Disease Research Laboratory of Computational Biology

More information

Unit 1 Exploring and Understanding Data

Unit 1 Exploring and Understanding Data Unit 1 Exploring and Understanding Data Area Principle Bar Chart Boxplot Conditional Distribution Dotplot Empirical Rule Five Number Summary Frequency Distribution Frequency Polygon Histogram Interquartile

More information

Single-strand DNA library preparation improves sequencing of formalin-fixed and paraffin-embedded (FFPE) cancer DNA

Single-strand DNA library preparation improves sequencing of formalin-fixed and paraffin-embedded (FFPE) cancer DNA www.impactjournals.com/oncotarget/ Oncotarget, Supplementary Materials 2016 Single-strand DNA library preparation improves sequencing of formalin-fixed and paraffin-embedded (FFPE) DNA Supplementary Materials

More information

Guide to Use of SimulConsult s Phenome Software

Guide to Use of SimulConsult s Phenome Software Guide to Use of SimulConsult s Phenome Software Page 1 of 52 Table of contents Welcome!... 4 Introduction to a few SimulConsult conventions... 5 Colors and their meaning... 5 Contextual links... 5 Contextual

More information

Circular RNAs (circrnas) act a stable mirna sponges

Circular RNAs (circrnas) act a stable mirna sponges Circular RNAs (circrnas) act a stable mirna sponges cernas compete for mirnas Ancestal mrna (+3 UTR) Pseudogene RNA (+3 UTR homolgy region) The model holds true for all RNAs that share a mirna binding

More information

Analyse de données de séquençage haut débit

Analyse de données de séquençage haut débit Analyse de données de séquençage haut débit Vincent Lacroix Laboratoire de Biométrie et Biologie Évolutive INRIA ERABLE 9ème journée ITS 21 & 22 novembre 2017 Lyon https://its.aviesan.fr Sequencing is

More information

Supplementary Figure 1: Features of IGLL5 Mutations in CLL: a) Representative IGV screenshot of first

Supplementary Figure 1: Features of IGLL5 Mutations in CLL: a) Representative IGV screenshot of first Supplementary Figure 1: Features of IGLL5 Mutations in CLL: a) Representative IGV screenshot of first intron IGLL5 mutation depicting biallelic mutations. Red arrows highlight the presence of out of phase

More information

Protein Reports CPTAC Common Data Analysis Pipeline (CDAP)

Protein Reports CPTAC Common Data Analysis Pipeline (CDAP) Protein Reports CPTAC Common Data Analysis Pipeline (CDAP) v. 05/03/2016 Summary The purpose of this document is to describe the protein reports generated as part of the CPTAC Common Data Analysis Pipeline

More information