for the article titled An atlas of human long non- coding RNAs with accurate 5 ends TABLE OF CONTENTS
|
|
- Doris Nelson
- 5 years ago
- Views:
Transcription
1 SUPPLEMENTARY Supplementary INFORMATION Information for article titled doi:.38/nature237 An atlas of human long non- coding RNAs with accurate 5 ends by Division of Genome Technologies, RIKEN, Yokohama, Japan (27) TABLE OF CONTENTS A Supplementary Notes... 2 Rationale and Implementation of TIEScore Benchmarking Performance of TIEScore... 3 TIEScore Cutoffs and TSS Validation Rates... 6 Definition of Genes, Coding Potential and Comparison with GENCODEv Directionality, Exosome Sensitivity and Transcript Properties Sequence Features at LncRNA TSS Supported by Different Types... B Online Resources... 2 Assembly, Expression Atlas and or Resources Overview of FANTOM CAT Browser Use Case : Download a List of Novel e- lncrnas... Use Case 2: Explore LncRNAs with Conserved Exons and Implicated in Use Case 3: Explore LncRNAs Enriched in Classical Monocytes Use Case : Explore Cell Types Associated with Crohn s Diseases... 9 C Supplementary Tables... 2 CAGE Library Information RNA- seq Library Information FANTOM CAT Gene Information... 2 Directionality, Exosome Sensitivity and Transcript Properties FANTOM CAT Genes in LncRNAdb Conservation of TIR and Exon Transposons at TIR Orthologous Transcription Expression Levels and Specificity in Primary Cell Facets... 2 Sample Ontology Information... 2 Gene Association with Cell Types Trait Information Gene Association with Traits... 2 Curation of Cell type and Trait Pairs Genes Involved in Cell Type and Trait pairs linked lncrna and mrna Pairs Gene- based Functional Evidence Grouping of Samples for Differential Expression Differential Expression Results... 2 D References
2 RESEARCH SUPPLEMENTARY INFORMATION A Supplementary Notes Rationale and Implementation of TIEScore a C = CAGE cluster expression level (cpm) CAGE criterion E = exon number total exon length (kb) Distance criterion exon Exon criterion exon D = distance from CAGE cluster to transcript (nt) Transformation based on linear regression with TSS validation rate by Roadmap as in b Weighting based on benchmark performance (ROC-AUC) against gold standards as in c TIEScore as an empirical estimation of TSS validation rate by TIEScore = w e f e (E) + w d f d (D) + w c f c (C) w e =. f e (E) = 5.5 log 2 E w d =.55 f d (D) =.9 D w c =.35 f c (C) = 6.63 log 2 C b Transformation c Weighting linear scale square root scale log 2 scale increasing performance CAGE cluster TSS validation rate by (%) r 2 =.3 r 2 =.69 f e (E) = 5.5 log 2 E r 2 =.96 Exon criterion (E) TIEScore performance (ROC-AUC) Performance CAGE cluster TSS validation rate by (%) transcript TSS validation rate by (%) Exon criterion (E) in various scales f d (D) =.9 D r 2 =.89 r 2 =.98 r 2 = Distance criterion (D) in various scales f c (C) = 6.63 log 2 C r 2 =.7 r 2 =.89 r 2 = CAGE criterion (C) in various scales actual TSS permutated TSS Distance criterion (D) CAGE criterion (C) weight of f c (C) weight of f d (D) weight of f e (E) w e =. w d =.55 w c = rank of combinations (n=86) of relative weights, sorted by TIEScore performance (ROC-AUC) Exon criterion (E) Distance criterion (D) CAGE criterion (C) Supplementary Fig. Rationale and Implementation of TIEScore. a, For each pair of transcript and CAGE cluster, TIEScore is calculated as sum of values from exon (E), distance (D) and CAGE (C) criteria, which were transformed by fe (E), fd (D) and fc (C) as in b and weighted by we, wd and wc as in c, respectively. The resulting TIEScore can be interpreted as an empirical estimation of TSS validation rate by. b, Transformation of TIEScore criteria values. We investigated correlation between TIEScore criteria values and TSS validation rate by DNaseI hypersensitivity sites (). The criteria values were transformed into various scales (columns) and plotted against TSS validation rate by. A TSS is validated if it overlaps a. Within each bin of values at X- axis, percentage of validated TSS was calculated (i.e. TSS validation rate). Same number of positions was randomly sampled from unannotated genomic regions (permutated TSS). Linear regression (red line, 99.99% confidence intervals) was performed on actual TSS and r 2 was indicated. The plot with highest r 2 within each TIEScore criteria was highlighted in pink and corresponding linear regression functions (i.e. fe (E), fd (D) and fc (C)) were used to transform TIEScore criteria values. c, Weighting of TIEScore criteria values. TIEScore is defined as weighted sum of transformed TIEScore criteria values. 2
3 SUPPLEMENTARY INFORMATION RESEARCH. TIEScore Rationale: Transformation and Weighting of Criteria Values Transcription Initiation Evidence Score (TIEScore) is a custom metric that evaluates properties of a pair of CAGE cluster and transcript model to quantify likelihood that corresponding CAGE transcription start site (TSS) being genuine (Supplementary Fig. a). We reasoned lowly abundant transcripts are likely to be insufficiently covered by RNA-seq reads and thus ir transcript models are more likely to be incomplete and truncated into shorter and less spliced models. This is supported by observation that product of exon number and total exon length (kb) of transcript models (i.e. Exon criterion, E) are highly correlated with percentage of TSS validated by (i.e. validation rate, Supplementary Fig. b). Also, lowly abundant CAGE clusters that are distant from transcript 5 ends may represent degradation products of longer transcripts, previously referred to as exon painting 2. This is supported by observation that expression levels of CAGE clusters and ir distances to closest transcript model (i.e. CAGE criterion, C and Distance criterion, D) are correlated with validation rate (Supplementary Fig. b-c). We refore devised TIEScore (Supplementary Fig. a), a metric integrating se criteria to address inaccurately identified 5 ends of transcripts and build transcript models with well-supported transcription initiation evidence. First, we explored linearity of validation rate across se criteria values using various scales based on linear regression (Supplementary Fig. b). For each criterion, scale that yields highest r 2 was chosen (Supplementary Fig. b) and its value was n transformed into validation rate based on corresponding linear regression functions (i.e. f e (E), f d (D) and f c (C)) as in Supplementary Fig. b. TIEScore is defined as weighted sum of se transformed values. To optimize weights of se criteria, we evaluated performance of TIEScore for discrete combinations of weights (n=86 combinations, i.e. three weights at grid of. with sums equal to, Supplementary Fig. c). The performance of TIEScore was measured in terms of area under receiver operating characteristic (ROC) curve (ROC-AUC) 3 in 7 matched CAGE and RNA-seq libraries (see details in Supplementary Note 2) and combination of weights which yielded highest AUC-ROC was chosen (i.e. w e, w d and w c ). TIEScore is thus calculated as: w e f e (E) + w d f d (D) + w c f c (C), which can be interpreted as an empirical estimation of TSS validation rate by..2 Implementation of TIEScore in Meta-assembly of FANTOM CAT TIEScore was evaluated for each pair of CAGE clusters and transcript models within kb. Each transcript model was assigned to CAGE cluster yielding highest TIEScore and its 5 end was adjusted to most prominent TSS of CAGE cluster. Transcripts with no CAGE peaks within kb, and CAGE peaks without transcripts within kb, were refore discarded. This associated each retained transcript with a CAGE cluster, and each retained CAGE cluster with one or more transcripts. A CAGE cluster is n assigned highest TIEScore yielded from its associated transcripts. TIEScore was first applied to each of five transcript model collections separately (with,897 CAGE libraries, Supplementary Table ) and n merged into a non-redundant transcript set (referred to as raw FANTOM CAGE associated transcriptome (CAT)). Specifically, transcript models from GENCODEv9 were used as initial reference to sequentially overlay onto m transcripts from or four collections, in sequence of Human BodyMap 2., mitranscriptome 5, ENCODE 6 and FANTOM5 RNA-seq assembly (this study). In each overlay, query transcript models that share ) 5 ends within ±5bp, 2) exact same splicing junctions, 3) 3 ends within ±2,bp, with reference transcript models were discarded as redundant. The non-redundant query transcript models were n added as new reference set for next round of overlay. For each CAGE cluster, maximum TIEScore after all four rounds of overlays was taken as its TIEScore. The TIEScore of raw FANTOM CAT was benchmarked against gold standards (with N=5, see details in Supplementary Note 2). The TIEScore cutoffs were chosen as in Supplementary Note
4 RESEARCH SUPPLEMENTARY INFORMATION 2 Benchmarking Performance of TIEScore gold standard TSS regions with various degrees of chromhmm state support gold standard non-tss regions relaxed strict > >2 >3 > >5 >6 >7 >8 >9 > 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,..75 all CAGE clusters 5, fraction of regions covered by DNase-seq a relative position from peak of CAGE clusters (bp) recall RNA-seq read counts cutoff = 27. CAGE clusters with less than.86 read counts, rescued by TIEScore 89 37, log2 RNA-seq read count cutoff,66 cutoff = cutoff = FDR = log2 CAGE read count cutoff log2 RNA-seq read count cutoff. cutoff = FDR = TIEscore cutoff F-measure mean = , TIEscore cutoff.5.75 recall FDR = F-measure mean =.2. 2, TIEScore cutoff = 52.9 PR-AUC mean =.92 F-measure mean =.9 F-measure PR-AUC mean = recall. d false discovery rate c TSS identified at false discovery rate of.5 by three classifiers precision PR-AUC mean =.98 e CAGE read count RNA-seq read count,333 TIEScore.75 b log2cage read count cutoff CAGE read counts cutoff =.86 Supplementary Fig. 2 Benchmarking Performance of TIEScore. Seventy samples with matched CAGE and RNA-seq libraries were used for TIEScore benchmarking (Supplementary table ). a, coverage of gold standard TSS regions. All CAGE clusters (st column) refer to all CAGE clusters in raw FANTOM CAT. Gold standard non-tss regions (2nd column) and TSS regions (3rd column) were defined using chromatin states among 27 epigenome datasets from Roadmap Epigenomics Consortium7 and CAGE clusters identified in FANTOM58. Gold standard TSS regions were defined at various degrees of chromatin state support (from relaxed to strict, 3rd to last column). Y-axis: fraction of corresponding regions covered by DNaseseq peaks. Line and shaded area: signal summarized per window of 2bp, solid line and shaded area represent median and quartiles of 27 epigenome datasets at corresponding window. In b, c and d, benchmarking of performance of TIEScore, CAGE read count and RNA-seq read count was repeated with sets of gold standard TSS regions defined at various levels of stringency as in a with same color scale. b, Precision and recall curve (PR curve). The area under PR curve (PR-AUC)3 is used as a measure of classifier performance in identifying genuine TSS. PR-AUC3 of TIEScore is significantly higher than that of or 2 classifiers across various stringencies of gold standard TSS definition (P<.5, paired Student s t-test). In c and d, vertical dashed lines: classifier cutoffs with false discovery rate (FDR) of.5 at gold standard TSS>5 chromatin state support. c, F-measure versus classifier cutoffs. F-measure3 was calculated as harmonic mean of precision and recall and is used as a measure of classifier accuracy in identifying genuine TSS. At FDR of.5 (vertical dashed lines), TIEScore outperforms (higher F-measure values) or two classifiers. d, FDR versus classifier cutoffs. Intersections of curve and horizontal dashed lines refer to classifier cutoffs for achieving FDR of.5 (vertical dashed lines). e, Overlap of TSS identified by three classifiers at cutoffs for achieving FDR of.5. W W W. N A T U R E. C O M / N A T U R E
5 SUPPLEMENTARY INFORMATION RESEARCH 2. Definition of gold standard TSS and non-tss regions To assess how well TIEScore identifies genuine TSS compared to using CAGE or RNA-seq derived information alone, we first defined sets of gold standard TSS and non-tss regions using epigenome datasets from Roadmap Epigenomics Consortium 7 ( Firstly, chromatin states TssA (Active TSS), Enh (Enhancers), TssBiv (Bivalent/Poised TSS) and EnhBiv (Bivalent Enhancer) were referred to as pro-tss-states ; and chromatin states Tx (Strong transcription), Het (Heterochromatin) and Quies (Quiescent/Low) were referred to as non-tss-states. A CAGE cluster was defined as a gold standard TSS region when its prominent TSS overlaps a pro-tss-state in N of 27 epigenomic datasets 7 and overlaps no non- TSS-states, where N reflects stringency of gold standard definition. Conversely, a CAGE cluster was defined as a gold standard non-tss region when its prominent TSS overlaps a non-tss state in at least 2 of 27 epigenome datasets 7 and overlaps no pro-tss-states. These gold standard regions were n used to assess TIEScore performance in distinguishing TSS from non-tss regions (Supplementary Fig. 2a). 2.2 Performance of TIEScore versus CAGE or RNA-seq data alone Using 7 samples with matched CAGE and RNA-seq libraries (Supplementary Table 2), performance of TIEScore in identifying genuine TSS was compared against CAGE or RNA-seq data alone, in identification of genuine TSS (Supplementary Fig. 2). First, TIEScore was applied to a subset of CAGE clusters with at least 3 reads (sum among 7 samples) and FANTOM5 RNA-seq assembly. A set of universal TSS regions (n=,,) was n generated across three sets of TSS (i.e. combined, and CAGE or RNA-seq alone) by merging transcripts 5 ends in FANTOM5 RNA-seq assembly within ±nt into RNA-seq TSS regions and n by merging se RNA-seq TSS regions with CAGE clusters within ±nt. Gold standard TSS and non-tss regions for se universal TSS regions were defined using midpoint of se universal TSS regions instead of prominent TSS in CAGE clusters. Ten sets of gold standard TSS regions were defined at various stringencies as described above, with N= to, at step of. For each of universal TSS regions, its support (i.e. score) from each of three datasets was n calculated as, ) combined: maximum TIEScore of all associated CAGE clusters; 2) CAGE only: maximum number of CAGE reads of all associated CAGE clusters; 3) RNA-seq only: maximum number of RNA-seq reads of all associated RNA-seq TSS regions. (Note: each RNA-seq TSS region is represented by sum of RNA-seq reads of its associated transcripts as estimated in Sailfish 2 ). The score performance for each of three TSS datasets was assessed using R packages ROCR 3 and PRROC 3. Specifically, sensitivity, specificity, precision, recall, FDR and F-measure at various score cutoffs were calculated using ROCR 3, and ROC-AUC and PR-AUC were calculated using PRROC 3. In terms of precision, recall and F-measure (Supplementary Fig. 2b-c), TIEScore substantially outperformed both CAGE only and RNA-seq only based approaches. In Supplementary Fig. 2d, cutoffs values of various classifiers at FDR of.5 were indicated. In Supplementary Fig. 2e, RNA-seq read count identified appreciably fewer TSS than or two classifiers, due to its high cutoffs (i.e reads) to achieve FDR of.5. Number of TSS identified based on TIEScore and CAGE read count is comparable. TIEScore rescued 35,83 TSS discarded by CAGE read count classifier cutoff at reads, implying its ability to identify lowly expressed TSS with acceptable FDR by synergizing information from CAGE and RNA-seq data. 5
6 RESEARCH SUPPLEMENTARY INFORMATION 3 TIEScore Cutoffs and TSS Validation Rates a c sensitivity cumulative percentage of TSS (%) % specificity=.83 robust TIEScore cutoff 53.% 8.6% sensitivity=.82 ROC-AUC= specificity b d accuracy or false discovery rate TSS validation rate by (%) permissive (35.3) FDR=.23 robust (5.) FDR=.77 stringent (6.) FDR=.26 accuracy FDR TIEScore actual TSS permutated TSS Pearson s r.9 e TSS validation rate by RAMPAGE (%) validated by RAMPAGE within X bp X = Pearson s r mean = TIEScore TIEScore TIEScore Supplementary Fig. 3 TIEScore Cutoffs and TSS Validation Rates. a. Robust cutoff of TIEScore. The robust TIEScore cutoff (TIEScore=5.) was based on optimal cutoff determined from a ROC curve, with FDR=.77. b, Permissive and stringent cuttofs of TIEScore. c, Cumulative percentage of TSS. % refers all TSS in raw FANTOM CAT. d, FANTOM CAT TSS validation rate by. Percentages of TSS at various TIEScore (bin=.) overlaps with were calculated. Permutated TSS refers to randomly sampled genomic positions as control. e, TSS validation by RAMPAGE. The percentages of TSS at various TIEScore (bin=.) that could be validated by RAMPAGE at various ranges were calculated and loess-smood curves were plotted. In c-e, vertical dashed lines represent three TIEScore cutoffs as in b. 3. Definition of TIEScore cutoffs Based on a ROC curve (Supplementary Fig. 3a), we derived an optimal TIEScore cutoff (TIEScore=5., referred to as robust cutoff), with ROC-AUC=.89 and FDR=.77. We also defined two additional values (Supplementary Fig. 3b), permissive (TIEScore=35.3) and stringent (TIEScore=6.) cutoffs, based on FDR chosen as three times more relaxed (FDR=.23) and strict (FDR=.26) than robust FDR. 3.2 Properties at various TIEScore cutoffs At robust cutoff, 6.9% of all TSS were retained (Supplementary Fig. 3c). We observed high correlations between TIEScore and TSS validation rate by (Pearson s r=.9, Methods, Supplementary Fig. 3d) and by RAMPAGE (mean Pearson s r=.95, Methods, Supplementary Fig. 3e), suggesting that TIEScore quantitatively identifies genuine TSS. 6
7 SUPPLEMENTARY INFORMATION RESEARCH Definition of Genes, Coding Potential and Comparison with GENCODEv a support in classes promoter enhancer dyadic no Gene classes uncertain coding Number of genes in classes lncrnas mrnas ors Levels of support in FANTOM CAT CAT genes, above stringent cutoff CAT genes, between robust and stringent cutoffs CAT genes, between permissive and robust cutoffs b c Stringent cutoff 3,52 genes Robust cutoff 59, genes Permissive cutoff 2,25 genes 5 75 percentage of genes (%) 5 75 percentage of genes (%) structural RNAs small RNAs short ncrnas sense overlap RNAs pseudogenes lncrna, sense intronic lncrna, intergenic lncrna, divergent lncrna, antisense protein coding mrnas uncertain coding structural RNAs small RNAs short ncrnas sense overlap RNAs pseudogenes lncrna, sense intronic lncrna, intergenic lncrna, divergent lncrna, antisense protein coding mrnas uncertain coding structural RNAs small RNAs short ncrnas sense overlap RNAs pseudogenes lncrna, sense intronic lncrna, intergenic lncrna, divergent lncrna, antisense 5,, 5, 2,, 3, 35, number of genes (count) 5,, 5, 2,, 3, 35, number of genes (count) d f number of GENCODEv genes (count) 5,, 5, 2, total bp bp 5 bp bp bp density.5.5 below permissive cutoff lncrnas mrnas pseudogenes GENCODEv gene classes e number of novel FANTOM CAT genes (count), 2, 3,, 5, lncrnas mrnas ors FANTOM CAT gene classes GENCODEv transcripts with 5 end revised by FANTOM CAT lncrna protein coding pseudogene 26,578 % 5,926 % 7,66 % 6,9 6% 28,27 88% 8, 6% 2,82 8% 93,788 6% 7,376 2% 7,6 27% 5,33 3%,867 28% 5,9 9% 29,73 2% 3,56 2% 3,33 %,963 %,97 % mean 35.9 mean 95. mean 8.7 protein coding mrnas 5 75 percentage of genes (%) 5,, 5, 2,, 3, 35, number of genes (count) relative modified 5 end position (bp) Supplementary Fig. Definition of FANTOM CAT Genes and Comparison with GENCODEv. In a-c, Left column: Percentage of genes within each gene class supported by different types of Roadmap regulatory regions 7, based on overlapping of ir strongest TSS with Roadmap 7. At all cutoffs, majority of protein coding mrnas are supported by promoter. Right column: Number of genes within each gene class. a, b and c, refer to FANTOM CAT genes at stringent, robust and permissive TIEScore cutoff respectively. d, GENCODEv genes in FANTOM CAT. The number of GENCODEv lncrna, mrna and pseudogene genes supported at various levels of FANTOM CAT was plotted. e, FANTOM CAT genes novel to GENCODEv. A FANTOM CAT gene is novel to GENCODEv if none of its transcripts is compatible with a GENCODEv transcript. f, Revision of 5 ends of GENCODEv transcripts. Upper: Number of GENCODEv transcripts revised by various extents. Total refers to total number of GENCODEv transcripts. X bp refers to number of GENCODEv transcripts revised by at least X bp. Lower: Distributions of 5 end of GENCODEv lncrna, mrna and pseudogene transcripts relative to ir compatible FANTOM CAT transcripts (i.e. relative modified 5 end position) were plotted. The absolute mean within each class was indicated.. Definition FANTOM CAT genes FANTOM CAT genes were defined based on clustering of transcript models in raw FANTOM CAT (after reducing complexity) using a custom perl script. Specifically, non-gencodev9 transcripts with exon boundaries within ±5bp to that of a GENCODEv9 transcript were first assigned to corresponding GENECODEv9 genes. Single exon non-gencodev9 transcripts within ±5bp of a GENCODEv9 exon were also assigned to corresponding GENECODEv9 gene. The non-gencodev9 transcripts spanning multiple GENECODEv9 genes were defined as chimeric transcripts and removed (as likely assembly artifacts). Remaining unassigned non-gencodev9 transcripts with exon boundaries within ±5bp were n recursively clustered, and all se resulting transcript clusters were referred to as novel genes outside GENCODEv9. Applying various TIEScore cutoffs to raw FANTOM CAT, we defined 2,25 permissive, 59, robust and 3,52 stringent genes (Supplementary Fig. a-c). Gene classes of 7
8 RESEARCH SUPPLEMENTARY INFORMATION FANTOM CAT genes assigned to GENCODEv9 protein coding genes, pseudogenes and small RNA genes were directly inherited from ir biotypes. Or gene classes in FANTOM CAT were defined as follows..2 Definition of gene classes Coding potential of all transcripts in raw FANTOM CAT was evaluated using Coding Potential Assessment Tool (CPAT, version ) with default parameters on hg9. Transcripts with CPAT score <.36 and no open reading frames (ORF) 3nt (based on getorf 6 ) are defined as non-coding. A FANTOM CAT gene is defined as non-coding if all its transcripts are non-coding, or its GENCODEv9 biotype is annotated as non-coding 7. A gene is defined as coding when at least 5% of its transcripts are coding and re is at least one transcript with an ORF 3nt. Orwise gene is classified as coding uncertain. Non-coding genes generating at least one transcript with total exonic length 2nt are defined as lncrna genes, and those <2nt are defined as short ncrna. LncRNAs are n classified in ascending order as follows: divergent lncrnas, sense intronic lncrnas, antisense lncrnas and intergenic lncrnas. Divergent lncrnas: genes with its strongest CAGE cluster within ±2kb on opposite strand of any CAGE clusters of GENCODEv9 protein coding genes or pseudogenes. Sense intronic lncrna: lncrna genes ) initiating within intron of anor FANTOM CAT gene, 2) with at least 5% of ir genic region overlapping with genic region of any anor genes, 3) with its strongest CAGE cluster not overlapping exons of or genes, and ) containing CAGE reads, or orwise defined as or sense overlap RNA. Antisense lncrnas: genes with 5% of ir genic region overlapping with genic region of GENCODEv9 protein coding genes or pseudogenes on opposite strand. Intergenic lncrnas: remaining lncrna genes that could not be assigned to any of above categories..3 Comparison between FANTOM CAT and GENCODEv A GENCODEv gene is supported by FANTOM CAT if at least for one of its transcripts is compatible with a FANTOM CAT transcript. A GENCODEv transcript is defined as compatible with FANTOM CAT transcripts if all of its exon boundaries are within ±5bp to that of a FANTOM CAT transcript, and vice versa. A FANTOM CAT gene is defined as novel if none of its transcripts is compatible to GENCODEv transcripts. About 33.86% (5,3 of,8) of GENCODEv lncrna genes (grey stack, Supplementary Fig. d) are below permissive TIEScore cutoff. At robust cutoff, FANTOM CAT covered 8,269 of 2,32 protein-coding genes in GENCODEv (Supplementary Fig. d). The robust FANTOM CAT covered 6,99 of,8 lncrna genes in GENCODEv (Supplementary Fig. d) and added 9,723 novel lncrna genes outside of GENCODEv (Supplementary Fig. e). In addition, 5 ends of 6,9 lncrna transcripts in GENCODEv were revised using FANTOM CAT models, which resulted in ir improved accuracy by a mean of 35.9bp (Supplementary Fig. f).. Annotation of open reading frames in FANTOM CAT. As some previously annotated putative lncrnas transcripts have been shown to associate with ribosomes and thus may encode for short peptides 8, we furr annotated coding potential of open reading frames, but were not used for defining coding status of a FANTOM CAT gene. Coordinates of ORFs on all FANTOM CAT transcripts (n=86,58) were extracted using getorf 6, with minimum 3nt requiring both start and stop codons. Only top 5 longest ORFs per transcript were retained, which yielded 3,6,858 non-redundant ORFs. Coordinates of ORFs were converted from transcript level to genomic level. A pre-computed 6-way whole genome alignment 9 of hg9 was downloaded from UCSC Genome Browser 2. Only 27 species were retained in alignment based on its intersection with 58-mammal phylogeny used in PhyloCSF 2, including Alpaca, Armadillo, Baboon, Bushbaby, Cat, Chimp, Cow, Dog, Dolphin, Elephant, Gorilla, Guinea pig, Hedgehog, Horse, Human, Marmoset, Megabat, Microbat, Mouse, Orangutan, Pika, Rabbit, Rat, Rhesus, Shrew, Squirrel and Tenrec. Interspecies alignments of ORF were extracted using mafsinregion from UCSC Genome Browser 2. Coding potential of se ORFs were assessed using PhyloCSF 2 and RNAcode 22 using default settings, based on phylogenetic information from interspecies alignments. We furr identified potentially translated small ORFs (sorf) based on ribosome profiling data in sorfs.org 23. An ORF is defined as coding if its score in PhyloCSF 2, P<. in RNAcode 22 or it matches an sorf with good floss-score as defined in sorfs.org 23. The annotations of all ORFs in FANTOM CAT are available at Note: Of 27,99 lncrnas genes in robust FANTOM CAT, 7,722 of m were annotated as lncrnas in GENCODEv9 2 and remaining 2,97 non-gencodev9 lncrna genes where all ir transcripts lacked coding potential (CPAT score <.36 and no ORF 3nt). Of se 2,97 non-gencodev9 lncrna genes, only,87 had putative ORFs with additional evidence of coding potential (in eir PhyloCSF 2, RNAcode 22 or sorfs.org 23 ). We provide se annotations on web resource ( yet we still consider se as genuine lncrnas for furr analysis. 8
9 5 Directionality, Exosome Sensitivity and Transcript Properties SUPPLEMENTARY INFORMATION RESEARCH 5 Directionality, Exosome Sensitivity and Transcript Properties a a directionality = (read ss read as ) / (read ss + read as ) directionality = (read ss read as ) / (read ss + read as ) sense CAGE sense TSS CAGE (read TSS antisense ss ) (read antisense ss ) CAGE TSS CAGE (read as TSS ) (read as ) CAGE cluster miscellaneous CAGE cluster miscellaneous in question in question region for read counting region for read counting distance from sense TSS (bp) distance from sense TSS (bp) b b c c d d e percentage of genes of genes exosome sensitivity genomic span span (log2 (log2 kb) kb) splicing index mrna divergent p-lncrna intergenic p-lncrna e-lncrna mrna divergent p-lncrna intergenic p-lncrna e-lncrna directionality (strand-bias (strand-bias of CAGE of CAGE signal) signal) Supplementary Fig. 55 Directionality, Exosome Sensitivity and Transcript Properties. a, a, Definition of of directionality. Miscellaneous refers to to any antisense TSS (if any). Directionality of a CAGE cluster is is defined as: as: (readss readas)/(readss+readas), where readss and readas are number of CAGE read counts (across,897 FANTOM5 samples) within 8 8 to +2nt to +2nt of of its its prominent TSS on on sense and antisense strand, respectively. b percentages of of genes within each each category for for each each bin bin of of directionality. c, c, Exosome sensitivity is measured as relative fraction of of CAGE signal observed after after exosome knockdown in in HeLa-S3 cells cells. d,. d, genomic span, defined as length between 5 5 most to to 3 3 most base of of a a gene. gene. e, e, splicing index, index, calculated as as average of of sum of of ratio of of junction spanning reads to to spliced reads of of each of of its its introns across across 7 7 RNA-seq RNA-seq libraries. libraries. The The box box plots plots show show median, quartiles and Tukey whiskers of measurements for for genes genes at at each each bin bin of of directionality. 5. Rationale Transcription initiation is intrinsically bidirectional 26 (Supplementary Fig. 5a). Functionally distinct RNA species were previously categorized by ir transcriptional directionality (strand-bias of transcription initiation, referred to as directionality) and by exosome-sensitivity (sensitivity to ribonucleolytic RNA exosome complex). For each lncrna category we examined relationship between se features and length and splicing of observed transcripts (Supplementary Fig. 5b-e). 5.2 Calculation of directionality, splicing index, genomic span and exosome sensitivity Directionality of a given CAGE cluster is measured as bias of CAGE signal on sense versus antisense strand (relative to TSS), as described with modifications as follows. A zero value implies perfectly balanced CAGE signal on both strands, and values of + and implies strongly biased CAGE signal towards sense or antisense strand, respectively. Directionality of a CAGE cluster is defined as follows: (read SS read AS )/(read SS +read AS ), where read SS and read AS are number of read counts from all FANTOM5 CAGE datasets (n=,897, Supplementary Table ) within 8 to +2nt of its prominent TSS on sense and antisense strand, respectively (Supplementary Fig. 5a). Directionality of a gene is defined by directionality of its strongest CAGE cluster. Splicing index of a gene is defined as average of sum of ratio of junction spanning reads to spliced reads of each of its introns across 7 RNA-seq libraries (libraries mentioned in Methods). Genomic span of a gene is defined as length between its 5 most to 3 most genomic location. Exosome sensitivity of a CAGE cluster is measured as relative fraction of CAGE signal observed after exosome knockdown in HeLa-S3 cells as previously described. 5.2 Directionality p-lncrnas Consistent with previous findings, transcription initiation regions (TIRs) that are predominantly transcribed from sense strand (directionality +, in all four categories, Supplementary Fig. 5b) produce less exosome-sensitive (Supplementary Fig. 5c), generally longer (Supplementary Fig. 5d) and more spliced and RNAs (Supplementary Fig. 5e). Therefore, directionality of TIR somewhat reflects properties of its produced RNAs. As expected, TIRs of mrnas predominantly generate sense transcripts (directionality +, red) and are only mildly exosome sensitive, 9 9
10 RESEARCH SUPPLEMENTARY INFORMATION whereas p-lncrnas generated from divergent antisense transcription from se mrna promoters are PROMPT 27 like (directionality, purple), largely exosome sensitive, short and rarely spliced. In contrast intergenic p-lncrna TIRs are mostly biased towards generating sense transcripts (directionality >, blue). In addition, ~% of intergenic p- lncrnas predominantly generating sense transcripts (directionality +, blue), such as MALAT 28, are relatively resistant to exosome degradation. 5.3 Directionality e-lncrnas About 76% of e-lncrnas arose from balanced bidirectional TIRs (directionality.8 and +.8, green), while about 22% were generated from unidirectional TIRs (directionality >+.8, green). These unidirectional TIR derived e- lncrnas are generally less exosome sensitive, longer, more spliced (directionality >+.8, green). This study refore expand our previous catalog of bidirectionally transcribed enhancer regions 8 by including unidirectional e-lncrnas. These contain previously identified functional e-lncrna such as CCAT, an unidirectionally transcribed from a superenhancer, which promotes long-range chromatin looping and regulates MYC transcription 29.
11 SUPPLEMENTARY INFORMATION RESEARCH 6 Sequence Features at LncRNA TSS Supported by Different Types a lncrna, sense intronc lncrna, intergenic lncrna, divergent lncrna, antisense random controls n=6 n=,26 n=26 n=2, n=,8 n=6, n=,7 n=,33 n=522 n=936 n=5,827 n=92 n=26 n=,376 n=22 n=,8 n=, n=, dyadic enhancer promoter no dyadic enhancer promoter no dyadic enhancer promoter no dyadic enhancer promoter no unannot. regions whole genome 2,5 2,5 2,5 2,5 2,5 2,5 2,5 2, ,5 2,5 2,5 2,5 2,5 2,5 2,5 2,5 2,5 2,5 2,5 2,5 2,5 2, ,5 2, ,5 2,5 2,5 2,5 2,5 2,5 2,5 2,5 TATA box (fraction) INR motif (fraction) ,5 2,5 2,5 2, SS (fraction) PAS (fraction) CpG island (fraction) RS score (mean) H3Kme3 (fraction)..75 H3Kme (fraction) H3K27ac (fraction)..75 DNase (fraction) b relative position from TSS (bp) c relative position from TSS (bp) relative position from TSS (bp) Supplementary Fig. 6 Sequence Features at LncRNA TSS Supported by Different Types. Black dashed lines: whole genome background. Solid lines: TSS of lncrna genes and control regions randomly sampled from whole (whole genome) or unannotated portion of genome (unannot. regions). Colors of lines correspond to type of support as indicated on top. a, Epigenomic features surrounding TSS. Distributions of Roadmap DNase-seq and ChIP-seq (Methods) were plotted. b, Genomic features surrounding TSS. Distributions of rejected substitution (RS) score 3, CpG island, polyadenylation site signal (PAS) and 5 splicing site (5 SS), were plotted. c, Core promoter motifs. Distributions of TATA box and initiator (INR) motif, were plotted.. 6. Genuineness TSS of lncrna gene classes supported by different types Despite relatively weaker selective constraints of e-lncrnas and intergenic p-lncrnas TIR as measured using RS scores 3, we observed an enrichment of sequence features associated with regulated transcription initiation, including TATA-box, INR and CpG island (absent for e-lncrnas) as well as signals conducive to generating long transcripts including 5 SS enrichment and PAS depletion. These sequence features suggest that at least a subset of se TIR have undergone selection for both transcription initiation and elongation. For remaining lncrnas lacking support (grey), we still observed noticeable enrichment of 5 SS, INR and TATA-box motifs and depletion of PAS, suggesting that a subset of se TSS is likely to be genuine despite lack of support.
12 RESEARCH SUPPLEMENTARY INFORMATION B Online Resources All resources mentioned below are available at Assembly, Expression Atlas and or Resources. Assembly: Cutoffs and identifiers We provide The FANTOM CAT assembly at cutoffs (as *.gtf), Raw: without TIEScore cutoff, FDR=.285, contains % of all TSS Permissive: TIEScore 35.3, FDR=.23, contains 88.6% of all TSS Robust: [optimal, recommend to use] TIEScore 5., FDR=.77, contains 6.9% of all TSS Stringent: TIEScore 6., FDR=.26, contains 9.% of all TSS At each of TIEScore cutoffs, assembly consists of 3 types of basic items (as *.bed), ) genes, 2) transcripts and 3) CAGE clusters. Each of se items has a unique identifier and associates to items of or types as specified in ID mapping tables (as *.txt): ONE transcript associates with ONLY ONE CAGE cluster; ONE transcript associates with ONLY ONE gene; ONE CAGE cluster can associate with multiple transcripts; ONE CAGE cluster can refore associate with multiple genes; Gene identifiers (mainly inherited from GENCODEv9): ENSG*.*: inherited from GENCODEv9, e.g. ENSG577.9; CATG*.*: novel genes from FANTOM CAT, e.g. CATG557.; Transcript identifiers (mainly inherited from GENCODEv9): ENST*.*: inherited from GENCODEv9 7, e.g. ENST57292.; ENCT*.*: novel transcripts from ENCODE 6 transcript models, e.g. ENCT868.; HBMT*.*: novel transcripts from Human BodyMap v2. transcript models, e.g. HBMT386.; MICT *.*: novel transcripts from mitranscriptome 5 transcript models, e.g. MICT969.; FTMT*.*: novel transcripts from FANTOM5 RNA-seq transcript models, e.g. FTMT96.; CAGE Cluster identifiers (ALL inherited from FANTOM5 DPI CAGE Clusters ): chr*:*..*,*: as chromosome:start..end,strand, inherited from FANTOM5, e.g. chr: ,+;.2 Expression Atlas At each of TIEScore cutoffs, we provide following expression tables (,897 FANTOM5 CAGE libraries): tag count per CAGE cluster: CAGE tags count within flanking region (±5nt) of CAGE cluster; tag count per gene: sum of CAGE tags of CAGE clusters associated with gene; rle cpm per CAGE cluster: rle normalized (by edger 3 ) cpm (count per millions) CAGE cluster; rle cpm per gene: sum of CAGE tags of CAGE clusters associated with gene;.3 Or Resources We also provide following resources for FANTOM CAT as robust TIEScore cutoff: Differential expression: differential expression analysis (edger 3 ) of manually curated series; Sample ontology association: gene-based association with FANTOM5 sample ontology terms; Trait association: Gene-based association with traits in GWASdb 32 and PICS 33 ; Expression specificity: Gene-based expression levels and specificity in primary cell facets; Co-expression: Co-expression of lncrna mrna pairs linked by GTEx 3 -associated SNPs; Selective constraints: RS score and GERP elements 3 at TIR and exon; Orthologous TSS activities: Orthologous TSS activities in mouse, rat, dog and chicken; ORF Annotations: coding potential of all ORF based on phylocsf 2, RNACode 22 and sorf.org 23 ; Transposable elements: TIR association with transposable elements; Directionality: Transcriptional directionality of TIR based on,897 FANTOM5 CAGE libraries; Exosome Sensitivity: Exosome sensitivity based on exosome knockdown in HeLa cells ; Splicing Index: Gene-based splicing index in 7 RNA-seq libraries; 2
13 SUPPLEMENTARY INFORMATION RESEARCH 2 Overview of FANTOM CAT Browser Supplementary Fig. 7 Landing Page of FANTOM CAT Browser. The FANTOM CAT Browser is hosted at This web application allows users to browse FANTOM CAT genes, view ir genomic loci through ZENBU 35, filter by ir annotations, intersect m with ir associated sample ontologies or traits, and download relevant data. Alternatively, users can also browse by trait or sample ontologies, and intersect with ir associated genes. Click on circles below and start browsing Genes, Traits and Ontologies. Gene List Page Gene Individual Page Supplementary Fig. 8 Browsing Genes from Gene List and Individual Page. On Gene List Page ( users can select for genes based on filters on left if list. Click on GeneID on list to browse an individual gene. The Gene Individual Page displays basic information of gene and its associated sample ontologies and traits. User can navigate between genes, sample ontologies and traits through links. Alternatively, users can start by browsing list of sample ontologies and traits and reach ir associated genes. 3
14 RESEARCH SUPPLEMENTARY INFORMATION 3 Use Case : Download a List of Novel e- lncrnas 2 3 Gene List Page Supplementary Fig. 9 Download a List of Novel e-lncrnas. Steps:. Go to Gene List Page by clicking Genes in Landing Page; 2. Check novel in filter Annotations to filter for genes that are not annotated in GENCODEv9; 3. Check e-lncrna in filter Category, and this should return 7,232 genes;. Choose data columns to be displayed (and thus downloaded) and click Export visible data as csv ;
15 SUPPLEMENTARY INFORMATION RESEARCH Use Case 2: Explore LncRNAs with Conserved Exons and Implicated in Gene Individual Page Gene List Page Supplementary Fig. Explore LncRNAs with Conserved Exons and Implicated in. Steps:. Go to Gene List Page by clicking Genes in Landing Page; 2. Check yes in filter Exon Conservation to filter for genes with conserved exons; 3. Check yes in filter mrna Coexpression, and this should return,673 genes;. Choose data columns to Exon conservation and mrna Coexpression be displayed; 5. Click on header of Exon conservation column to sort genes by level of conservation at ir exons; 6. Click on one of genes on top (i.e. highly conserved exon), e.g. ENSG96.; 7. It should reach individual page for gene ENSG96.; 8. Examining summary for gene, it notes ENSG96. is coexpressed with -linked ENSG26822.; 9. Examining conservation summary of gene, it notes exon of ENSG96. is conserved;. Click on extended to view ENSG96. loci;. Continue next page: in extended ZENBU view, it shows information at locus; 5
16 RESEARCH SUPPLEMENTARY INFORMATION ZENBU extended xten view ZENBU extended xten view view ZENBU extended xten SupplementaryFig. Fig. Visualizean anlncrna LncRNAImplicated Implicatedinin ininzenbu. ZENBU. Supplementary Supplementary Fig. Visualize Visualize an LncRNA Implicated in in ZENBU. Steps: Steps: Steps:. Continuing from end of Supplementary Fig., you should see ZENBU extended view for ENSG96.;.. Continuing from of of Supplementary Fig.,, you should seesee ZENBU extended view for ENSG96.; Continuing from end Supplementary Fig. should ZENBU extended view for ENSG96.; 2. Click settings and inend feature highlight search inputyou RP-973N3. (gene name of ENSG96.) to highlight 2.2. Click settings and in in feature highlight search input RP-973N3. (gene name of ENSG96.) to highlight Click settings and feature highlight search input RP-973N3. (gene name of ENSG96.) to highlight locus of RP-973N3.; locus ofofrp-973n3.; locus RP-973N3.; 3. Examine red highlighted gene and transcript of RP-973N3.. To cancel highlight, repeat step 2 and delete search 3.3. Examine red gene and transcript of RP-973N3.. To To cancel highlight, repeat stepstep 2 and delete search Examine redhighlighted highlighted gene and transcript of RP-973N3.. cancel highlight, repeat 2 and delete search string in setting dialogue box; string in setting dialogue box; string in setting dialogue box;. Click on track [SNP] linked lncrna-mrna pairs, SIGNIFICANT ONLY, -to-gene, which shows connections.. Click [SNP] linked lncrna-mrna pairs, SIGNIFICANT ONLY, -to-gene, which shows connections Clickonontrack track [SNP] linked pairs, SIGNIFICANT ONLY, -to-gene, between SNPs to targetlncrna-mrna coexpressed mrna, highlight regions flanking which SNPs shows (i.e. leftconnections ends of between SNPs to target coexpressed mrna, highlight regions flanking SNPs (i.e. left ends of between and to targetglass coexpressed mrna, green lines) clicksnps magnifying icon to zoom in; highlight regions flanking SNPs (i.e. left ends of green lines) click magnifying glass icon to zoom in; in; green lines)and click magnifying to SNP, zoom 5. In zoom inand view, click on track [SNP]glass GTExicon v6p, Pe-5, coding genes, and highlight SNPs, and 5.5. InIn zoom in inview, click onon track [SNP] GTEx SNP, v6p, Pe-5, coding genes, and highlight SNPs, and and zoom view, click track [SNP] GTEx SNP, v6p, Pe-5, coding genes, highlight SNPs, bring up display panel (at bottom) showing active tissues of highlighted and SNPs; bring up display panel (at bottom) showing active tissues of highlighted SNPs; bring up display panel (at (i.e. bottom) showing and active tissuesmrna of (i.e. highlighted SNPs; 6. To zoom out to view lncrna RP-973N3.) linked PLEKHG3), click on one of connections in 6.6. ToTozoom outouttotoview lncrna (i.e. RP-973N3.) andand linked mrna (i.e.(i.e. PLEKHG3), click on one of in in view lncrna (i.e. RP-973N3.) linked mrna PLEKHG3), of connections connections zoom track [SNP] linked lncrna-mrna pairs, SIGNIFICANT ONLY, -to-gene, n click click on one range to zoom out; track [SNP] linked lncrna-mrna pairs, SIGNIFICANT ONLY, -to-gene, n click range to zoom out; track [SNP] linked lncrna-mrna SIGNIFICANT -to-gene, click to zoom out; 7. Aview contains coexpressed lncrna mrnapairs, pair linked by ONLY, SNP, you can rearrangen tracks by range dragging title 7.7. Abars contains pairpair linked by SNP, youyou can rearrange ONLY, tracks by dragging title Aview view contains coexpressed coexpressed lncrna mrna linked by SNP, can rearrange tracks by dragging title as desired, e.g. move up lncrna mrna track [SNP] linked lncrna-mrna pairs, SIGNIFICANT -to-gene ; bars upup track [SNP] linked lncrna-mrna pairs, SIGNIFICANT ONLY, -to-gene ; barsasasdesired, desired,e.g. e.g.move move track [SNP] linked lncrna-mrna pairs, SIGNIFICANT ONLY, -to-gene ; 6 W W W. N A T U R E. C O M / N A T U R E 6 6
17 SUPPLEMENTARY INFORMATION RESEARCH 5 Use Case 3: Explore LncRNAs Enriched in Classical Monocytes U in Classical MM onocytes Use se CCase ase 33: : EExplore xplore LLncRNAs ncrnas EEnriched nriched in Classical onocytes Ontology List Page Ontology List Page Ontology List Page Supplementary Fig. 2 Explore LncRNAs Enriched in Classical Monocytes. Supplementary Fig. 2 Explore LncRNAs Enriched in Classical Monocytes. Supplementary Fig. 2 Explore LncRNAs Enriched in Classical Monocytes. Steps: Steps:. Go to Ontology List Page by clicking Ontologies in Landing Page;. Go to Ontology List Page by clicking Ontologies in Landing Page; 2. Type classical monocytes and press filter ; Steps: 2. Type classical monocytes and press filter ; 3. Go Click on CL:86 to enter ontology individual page for classical monocytes;. Ontology List Page by clicking Ontologies Landing Page;monocytes; 3. Clicktoon CL:86 to enter ontology individualin page for classical. Check lncrna classes in filter Gene Class to filter for lncrna genes; 2. Type classical monocytes and press filter ;. Check lncrna classes in filter Gene Class to filter for lncrna genes; 5. Click on header of Fold column to sort genes by level of enrichment in classical monocytes; 3. Click on header CL:86 enter ontology for classical monocytes; 5. of Fold tocolumn to sort genesindividual by level ofpage enrichment in classical monocytes; 6. Click on genes on top (i.e. most enriched in classical monocytes), i.e. CATG78.;. Check lncrna classes in filter Gene Class to filter for lncrna genes; 6. Click on genes on top (i.e. most enriched in classical monocytes), i.e. CATG78.; 7. Examining summary summary for gene, gene, notes CATG78. isinassociated with classical monocytes; 5. Examining Click on header of Fold for column to sort genes by level of enrichment classicalwith monocytes; 7. it itnotes CATG78. is associated classical monocytes; 8. Hover over enrichment plot to examine expression of CATG78. in individual samples of classical monocytes; 6. Click on genes on top (i.e. most enriched in classical monocytes), i.e. CATG78.; 8. Hover over enrichment plot to examine expression of CATG78. in individual samples of classical monocytes; 9. Hover over box plot to examine dynamic expression of CATG78. in classical monocytes stimulation; 7. Examining summary for gene, it notes CATG78. is associated with classical monocytes; 9. Hover over box plot to examine dynamic expression of CATG78. in classical monocytes stimulation;. Click Click on on expression view CATG Hover over enrichment to examine expression loci; ofloci; CATG78. in individual samples of classical monocytes;. expression totoplot view CATG78. Continue next page: expression ZENBU view,it expression itshows shows information at locus;monocytes stimulation;. 9. Continue Hover over page: box plot to examine dynamic ofexpression CATG78. in classical. next ininexpression ZENBU view, expression information at locus;. Click on expression to view CATG78. loci; W W W. N A T U R E. C O M / N A T U R E. Continue next page: in expression ZENBU view, it shows expression information at locus; 7 7 7
18 RESEARCH SUPPLEMENTARY INFORMATION 3 2 ZENBU expression view Supplementary Fig. 3 ZENBU Expression View of an LncRNA Enriched in Classical Monocytes. Steps:. Continuing from end of Supplementary Fig. 2, you should see ZENBU expression view for CATG78.; 2. Click on CATG78. in track [Sample Ontology Enrichment] Robust, gene (fold difference) and bring up experiment panel (at bottom) showing fold enrichment of CATG78. in classical monocytes; 3. Click on CATG78. in track [Expression Profile] Robust, gene-based expression (n=829 samples) and bring up experiment panel (at bottom) to show expression of CATG78. in,829 FANTOM5 samples;. Click on track [Epigenome] Roadmap, (n= samples) to shows tissues of roadmap are active within displayed loci; 8
19 SUPPLEMENTARY INFORMATION RESEARCH 66 Use Use Case Case : : Explore Explore Cell Cell Types Types Associated Associated with with Crohn s Crohn s Diseases Diseases Trait List Page Supplementary Fig. Explore Cell Types Associated with Crohn s Diseases. Steps:. Go to Trait List Page by clicking Ontologies in Landing Page; 2. Type crohn and click filter ; 3. Click on DOID:8778 to enter trait individual page for Crohn s Disease GWAS;. Reached trait individual page for Crohn s Disease GWAS; 5. Browse cell-types that are associated Crohn s Disease in Ontology Enrichment table, and click on filter to filter for genes that are associated with Crohn s Disease and enriched in Leukocytes, this should return 23 genes; 6. Examine composition of filtered genes, and go to individual gene page for gene based information; 9
Supplementary Figures
Supplementary Figures Supplementary Figure 1. Heatmap of GO terms for differentially expressed genes. The terms were hierarchically clustered using the GO term enrichment beta. Darker red, higher positive
More informationAccessing and Using ENCODE Data Dr. Peggy J. Farnham
1 William M Keck Professor of Biochemistry Keck School of Medicine University of Southern California How many human genes are encoded in our 3x10 9 bp? C. elegans (worm) 959 cells and 1x10 8 bp 20,000
More informationSupplementary Figure S1. Gene expression analysis of epidermal marker genes and TP63.
Supplementary Figure Legends Supplementary Figure S1. Gene expression analysis of epidermal marker genes and TP63. A. Screenshot of the UCSC genome browser from normalized RNAPII and RNA-seq ChIP-seq data
More information7SK ChIRP-seq is specifically RNA dependent and conserved between mice and humans.
Supplementary Figure 1 7SK ChIRP-seq is specifically RNA dependent and conserved between mice and humans. Regions targeted by the Even and Odd ChIRP probes mapped to a secondary structure model 56 of the
More informationNature Genetics: doi: /ng Supplementary Figure 1
Supplementary Figure 1 Expression deviation of the genes mapped to gene-wise recurrent mutations in the TCGA breast cancer cohort (top) and the TCGA lung cancer cohort (bottom). For each gene (each pair
More informationNature Structural & Molecular Biology: doi: /nsmb.2419
Supplementary Figure 1 Mapped sequence reads and nucleosome occupancies. (a) Distribution of sequencing reads on the mouse reference genome for chromosome 14 as an example. The number of reads in a 1 Mb
More informationVariant Classification. Author: Mike Thiesen, Golden Helix, Inc.
Variant Classification Author: Mike Thiesen, Golden Helix, Inc. Overview Sequencing pipelines are able to identify rare variants not found in catalogs such as dbsnp. As a result, variants in these datasets
More informationMODULE 3: TRANSCRIPTION PART II
MODULE 3: TRANSCRIPTION PART II Lesson Plan: Title S. CATHERINE SILVER KEY, CHIYEDZA SMALL Transcription Part II: What happens to the initial (premrna) transcript made by RNA pol II? Objectives Explain
More informationCoding probability Ratio of sequences with ORF <= Size Size. Expression Levels by Class
Supplementary Figure S Characterization of unannotated transcripts. (A) CPAT coding probability scores and (B) cummulative distribution of ORF length are plotted for each category of protein-coding genes,
More informationMODULE 4: SPLICING. Removal of introns from messenger RNA by splicing
Last update: 05/10/2017 MODULE 4: SPLICING Lesson Plan: Title MEG LAAKSO Removal of introns from messenger RNA by splicing Objectives Identify splice donor and acceptor sites that are best supported by
More informationComparison of open chromatin regions between dentate granule cells and other tissues and neural cell types.
Supplementary Figure 1 Comparison of open chromatin regions between dentate granule cells and other tissues and neural cell types. (a) Pearson correlation heatmap among open chromatin profiles of different
More informationUser Guide. Association analysis. Input
User Guide TFEA.ChIP is a tool to estimate transcription factor enrichment in a set of differentially expressed genes using data from ChIP-Seq experiments performed in different tissues and conditions.
More informationSupplementary Figure 1. Efficiency of Mll4 deletion and its effect on T cell populations in the periphery. Nature Immunology: doi: /ni.
Supplementary Figure 1 Efficiency of Mll4 deletion and its effect on T cell populations in the periphery. Expression of Mll4 floxed alleles (16-19) in naive CD4 + T cells isolated from lymph nodes and
More informationRelationship between genomic features and distributions of RS1 and RS3 rearrangements in breast cancer genomes.
Supplementary Figure 1 Relationship between genomic features and distributions of RS1 and RS3 rearrangements in breast cancer genomes. (a,b) Values of coefficients associated with genomic features, separately
More informationNature Genetics: doi: /ng Supplementary Figure 1. Assessment of sample purity and quality.
Supplementary Figure 1 Assessment of sample purity and quality. (a) Hematoxylin and eosin staining of formaldehyde-fixed, paraffin-embedded sections from a human testis biopsy collected concurrently with
More informationData mining with Ensembl Biomart. Stéphanie Le Gras
Data mining with Ensembl Biomart Stéphanie Le Gras (slegras@igbmc.fr) Guidelines Genome data Genome browsers Getting access to genomic data: Ensembl/BioMart 2 Genome Sequencing Example: Human genome 2000:
More informationNature Structural & Molecular Biology: doi: /nsmb Supplementary Figure 1
Supplementary Figure 1 Frequency of alternative-cassette-exon engagement with the ribosome is consistent across data from multiple human cell types and from mouse stem cells. Box plots showing AS frequency
More informationDiscovery of Novel Human Gene Regulatory Modules from Gene Co-expression and
Discovery of Novel Human Gene Regulatory Modules from Gene Co-expression and Promoter Motif Analysis Shisong Ma 1,2*, Michael Snyder 3, and Savithramma P Dinesh-Kumar 2* 1 School of Life Sciences, University
More informationNature Methods: doi: /nmeth.3115
Supplementary Figure 1 Analysis of DNA methylation in a cancer cohort based on Infinium 450K data. RnBeads was used to rediscover a clinically distinct subgroup of glioblastoma patients characterized by
More informationSupplementary information for: Human micrornas co-silence in well-separated groups and have different essentialities
Supplementary information for: Human micrornas co-silence in well-separated groups and have different essentialities Gábor Boross,2, Katalin Orosz,2 and Illés J. Farkas 2, Department of Biological Physics,
More informationHistone Modifications Are Associated with Transcript Isoform Diversity in Normal and Cancer Cells
Histone Modifications Are Associated with Transcript Isoform Diversity in Normal and Cancer Cells Ondrej Podlaha 1, Subhajyoti De 2,3,4, Mithat Gonen 5, Franziska Michor 1 * 1 Department of Biostatistics
More informationBroad H3K4me3 is associated with increased transcription elongation and enhancer activity at tumor suppressor genes
Broad H3K4me3 is associated with increased transcription elongation and enhancer activity at tumor suppressor genes Kaifu Chen 1,2,3,4,5,10, Zhong Chen 6,10, Dayong Wu 6, Lili Zhang 7, Xueqiu Lin 1,2,8,
More informationExercises: Differential Methylation
Exercises: Differential Methylation Version 2018-04 Exercises: Differential Methylation 2 Licence This manual is 2014-18, Simon Andrews. This manual is distributed under the creative commons Attribution-Non-Commercial-Share
More informationComputational aspects of ChIP-seq. John Marioni Research Group Leader European Bioinformatics Institute European Molecular Biology Laboratory
Computational aspects of ChIP-seq John Marioni Research Group Leader European Bioinformatics Institute European Molecular Biology Laboratory ChIP-seq Using highthroughput sequencing to investigate DNA
More informationLecture 8 Understanding Transcription RNA-seq analysis. Foundations of Computational Systems Biology David K. Gifford
Lecture 8 Understanding Transcription RNA-seq analysis Foundations of Computational Systems Biology David K. Gifford 1 Lecture 8 RNA-seq Analysis RNA-seq principles How can we characterize mrna isoform
More informationHands-On Ten The BRCA1 Gene and Protein
Hands-On Ten The BRCA1 Gene and Protein Objective: To review transcription, translation, reading frames, mutations, and reading files from GenBank, and to review some of the bioinformatics tools, such
More informationSupplemental Figure S1. Expression of Cirbp mrna in mouse tissues and NIH3T3 cells.
SUPPLEMENTAL FIGURE AND TABLE LEGENDS Supplemental Figure S1. Expression of Cirbp mrna in mouse tissues and NIH3T3 cells. A) Cirbp mrna expression levels in various mouse tissues collected around the clock
More informationComputational Analysis of UHT Sequences Histone modifications, CAGE, RNA-Seq
Computational Analysis of UHT Sequences Histone modifications, CAGE, RNA-Seq Philipp Bucher Wednesday January 21, 2009 SIB graduate school course EPFL, Lausanne ChIP-seq against histone variants: Biological
More informationRASA: Robust Alternative Splicing Analysis for Human Transcriptome Arrays
Supplementary Materials RASA: Robust Alternative Splicing Analysis for Human Transcriptome Arrays Junhee Seok 1*, Weihong Xu 2, Ronald W. Davis 2, Wenzhong Xiao 2,3* 1 School of Electrical Engineering,
More informationTutorial: RNA-Seq Analysis Part II: Non-Specific Matches and Expression Measures
: RNA-Seq Analysis Part II: Non-Specific Matches and Expression Measures March 15, 2013 CLC bio Finlandsgade 10-12 8200 Aarhus N Denmark Telephone: +45 70 22 55 09 Fax: +45 70 22 55 19 www.clcbio.com support@clcbio.com
More informationNature Structural & Molecular Biology: doi: /nsmb Supplementary Figure 1
Supplementary Figure 1 U1 inhibition causes a shift of RNA-seq reads from exons to introns. (a) Evidence for the high purity of 4-shU-labeled RNAs used for RNA-seq. HeLa cells transfected with control
More informationSUPPLEMENTAL INFORMATION
SUPPLEMENTAL INFORMATION GO term analysis of differentially methylated SUMIs. GO term analysis of the 458 SUMIs with the largest differential methylation between human and chimp shows that they are more
More informationCancer Informatics Lecture
Cancer Informatics Lecture Mayo-UIUC Computational Genomics Course June 22, 2018 Krishna Rani Kalari Ph.D. Associate Professor 2017 MFMER 3702274-1 Outline The Cancer Genome Atlas (TCGA) Genomic Data Commons
More informationHigh Throughput Sequence (HTS) data analysis. Lei Zhou
High Throughput Sequence (HTS) data analysis Lei Zhou (leizhou@ufl.edu) High Throughput Sequence (HTS) data analysis 1. Representation of HTS data. 2. Visualization of HTS data. 3. Discovering genomic
More informationThe Epigenome Tools 2: ChIP-Seq and Data Analysis
The Epigenome Tools 2: ChIP-Seq and Data Analysis Chongzhi Zang zang@virginia.edu http://zanglab.com PHS5705: Public Health Genomics March 20, 2017 1 Outline Epigenome: basics review ChIP-seq overview
More informationMIR retrotransposon sequences provide insulators to the human genome
Supplementary Information: MIR retrotransposon sequences provide insulators to the human genome Jianrong Wang, Cristina Vicente-García, Davide Seruggia, Eduardo Moltó, Ana Fernandez- Miñán, Ana Neto, Elbert
More informationChromHMM Tutorial. Jason Ernst Assistant Professor University of California, Los Angeles
ChromHMM Tutorial Jason Ernst Assistant Professor University of California, Los Angeles Talk Outline Chromatin states analysis and ChromHMM Accessing chromatin state annotations for ENCODE2 and Roadmap
More informationa) List of KMTs targeted in the shrna screen. The official symbol, KMT designation,
Supplementary Information Supplementary Figures Supplementary Figure 1. a) List of KMTs targeted in the shrna screen. The official symbol, KMT designation, gene ID and specifities are provided. Those highlighted
More informationSTAT1 regulates microrna transcription in interferon γ stimulated HeLa cells
CAMDA 2009 October 5, 2009 STAT1 regulates microrna transcription in interferon γ stimulated HeLa cells Guohua Wang 1, Yadong Wang 1, Denan Zhang 1, Mingxiang Teng 1,2, Lang Li 2, and Yunlong Liu 2 Harbin
More informationRNA-Seq Preparation Comparision Summary: Lexogen, Standard, NEB
RNA-Seq Preparation Comparision Summary: Lexogen, Standard, NEB CSF-NGS January 22, 214 Contents 1 Introduction 1 2 Experimental Details 1 3 Results And Discussion 1 3.1 ERCC spike ins............................................
More informationMutation Detection and CNV Analysis for Illumina Sequencing data from HaloPlex Target Enrichment Panels using NextGENe Software for Clinical Research
Mutation Detection and CNV Analysis for Illumina Sequencing data from HaloPlex Target Enrichment Panels using NextGENe Software for Clinical Research Application Note Authors John McGuigan, Megan Manion,
More informationComputational Identification and Prediction of Tissue-Specific Alternative Splicing in H. Sapiens. Eric Van Nostrand CS229 Final Project
Computational Identification and Prediction of Tissue-Specific Alternative Splicing in H. Sapiens. Eric Van Nostrand CS229 Final Project Introduction RNA splicing is a critical step in eukaryotic gene
More informationNature Genetics: doi: /ng Supplementary Figure 1. SEER data for male and female cancer incidence from
Supplementary Figure 1 SEER data for male and female cancer incidence from 1975 2013. (a,b) Incidence rates of oral cavity and pharynx cancer (a) and leukemia (b) are plotted, grouped by males (blue),
More informationChIP-seq data analysis
ChIP-seq data analysis Harri Lähdesmäki Department of Computer Science Aalto University November 24, 2017 Contents Background ChIP-seq protocol ChIP-seq data analysis Transcriptional regulation Transcriptional
More informationSupplementary Information. Supplementary Figures
Supplementary Information Supplementary Figures.8 57 essential gene density 2 1.5 LTR insert frequency diversity DEL.5 DUP.5 INV.5 TRA 1 2 3 4 5 1 2 3 4 1 2 Supplementary Figure 1. Locations and minor
More informationAssignment 5: Integrative epigenomics analysis
Assignment 5: Integrative epigenomics analysis Due date: Friday, 2/24 10am. Note: no late assignments will be accepted. Introduction CpG islands (CGIs) are important regulatory regions in the genome. What
More informationSupplemental Data. Integrating omics and alternative splicing i reveals insights i into grape response to high temperature
Supplemental Data Integrating omics and alternative splicing i reveals insights i into grape response to high temperature Jianfu Jiang 1, Xinna Liu 1, Guotian Liu, Chonghuih Liu*, Shaohuah Li*, and Lijun
More informationgenomics for systems biology / ISB2020 RNA sequencing (RNA-seq)
RNA sequencing (RNA-seq) Module Outline MO 13-Mar-2017 RNA sequencing: Introduction 1 WE 15-Mar-2017 RNA sequencing: Introduction 2 MO 20-Mar-2017 Paper: PMID 25954002: Human genomics. The human transcriptome
More informationNature Immunology: doi: /ni Supplementary Figure 1. Characteristics of SEs in T reg and T conv cells.
Supplementary Figure 1 Characteristics of SEs in T reg and T conv cells. (a) Patterns of indicated transcription factor-binding at SEs and surrounding regions in T reg and T conv cells. Average normalized
More informationEPIGENOMICS PROFILING SERVICES
EPIGENOMICS PROFILING SERVICES Chromatin analysis DNA methylation analysis RNA-seq analysis Diagenode helps you uncover the mysteries of epigenetics PAGE 3 Integrative epigenomics analysis DNA methylation
More informationLong non-coding RNAs
Long non-coding RNAs Dominic Rose Bioinformatics Group, University of Freiburg Bled, Feb. 2011 Outline De novo prediction of long non-coding RNAs (lncrnas) Genome-wide RNA gene-finding Intrinsic properties
More informationNature Structural & Molecular Biology: doi: /nsmb Supplementary Figure 1
Supplementary Figure 1 Effect of HSP90 inhibition on expression of endogenous retroviruses. (a) Inducible shrna-mediated Hsp90 silencing in mouse ESCs. Immunoblots of total cell extract expressing the
More informationTranscript reconstruction
Transcript reconstruction Summary I Data types, file formats and utilities Annotation: Genomic regions Genes Peaks bedtools Alignment: Map reads BAM/SAM Samtools Aggregation: Summary files Wig (UCSC) TDF
More informationEXPression ANalyzer and DisplayER
EXPression ANalyzer and DisplayER Tom Hait Aviv Steiner Igor Ulitsky Chaim Linhart Amos Tanay Seagull Shavit Rani Elkon Adi Maron-Katz Dorit Sagir Eyal David Roded Sharan Israel Steinfeld Yossi Shiloh
More informationPeak-calling for ChIP-seq and ATAC-seq
Peak-calling for ChIP-seq and ATAC-seq Shamith Samarajiwa CRUK Autumn School in Bioinformatics 2017 University of Cambridge Overview Peak-calling: identify enriched (signal) regions in ChIP-seq or ATAC-seq
More informationAnnotation of Drosophila mojavensis fosmid 8 Priya Srikanth Bio 434W
Annotation of Drosophila mojavensis fosmid 8 Priya Srikanth Bio 434W 5.1.2007 Overview High-quality finished sequence is much more useful for research once it is annotated. Annotation is a fundamental
More informationUnsupervised Identification of Isotope-Labeled Peptides
Unsupervised Identification of Isotope-Labeled Peptides Joshua E Goldford 13 and Igor GL Libourel 124 1 Biotechnology institute, University of Minnesota, Saint Paul, MN 55108 2 Department of Plant Biology,
More informationOncoPPi Portal A Cancer Protein Interaction Network to Inform Therapeutic Strategies
OncoPPi Portal A Cancer Protein Interaction Network to Inform Therapeutic Strategies 2017 Contents Datasets... 2 Protein-protein interaction dataset... 2 Set of known PPIs... 3 Domain-domain interactions...
More informationSupplemental Figure 1. Small RNA size distribution from different soybean tissues.
Supplemental Figure 1. Small RNA size distribution from different soybean tissues. The size of small RNAs was plotted versus frequency (percentage) among total sequences (A, C, E and G) or distinct sequences
More informationRNA-seq Introduction
RNA-seq Introduction DNA is the same in all cells but which RNAs that is present is different in all cells There is a wide variety of different functional RNAs Which RNAs (and sometimes then translated
More informationSUPPLEMENTARY INFORMATION
doi:10.1038/nature10866 a b 1 2 3 4 5 6 7 Match No Match 1 2 3 4 5 6 7 Turcan et al. Supplementary Fig.1 Concepts mapping H3K27 targets in EF CBX8 targets in EF H3K27 targets in ES SUZ12 targets in ES
More informationIntroduction. Introduction
Introduction We are leveraging genome sequencing data from The Cancer Genome Atlas (TCGA) to more accurately define mutated and stable genes and dysregulated metabolic pathways in solid tumors. These efforts
More informationEvolution of the unspliced transcriptome
Engelhardt and Stadler BMC Evolutionary Biology (2015) 15:166 DOI 10.1186/s12862-015-0437-7 RESEARCH ARTICLE Open Access Evolution of the unspliced transcriptome Jan Engelhardt 1* and Peter F. Stadler
More informationNature Neuroscience: doi: /nn Supplementary Figure 1
Supplementary Figure 1 Illustration of the working of network-based SVM to confidently predict a new (and now confirmed) ASD gene. Gene CTNND2 s brain network neighborhood that enabled its prediction by
More informationfl/+ KRas;Atg5 fl/+ KRas;Atg5 fl/fl KRas;Atg5 fl/fl KRas;Atg5 Supplementary Figure 1. Gene set enrichment analyses. (a) (b)
KRas;At KRas;At KRas;At KRas;At a b Supplementary Figure 1. Gene set enrichment analyses. (a) GO gene sets (MSigDB v3. c5) enriched in KRas;Atg5 fl/+ as compared to KRas;Atg5 fl/fl tumors using gene set
More informationChromatin marks identify critical cell-types for fine-mapping complex trait variants
Chromatin marks identify critical cell-types for fine-mapping complex trait variants Gosia Trynka 1-4 *, Cynthia Sandor 1-4 *, Buhm Han 1-4, Han Xu 5, Barbara E Stranger 1,4#, X Shirley Liu 5, and Soumya
More informationSupplemental Figure S1. Tertiles of FKBP5 promoter methylation and internal regulatory region
Supplemental Figure S1. Tertiles of FKBP5 promoter methylation and internal regulatory region methylation in relation to PSS and fetal coupling. A, PSS values for participants whose placentas showed low,
More informationEpigenetics. Jenny van Dongen Vrije Universiteit (VU) Amsterdam Boulder, Friday march 10, 2017
Epigenetics Jenny van Dongen Vrije Universiteit (VU) Amsterdam j.van.dongen@vu.nl Boulder, Friday march 10, 2017 Epigenetics Epigenetics= The study of molecular mechanisms that influence the activity of
More informationThe 16th KJC Bioinformatics Symposium Integrative analysis identifies potential DNA methylation biomarkers for pan-cancer diagnosis and prognosis
The 16th KJC Bioinformatics Symposium Integrative analysis identifies potential DNA methylation biomarkers for pan-cancer diagnosis and prognosis Tieliu Shi tlshi@bio.ecnu.edu.cn The Center for bioinformatics
More informationAnalysis of Massively Parallel Sequencing Data Application of Illumina Sequencing to the Genetics of Human Cancers
Analysis of Massively Parallel Sequencing Data Application of Illumina Sequencing to the Genetics of Human Cancers Gordon Blackshields Senior Bioinformatician Source BioScience 1 To Cancer Genetics Studies
More informationAnnotation of Chimp Chunk 2-10 Jerome M Molleston 5/4/2009
Annotation of Chimp Chunk 2-10 Jerome M Molleston 5/4/2009 1 Abstract A stretch of chimpanzee DNA was annotated using tools including BLAST, BLAT, and Genscan. Analysis of Genscan predicted genes revealed
More informationBioinformatics Laboratory Exercise
Bioinformatics Laboratory Exercise Biology is in the midst of the genomics revolution, the application of robotic technology to generate huge amounts of molecular biology data. Genomics has led to an explosion
More informationNature Immunology: doi: /ni Supplementary Figure 1. Transcriptional program of the TE and MP CD8 + T cell subsets.
Supplementary Figure 1 Transcriptional program of the TE and MP CD8 + T cell subsets. (a) Comparison of gene expression of TE and MP CD8 + T cell subsets by microarray. Genes that are 1.5-fold upregulated
More informationPatterns of Histone Methylation and Chromatin Organization in Grapevine Leaf. Rachel Schwope EPIGEN May 24-27, 2016
Patterns of Histone Methylation and Chromatin Organization in Grapevine Leaf Rachel Schwope EPIGEN May 24-27, 2016 What does H3K4 methylation do? Plant of interest: Vitis vinifera Culturally important
More informationSection D. Identification of serotype-specific amino acid positions in DENV NS1. Objective
Section D. Identification of serotype-specific amino acid positions in DENV NS1 Objective Upon completion of this exercise, you will be able to use the Virus Pathogen Resource (ViPR; http://www.viprbrc.org/)
More informationGenetics and Genomics in Medicine Chapter 6 Questions
Genetics and Genomics in Medicine Chapter 6 Questions Multiple Choice Questions Question 6.1 With respect to the interconversion between open and condensed chromatin shown below: Which of the directions
More informationChIP-seq analysis. J. van Helden, M. Defrance, C. Herrmann, D. Puthier, N. Servant, M. Thomas-Chollier, O.Sand
ChIP-seq analysis J. van Helden, M. Defrance, C. Herrmann, D. Puthier, N. Servant, M. Thomas-Chollier, O.Sand Tuesday : quick introduction to ChIP-seq and peak-calling (Presentation + Practical session)
More informationNature Neuroscience: doi: /nn Supplementary Figure 1. Behavioral training.
Supplementary Figure 1 Behavioral training. a, Mazes used for behavioral training. Asterisks indicate reward location. Only some example mazes are shown (for example, right choice and not left choice maze
More informationExpanded View Figures
Solip Park & Ben Lehner Epistasis is cancer type specific Molecular Systems Biology Expanded View Figures A B G C D E F H Figure EV1. Epistatic interactions detected in a pan-cancer analysis and saturation
More informationSupplement to SCnorm: robust normalization of single-cell RNA-seq data
Supplement to SCnorm: robust normalization of single-cell RNA-seq data Supplementary Note 1: SCnorm does not require spike-ins, since we find that the performance of spike-ins in scrna-seq is often compromised,
More informationSupplemental Figure 1. Genes showing ectopic H3K9 dimethylation in this study are DNA hypermethylated in Lister et al. study.
mc mc mc mc SUP mc mc Supplemental Figure. Genes showing ectopic HK9 dimethylation in this study are DNA hypermethylated in Lister et al. study. Representative views of genes that gain HK9m marks in their
More informationTo begin using the Nutrients feature, visibility of the Modules must be turned on by a MICROS Account Manager.
Nutrients A feature has been introduced that will manage Nutrient information for Items and Recipes in myinventory. This feature will benefit Organizations that are required to disclose Nutritional information
More informationGene-microRNA network module analysis for ovarian cancer
Gene-microRNA network module analysis for ovarian cancer Shuqin Zhang School of Mathematical Sciences Fudan University Oct. 4, 2016 Outline Introduction Materials and Methods Results Conclusions Introduction
More information38 Int'l Conf. Bioinformatics and Computational Biology BIOCOMP'16
38 Int'l Conf. Bioinformatics and Computational Biology BIOCOMP'16 PGAR: ASD Candidate Gene Prioritization System Using Expression Patterns Steven Cogill and Liangjiang Wang Department of Genetics and
More informationUser Guide. Protein Clpper. Statistical scoring of protease cleavage sites. 1. Introduction Protein Clpper Analysis Procedure...
User Guide Protein Clpper Statistical scoring of protease cleavage sites Content 1. Introduction... 2 2. Protein Clpper Analysis Procedure... 3 3. Input and Output Files... 9 4. Contact Information...
More informationMATS: a Bayesian framework for flexible detection of differential alternative splicing from RNA-Seq data
Nucleic Acids Research Advance Access published February 1, 2012 Nucleic Acids Research, 2012, 1 13 doi:10.1093/nar/gkr1291 MATS: a Bayesian framework for flexible detection of differential alternative
More informationA Quick-Start Guide for rseqdiff
A Quick-Start Guide for rseqdiff Yang Shi (email: shyboy@umich.edu) and Hui Jiang (email: jianghui@umich.edu) 09/05/2013 Introduction rseqdiff is an R package that can detect differential gene and isoform
More informationTutorial. ChIP Sequencing. Sample to Insight. September 15, 2016
ChIP Sequencing September 15, 2016 Sample to Insight CLC bio, a QIAGEN Company Silkeborgvej 2 Prismet 8000 Aarhus C Denmark Telephone: +45 70 22 32 44 www.clcbio.com support-clcbio@qiagen.com ChIP Sequencing
More informationHeintzman, ND, Stuart, RK, Hon, G, Fu, Y, Ching, CW, Hawkins, RD, Barrera, LO, Van Calcar, S, Qu, C, Ching, KA, Wang, W, Weng, Z, Green, RD,
Heintzman, ND, Stuart, RK, Hon, G, Fu, Y, Ching, CW, Hawkins, RD, Barrera, LO, Van Calcar, S, Qu, C, Ching, KA, Wang, W, Weng, Z, Green, RD, Crawford, GE, Ren, B (2007) Distinct and predictive chromatin
More informationTitrations in Cytobank
The Premier Platform for Single Cell Analysis (1) Titrations in Cytobank To analyze data from a titration in Cytobank, the first step is to upload your own FCS files or clone an experiment you already
More informationSupplementary Figure 1: Attenuation of association signals after conditioning for the lead SNP. a) attenuation of association signal at the 9p22.
Supplementary Figure 1: Attenuation of association signals after conditioning for the lead SNP. a) attenuation of association signal at the 9p22.32 PCOS locus after conditioning for the lead SNP rs10993397;
More informationProbability-Based Protein Identification for Post-Translational Modifications and Amino Acid Variants Using Peptide Mass Fingerprint Data
Probability-Based Protein Identification for Post-Translational Modifications and Amino Acid Variants Using Peptide Mass Fingerprint Data Tong WW, McComb ME, Perlman DH, Huang H, O Connor PB, Costello
More informationcis-regulatory enrichment analysis in human, mouse and fly
cis-regulatory enrichment analysis in human, mouse and fly Zeynep Kalender Atak, PhD Laboratory of Computational Biology VIB-KU Leuven Center for Brain & Disease Research Laboratory of Computational Biology
More informationUnit 1 Exploring and Understanding Data
Unit 1 Exploring and Understanding Data Area Principle Bar Chart Boxplot Conditional Distribution Dotplot Empirical Rule Five Number Summary Frequency Distribution Frequency Polygon Histogram Interquartile
More informationSingle-strand DNA library preparation improves sequencing of formalin-fixed and paraffin-embedded (FFPE) cancer DNA
www.impactjournals.com/oncotarget/ Oncotarget, Supplementary Materials 2016 Single-strand DNA library preparation improves sequencing of formalin-fixed and paraffin-embedded (FFPE) DNA Supplementary Materials
More informationGuide to Use of SimulConsult s Phenome Software
Guide to Use of SimulConsult s Phenome Software Page 1 of 52 Table of contents Welcome!... 4 Introduction to a few SimulConsult conventions... 5 Colors and their meaning... 5 Contextual links... 5 Contextual
More informationCircular RNAs (circrnas) act a stable mirna sponges
Circular RNAs (circrnas) act a stable mirna sponges cernas compete for mirnas Ancestal mrna (+3 UTR) Pseudogene RNA (+3 UTR homolgy region) The model holds true for all RNAs that share a mirna binding
More informationAnalyse de données de séquençage haut débit
Analyse de données de séquençage haut débit Vincent Lacroix Laboratoire de Biométrie et Biologie Évolutive INRIA ERABLE 9ème journée ITS 21 & 22 novembre 2017 Lyon https://its.aviesan.fr Sequencing is
More informationSupplementary Figure 1: Features of IGLL5 Mutations in CLL: a) Representative IGV screenshot of first
Supplementary Figure 1: Features of IGLL5 Mutations in CLL: a) Representative IGV screenshot of first intron IGLL5 mutation depicting biallelic mutations. Red arrows highlight the presence of out of phase
More informationProtein Reports CPTAC Common Data Analysis Pipeline (CDAP)
Protein Reports CPTAC Common Data Analysis Pipeline (CDAP) v. 05/03/2016 Summary The purpose of this document is to describe the protein reports generated as part of the CPTAC Common Data Analysis Pipeline
More information