Reconstruction of enhancer-target networks in 935 samples of human primary cells, tissues and cell lines

Size: px
Start display at page:

Download "Reconstruction of enhancer-target networks in 935 samples of human primary cells, tissues and cell lines"

Transcription

1 Reconstruction of enhancer-target networks in 935 samples of human primary cells, tissues and cell lines Qin Cao, Christine Anyansi,2, Xihao Hu, Liangliang Xu 3, Lei Xiong 4, Wenshu Tang 3, Myth T.S. Mok 3, Chao Cheng 5, Xiaodan Fan 6, Mark Gerstein 7,8,9, Alfred S.L. Cheng 3 and Kevin Y. Yip,,,2 Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong 2 Department of Computer Science, Vrije Universiteit, Amsterdam, Netherlands 3 School of Biomedical Sciences, 4 Department of Anatomical and Cellular Pathology, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong 5 Department of Biomedical Data Sciences, Dartmouth College, Hanover, New Hampshire, USA 6 Department of Statistics, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong 7 Program in Computational Biology and Bioinformatics, 8 Department of Molecular Biophysics and Biochemistry, 9 Department of Computer Science, Yale Unviersity, New Haven, Connecticut, USA Hong Kong Bioinformatics Centre, CUHK-BGI Innovation Institute of Trans-omics, 2 Hong Kong Institute of Diabetes and Obesity, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong To whom correspondence should be addressed. kevinyip@cse.cuhk.edu.hk

2 Supplementary Note Supplementary results Enhancer features can quantitatively infer target gene expression To formally test if the quantitative relationship is strong enough for enhancer target identification, we collected predicted enhancers, epigenomic and transcriptomic data from two human cell lines (Supplementary Table ). Aggregation plots around active enhancers (Figure a) display much stronger patterns of typical enhancer features [] than inactive enhancers (Figure b), suggesting the reliability of these predicted enhancers. We defined active and inactive enhancer-target pairs based on ChIA-PET (Figure c), and found that H3K4me, H3K4me2, H3K27ac, enhancer RNA (erna) and DNase I hypersensitivity levels at the enhancers all correlate positively with target gene expression, while the repressive mark H3K27me3, and to some extent DNA methylation, correlate negatively (Figure d). We constructed Random Forest models [2] that take enhancer features as input to infer the expression class (highest, medium-high, medium-low and lowest) of the target gene, as in previous studies that used promoter or gene body features as input [3, 4, 5]. In cross-validation, the models distinguished genes in different expression classes with an average one-against-all area under the receiver operator characteristic (AUC) of 4, which is significantly higher than random predictions (p<e-6, Mann-Whitney U test, Supplementary Figure a). Even when only one type of enhancer features was used as input, the resulting models were still highly accurate (Supplementary Figure a). The enhancer models were in general not as accurate as models involving promoter or gene body features (Supplementary Figure b), but enhancer features could improve the accuracy of models involving single promoter/gene body features in most cases (Supplementary Figure c). We repeated these analyses i) in MCF-7, ii) with other sets of predicted active enhancers in these two cell lines, and iii) with regression of log expression levels instead of predicting expression classes. All the results (Supplementary Figures 2-5) confirm that enhancer features can quantitatively infer target gene expression. Statistics of the enhancer-target networks in the FANTOM5 samples Overall, 22,232 TSSs were inferred to have a regulating enhancer and 32,86 enhancers were inferred to have a target TSS in at least one of the FANTOM5 samples, forming 77,532 enhancer-target pairs in total. On average, each TSS is regulated by 8. enhancers and each enhancer regulates 5.4 TSSs. In each specific sample, the average numbers of regulated TSSs, regulating enhancers and active enhancertarget pairs are,74, 72 and 2,392, respectively (Supplementary Figure a). The average number of regulating enhancers per TSS in each sample ranges from to 2.3 (Supplementary Figure b), and the average number of target TSSs per enhancer in each sample ranges from 2.3 to 5.5 (Supplementary Figure c). Previous studies have suggested that the number of regulated TSSs per enhancer ranges between and 3 in each sample [6, 7], which largely overlaps our range. Our larger range of values is likely due to the much larger number of samples considered. These studies also suggested regulating enhancers per TSS, which also overlap our values for FANTOM5 (-2.3). Most enhancers regulate in no more than 5 samples, with a median of 5 samples (Supplementary Figure d). However, there is a very long tail, with some enhancers regulating in a lot more samples than the average. The median distance between an enhancer and a target TSS it regulates in a sample ranges from 33kbp to 77kbp (Supplementary Figure e), which is similar to the 58kbp-23kbp predicted in [6]. In fact, for of the 2 samples in [6], the range was 58kbp to 89kbp, which is highly consistent with our range. On the other hand, our range is smaller than a previously reported value of 24kbp based on Hi-C data [7], possibly due to some enhancer-tss pairs considered connected by the Hi-C data but are not functionally active. In each sample, on average 72% of the active enhancers regulate the closest TSS in addition to any other TSSs (Supplementary Figure f). Interestingly, among all the enhancer-target pairs in each sample, S

3 only 3%-36% of them involve the closest TSS of an enhancer (Supplementary Figure f), indicating that enhancers usually regulate several TSSs farther away in addition to the TSSs closest to them. Supplementary methods Collection and processing of data sets for modeling gene expression with each enhancer-tss pair considered independently We collected computationally predicted active enhancers in two human cell lines from multiple sources (Supplementary Table ). To facilitate model construction, we resized each enhancer to 2,5bp by trimming or extending it while maintaining the original center of the enhancer. The purpose of using this relatively long span was to capture spatial signal patterns of features in the flanking regions of each enhancer, which have been shown to be important for indicating enhancer activities [8, 9]. We have also tried aligning enhancers based on their peak locations of DNase I hypersensitivity, and got similar results (data not shown). For a given data source, we defined active and inactive enhancers of a cell line as follows. The active enhancers were taken directly from the ones predicted active in this cell line. The inactive ones were the predicted active enhancers in other cell lines that did not overlap any active enhancers in this cell line. The enhancers were paired with target genes using ChIA-PET chromosome conformation data. If an active enhancer was connected to the TSS of a RefSeq (GRCh37) protein-coding gene based on the ChIA-PET data of the target cell line, the enhancer-tss pair was defined as an active pair in this cell line. If ) an inactive enhancer of this cell line was connected to the TSS of a RefSeq protein-coding gene based on the ChIA-PET data of another cell line, 2) it was not involved in any connections based on the chromosome conformation data in the current cell line, and 3) the TSS was not involved in any active pairs in the current cell line, the enhancer-tss pair was defined as an inactive pair in this cell line. Therefore in this procedure, no expression or activity features of the genes and enhancers were involved in defining the active and inactive pairs. For this part of study that considers each enhancer-target association independently, if a gene was involved in multiple enhancer-target pairs, we randomly retained only one of them and randomly selected an annotated transcript for the TSS. Finally, we randomly removed pairs from the active or inactive pairs until the two sets had the same size. Each (active or inactive) resized enhancer was then sub-divided into five 5bp bins in ascending order of their genomic locations, and the average signals of H3K4me, H3K4me2, H3K27ac, H3K27me3, DNase I hypersensitivity, bi-directional transcription and DNA methylation were computed for each bin. Bi-directional transcript levels defined by CAGE tags were obtained from FANTOM5. The raw levels x were transformed by the function f(x) = log 2 (x+). The other features, defined by ChIP-seq, DNase-seq and reduced representation bisulfite sequencing (RRBS), were obtained from ENCODE. H3K4me and H3K4me2 data were available only for K562 but not MCF-7. Similarly, each promoter was composed of five 5bp bins that spanned from 2,bp upstream to 5bp downstream of a TSS, with Bin being upstream,5bp to 2,bp, Bin 2 being upstream,bp to,5bp, and so on. For each bin, the average values of the following features were computed: H3K4me3, H3K9ac, H3K27me3, DNase I hypersensitivity and DNA methylation. H3K9ac data were available only for K562 but not MCF-7. For gene bodies, instead of defining bins of a fixed size, they were sub-divided into regions according to the gene structure as previously proposed [], namely first exon, internal exons, last exon, first intron, internal introns, and last intron. The features considered were H3K27me3, H3K36me3, DNase I hypersensitivity and DNA methylation. In order to compute feature values in these regions, for models that were compared with the ones involving gene body features, only genes with at least three introns were considered before balancing the number of active and inactive pairs. The expression level of each gene was defined as the number of reads mapped to the kbp region centered on the TSS, based on ENCODE poly-a enriched whole-cell RNA-seq data. Replicates were combined by taking their average. In the regression analyses, raw expression levels were log-transformed by the same function applied to erna levels. S2

4 Machine learning procedure for modeling gene expression with each enhancer- TSS pair considered independently For the classification task, we divided all the included TSS into four equal-sized classes based on their expression levels, namely highest, medium-high, medium-low and lowest. We randomly selected onefifth of the pairs as a left-out testing set, and used the remaining four-fifth for training a Random Forest model for each of the four expression classes using the one-against-all setting. The area under the receiver operator characteristic (AUC) was computed for each class, and the average AUC of the four classes was computed based on the testing set results in a five-fold cross-validation procedure by using each one-fifth of pairs as the left-out testing set in turn. The whole procedure was further repeated five times to produce the average and standard deviation of the average AUC values. To evaluate the best-case accuracy that a statistical model can achieve based on the data, we used the expression levels of the TSSs in one replicate as the predictor of their expression class in the other replicate. We took each of the two replicates as the predictor in turn and computed their average AUC. For the regression task, a similar procedure was used. In this case, the target was the log-transformed expression values instead of the discrete expression classes. We constructed linear regression models implemented in Weka []. The accuracy of a model was quantified by the percentage error. Suppose a gene in the testing set has an actual expression level of y i and a predicted expression level of ỹ i, the percentage error is defined as y i ỹ i /y i. The median of these error values among all genes was recorded. An overall percentage error was obtained by combining the results of the five testing sets in a five-fold cross-validation procedure. The procedure was further repeated ten times to obtain the average and standard deviation of these percentage error values. Machine learning procedures for studying the joint effect of multiple enhancers on a target TSS To model the joint effect of multiple enhancers on a TSS, each time we took a target TSS and all enhancers within Mbp from it from the union of all processed enhancers from the different samples. The purpose of having this maximum distance threshold was to keep the amount of computations manageable and to reduce false positive predictions since only a small fraction of DNA contacts involve DNA regions beyond Mbp apart [7]. This threshold was also used in some previous studies [6, 2]. We considered only TSSs with non-zero expression in at least one sample. We used the histone modification data as enhancer features for the four ENCODE+Roadmap data sets, and the CAGE data as the only enhancer feature for the FANTOM5 data set. To avoid the training and testing sets from having very similar samples, which could lead to artificially high prediction accuracies by simple memorization, we used a left-one-categoryout (in the case of the ENCODE+Roadmap) / left-one-facet-out (in the case of FANTOM5) procedure to evaluate the accuracy of expression models. Specifically, we constructed an expression model for this TSS using enhancer features and log expression values of all but one sample category/facet, and then applied the model to predict the log expression value of the TSS in the samples of the left-out category/facet. We considered three different types of model, namely ) linear regression involving one enhancer within Mbp of the TSS at a time (denoted as independent ), 2) multiple linear regression involving all enhancers within Mbp of the TSS with LSO regularization [3], and 3) multiple linear regression involving all enhancers within Mbp of the TSS with Elastic Net regularization [4]. In the independent case, we compared the performance of the different enhancers and reported the one with the highest accuracy. We used the R package glmnet for LSO and Elastic Net using their default parameter values. Since the LSO and Elastic Nets results were almost identical, we only include results from LSO as the method that considers the joint effect of multiple enhancers to avoid redundancy. We have also tried some non-linear models such as Random Forest but found that the results were not better than the regularized linear regression models. The accuracy of a model was quantified by the percentage error in the testing set. The median of the percentage error values among all TSSs and all samples was computed. Ten random subsets were drawn from the testing results, to compute the average and standard deviation of the median percentage error values. S3

5 Checking the consistency between JEME s predictions and validation data We compared the predictions of JEME with the validation data using both cross-validation and acrosssample tests. Since the validation data are expected to miss some active enhancer-target interactions, assuming all pairs not supported by the validation data to be negative would create an unrealistic positiveto-negative ratio. We therefore sampled background pairs as negatives using four different methods, which collectively provide a better evaluation of the accuracy of JEME. The first set of background pairs was based on random targets, which paired enhancers in the positive pairs with other random TSSs within Mbp. The second set was based on random DNA contacts [6]. Specifically, given two genomic locations separated by a distance of sbp, the probability density for them to have a detected contact in a chromosome conformation capture experiment was f(s) = ks 3/2 e 4/s2, where k reflects the efficiency of cross-linking [5]. We drew a positive enhancer-target pair randomly, used the above formula to sample a random contact point of the enhancer, and identified the TSS closest to this contact point. If the resulting enhancer-tss pair was not supported by the validation data, we included it as a background pair. We then repeated this procedure until getting the desired positive-tonegative ratio as described below. The third set of background pairs was based on random enhancers, which paired TSSs in the positive pairs with other random enhancers within Mbp. The fourth set was based on random pairs of enhancers and TSSs within Mbp, where both the enhancer and the TSS in each pair were randomly chosen. In all four cases, we determined the number of negative pairs using the following method. For a given sample, let E, P and R be the number of active enhancers, the total number of enhancer-tss pairs within Mbp, and the average number of TSSs per active enhancer, respectively. E and P have fixed values for the sample, while R should take a value around -4 according to previous studies [6, 7]. The potential numbers of active and inactive enhancer-gene pairs in the sample are E R and P E R, respectively, E R P E R and thus the positive-to-negative ratio should be, and the number of negative enhancer-tss pairs should be the number of positive enhancer-tss pairs (based on ChIA-PET, Hi-C and eqtl connections) divided by this ratio. The actual ratios used are shown in Supplementary Tables 2-4. In the case of ENCODE+Roadmap, the ratio was achieved by sampling background pairs using one of the four methods. In the case of FANTOM5, due to a smaller set of enhancers defined, sometimes the ratio could only be achieved by considering all non-positive pairs to be negative, and no sampling was necessary. Taking the positive pairs and one of these four sets of background pairs as the negatives, we performed 5-fold cross-validation tests to compute the area under the receiver operator characteristics (AUROC) and area under the precision-recall (AUPR) curve of the testing sets in Figures 3b-e and Supplementary Figures 7-9. We further repeated the whole procedure times based on different random negative pairs and using different random ways to divide up the pairs into the five sets to produce the error bars. For FANTOM5 K562 and MCF7 samples, since all the background pairs had to be selected as negatives to achieve the positive-to-negative ratio, we produced the error bars by repeating the procedure using different random seeds when constructing Random Forest models. We also performed across-sample tests, in which the positive and negative pairs from K562 were used to train the Random Forest model in the second step of JEME, and the model was applied to another sample based on the first step global models and the sample-specific information from this other sample. The procedure was repeated times based on different random training sets in K562 and corresponding random testing sets in the other sample. The absolute AUROC values can reflect the accuracy of JEME, since a random predictor and a perfect predictor always have an AUROC value of and, respectively, regardless of the positive-to-negative ratio. On the other hand, AUPR values depend highly on this ratio. We therefore setup baseline models that help indicate how accurate the predictions of JEME were. In the first baseline model ( Distance only ), the genomic distance between an enhancer and a TSS is used as the only feature to train a Random Forest model for predicting their chance of interaction. In the second baseline model ( Nearest gene ), each enhancer was predicted to regulate only its nearest gene. We also compared JEME s AUROC and AUPR values to the expected values of random predictions ( Background ). The AUROC and AUPR values were computed by the R package PRROC. S4

6 Comparisons with other enhancer-target prediction methods We compared JEME with the following five other enhancer-target prediction methods. To ensure the fairness of the comparisons, for each method compared we used the same data for it and for JEME. Since GM2878 and K562 were commonly used by all five methods, we performed the comparisons using them. By default, we used the enhancers defined by ENCODE+Roadmap in these two cell lines. The positive pairs were those supported by the validation data, and the negative pairs were drawn by the random target method. In the comparisons with some methods, additional filtering criteria were applied, as detailed below. In the case of Ripple, the enhancer-tss pairs from their supplementary Web site were used directly. Ernst et al. (2) [6] developed a logistic regression model to identify enhancers with histone modification signals significantly correlated with expression level of a gene as potential enhancer-target pairs. We re-implemented this method based on the description in their paper. Based on the criterion specified in [6], we considered only enhancer-tss pairs at a genomic distance between 5kbp and 25kbp in the predictions. Since this method is unsupervised, it does not involve model learning. We used its full set of predictions in GM2878 as both its cross-validation and across-sample results. JEME was applied to the same data set. We downloaded the software of IM-PET [6] from and installed it on a Linux platform. We extracted the active enhancers of K652 and GM2878 from their supplementary file predictions.xlsx and overlapped them with our enhancers. Correspondingly, we kept only positive and negative enhancer-tss pairs involving these enhancers. Both IM-PET and JEME were then tested based on these pairs. Since the IM-PET software did not provide any function for performing cross-validation, we only compared JEME with it using the across-sample setting. We downloaded the PreSTIGE [7] predictions in GM2878 and K562 from edu/prestige/. We overlapped them with our positive and negative pairs. PreSTIGE provided two scores (Q-normed gene expression score and Q-normed H3K4me signal score) to indicate chance of enhancer-tss interaction. We used both scores to rank PreSTIGE s predictions, and report the one with better results. Since PreSTIGE is unsupervised, again we used its full set of predictions in GM2878 as both its cross-validation and across-sample results. JEME was applied to the same data set. We downloaded the training pairs of Ripple [2] in GM2878 and K562 from wisc.edu/ sroy/ripple/. Both JEME and Ripple were applied to this data set, using both crossvalidation and across-sample settings. We downloaded the software of TargetFinder [8] from and installed it on a Linux platform. We used our positive and negative pairs to perform both crossvalidation and across-sample tests with JEME and TargetFinder. To further ensure the fairness of the comparisons, we tuned all the parameters in the same way for the two methods. We considered four different sets of input data of TargetFinder. The first set contained only 4 types of input data, namely DNase I hypersensitivity, H3K4me, H3K27ac and H3K27me3, which were available for all the 27 ENCODE+Roadmap samples with imputed data. This set allowed us to compare JEME and TargetFinder using exactly the same inputs. In the second and third sets, we used the top 8 and 6 types of input data defined in [8]. In the fourth set, types of data were available in GM2878 for the cross-validation test. In the across-sample test, 7 types of data commonly available in GM2878 and K562 were used. We also found another recent study [9] that used a tensor model to predict enhancer-promoter, enhancer-enhancer and promoter-promoter interactions. However, it was not applied to GM2878 or K562 and we encountered technical difficulties when trying to compare JEME with it, and therefore this method was not included in our comparisons. Defining binary enhancer-target networks of the 935 samples In all the above evaluations, each enhancer-tss pair was given an interaction score, such that the different pairs could be ranked to compute AUPR and AUROC. On the other hand, for the other analyses it would S5

7 be more convenient to choose a cutoff score to form binary enhancer-target networks that contain only the pairs with a score higher than the cutoff. We selected the cutoff value that maximized the F-score of the training data, defined as the harmonic mean of precision and recall. The testing data were not involved in this parameter tuning, and thus the reported accuracy values of JEME were not biased by this parameter tuning procedure. The cutoff values selected were and 5 for FANTOM5 and ENCODE+Roadmap, respectively (Supplementary Figure 6). Analysis of chromosome domains and CTCF binding sites We collected TADs in IMR9 and CCDs in GM2878 from [2] and [2], respectively. Each TAD/CCD (referred to as a chromosome domain in general) was represented by a contiguous genomic region. Different chromosome domains do not overlap, and there are genomic regions not covered by any chromosome domains. To evaluate whether our inferred enhancer-target pairs were more completely contained within a chromosome domain than expected by chance, we first computed the number of pairs residing entirely within a chromosome domain. We then shuffled the chromosome domains by repeatedly swapping the location of two random domains. The genomic locations of all domains between them were also shifted accordingly based on the size difference of the two swapping domains. The number of enhancer-target pairs residing entirely within a shuffled chromosome domain was then recorded. These numbers from, lists of shuffled chromosome domains were used to fit a Gaussian distribution, and a p-value of the actual number from the original chromosome domains was computed based on its z-score. CTCF binding peaks were collected for 4 cell lines from ENCODE, namely GM2878, H-hESC, HeLa-S3, HepG2, MHEC, HSMM, HSMMt, HUVEC, K562, NHA, NHDFAd, NHEK, NHLF and Osteobl. An enhancer-tss pair was defined to have an intervening CTCF binding peak if the binding peak was anywhere between the enhancer and the TSS. We sampled random enhancer-tss pairs matching both the total number and the distance distribution of the inferred enhancer-target pairs, and computed the number of pairs with intervening CTCF binding. These numbers from, sets of random pairs were used to fit a Gaussian distribution, and the p-value of the actual number of pairs with intervening CTCF binding was computed based on its z-score in the distribution. A CTCF binding peak was considered to define a CCD boundary if it was within a certain distance from the boundary. We considered multiple distance values, including kbp, 2kbp, 3kbp, 4kbp and 5kbp. For each distance value, we then defined a 2 2 contingency table of whether a CTCF binding peak defined a CCD boundary, and whether it intervened an enhancer-target pair. A p-value was then computed using a one-sided Fisher s exact test, where a significant p-value means that the intervening CTCF binding peaks were significantly depleted rather than enriched in defining the CCD boundaries. Finally, among the 5 p-values obtained from the 5 distance values, the least significant one was reported. Analysis of the beta-globin locus control region For this example of the HBB locus, we used enhancers from FANTOM5, because FANTOM5 enhancers overlapped with the LCR HS sites more accurately than the ENCODE+Roadmap enhancers. There are two enhancers predicted to be active in LCR region of beta-globin in the four K562 samples of FANTOM5, namely chr: and chr: Since the 3C data from Dostie et al. (26) [22] is for HS5, we explored the targets of the enhancer chr: , which is the closest enhancer to HS5. Clustering of samples based on enhancer networks For each sample, we represented its inferred enhancer-target network by a vector. The vector has an entry for every enhancer-tss pair within Mbp from each other, involving the union of enhancers from all samples and all TSSs, the value of which corresponds to the prediction score of Random Forest in the second step of JEME indicating how likely the enhancer targets the TSS in the sample. For an enhancer inactive in a sample, all the entries involving it received a score of -.. The distance between any two S6

8 samples was defined as one minus the Pearson correlation of their vectors. The resulting distance matrix was then used to perform average-link hierarchical clustering using the R package ape. Visualizing group-specific parts of the enhancer target network To visualize parts of the global enhancer-target network that are specifically active in a subset of related samples, we put all samples into a small number of groups (Supplementary Tables 6-8). A global network was first formed using all enhancers and TSSs involved in an enhancer-target pair in any sample. TSSs regulated only in a group, enhancers regulating only in a group, and enhancer-target pairs active only in a group were colored by the group color. A virtual node was then created for each sample group, and it was connected to all enhancer and TSS nodes active only in this sample group by virtual edges. The virtual nodes and virtual edges were not shown in the figures to avoid creating visual artifacts. The placement of network nodes and edges were then determined by the force directed layout option of Cytoscape [23]. The virtual nodes and virtual edges encouraged nodes with the same color to cluster together, but the actual network layout depended on the overall connections. Definitions of expression specificity We defined expression specificity scores based on the entropy-related ideas in [24]. For a TSS i with expression level y ij in sample j, the overall expression specificity score was defined as S(i) = log 2 m + m [ ] y ij y ij m l= y log 2 m il l= y, il j= where m is the total number of samples. A TSS with uniform expression across the m samples would receive the minimum specificity score of, while a TSS specifically expressed in only one sample would have a maximum specificity score of log 2 m. We also defined a similar score to quantify whether a TSS i is specifically expressed in a particular sample j: S(i, j) = S(i) + log 2 y ij m l= y. il In this function, S(i, j) is bounded above by the overall specificity score S(i), with a reduction of magnitude log Pm y ij 2 l= y depending on how specific TSS i is expressed in sample j. il In the calculation of S(i, j), in order to avoid taking the logarithm of, a small positive constant is added to all expression values. Full-length gel image The full-length gel image of Figure 6d is provided in Supplementary Figure 7. S7

9 Supplementary tables Supplementary Table : Summary of datasets used for studying each enhancer-target pair independently. Cell line Enhancers (features used in prediction) Enhancer-target associations K562 ChromHMM [6] (histone marks, protein binding) POLR2A ChIA-PET [25] K562 FANTOM5 [8] (erna) POLR2A ChIA-PET [25] MCF-7 CSI-ANN [26] (histone marks) POLR2A ChIA-PET [25] MCF-7 FANTOM5 [8] (erna) POLR2A ChIA-PET [25] Supplementary Table 2: Summary of features used by JEME. ENCODE+Roadmap 27 ENCODE+Roadmap 48 FANTOM5 imp-imp non-non Step (global) Enhancer DNase I, H3K4me, H3K27ac and H3K27me3 in all samples H3K4me, H3K27ac and H3K27me3 in all samples CAGE in all samples TSS RNA-seq in all samples RNA-seq in all samples CAGE in all samples Step 2 (single-sample) Enhancer-TSS pair Single-enhancer error Single-enhancer error Single-enhancer error terms of DNase I, terms of H3K4me, term of CAGE model H3K4me, H3K27ac and H3K27ac and H3K27me3 in this sample, genomic H3K27me3 models in this models in this sample, distance sample, genomic distance genomic distance Enhancer DNase I, H3K4me, H3K27ac and H3K27me3 in this sample TSS DNase I, H3K4me, H3K27ac and H3K27me3 in this sample Enhancer-TSS window DNase I, H3K4me, H3K27ac and H3K27me3 in this sample H3K4me, H3K27ac and H3K27me3 in this sample H3K4me, H3K27ac and H3K27me3 in this sample H3K4me, H3K27ac and H3K27me3 in this sample CAGE in this sample CAGE in this sample CAGE in this sample Supplementary Table 3: Summary of datasets used for evaluating the reliability of the inferred enhancertarget pairs. Data type Sample Data used in inferring enhancer targets ChIA-PET (H3K4me2) [27] CD4+ T cells ENCODE+Roadmap, FANTOM5 ChIA-PET (POLR2A) [25] K562 ENCODE+Roadmap, FANTOM5 ChIA-PET (POLR2A) [25] MCF-7 FANTOM5 eqtl [28] GM2878 ENCODE+Roadmap eqtl [28] HepG2 ENCODE+Roadmap Hi-C [7] IMR9 ENCODE+Roadmap S8

10 Supplementary Table 4: CTCF binding intervening (I) or not intervening (N) predicted enhancer-target pairs. The average values of, sets of random pairs are reported. Sample Predicted enhancer-target pairs Random pairs p-value I N N/(I+N) I N N/(I+N) (one-sided Z test) GM ,67 4,79. 4,976.3, <2.2E-6 H-hESC 6,759 2,983 8, <2.2E-6 HeLa-S3 8,36 3, , <2.2E-6 HepG2 27,623 9, ,248.9,7..46 <2.2E-6 HMEC 4,972 6,48 2, <2.2E-6 HSMM 8,4 2,7 9, <2.2E-6 HSMMt,83 2, , <2.2E-6 HUVEC 2,55 3, , <2.2E-6 K562 45,597 4, ,97.4 2, <2.2E-6 NHA 8,496 3,29, <2.2E-6 NHDFAd 2,365 2,252 3, <2.2E-6 NHEK 2,749 4, ,228.7, <2.2E-6 NHLF 2,6,273 2, <2.2E-6 Osteobl 6,773 3,5 8, <2.2E-6 Supplementary Table 5: CTCF binding intervening predicted enhancer-target predictions in GM2878 that define (D) or not define (N) CCD boundaries. The distance column shows the maximum distance between a CTCF binding site and the boundary of a CCD such that the CTCF binding is considered defining the CCD boundary. Distance Intervening predicted Not intervening predicted p-value enhancer-target pairs enhancer-target pairs (Fisher s exact test) D N N/(D+N) D N N/(D+N) kbp,45 26,53 5 2,92 3, E-5 2kbp,9 25, ,887 3,73 4.8E-6 3kbp 2,47 25,76 2 3,25 3,42 3.2E-6 4kbp 2,33 25, ,389 3,228.E-4 5kbp 2,42 25,487 3,55 3,66 9.5E-5 Supplementary Table 6: Sub-grouping of the ENCODE+Roadmap samples. Group Samples Cardiovascular E33, E34, E37, E38, E39, E4, E4, E42, E43, E44, E45, E47, E48, E62, E65, E83, E95, E4, E5 Digestive E75, E77, E79, E84, E85, E92, E94, E, E2, E6, E9, E Immune E29, E3, E3, E32, E35, E36, E46, E5, E5, E93, E2 Muscle cells E52, E76, E78, E89, E9, E, E3, E7, E8, E Neurosystem E53, E54, E67, E68, E69, E7, E7, E72, E73, E74, E8, E82 Protective E27, E28, E55, E56, E57, E58, E59, E6, E63 Respiratory E7 Stem cells E, E2, E3, E4, E5, E6, E7, E8, E9, E, E, E2, E3, E4, E5, E6, E8, E9, E2, E2, E22, E23, E24, E25, E26, E49 Others E66, E8, E86, E87, E88, E9, E96, E97, E98, E99, E3, E4, E5, E6, E7, E8, E9, E2, E2, E22, E23, E24, E25, E26, E27, E28, E29 S9

11 Supplementary Table 7: Sub-grouping of the FANTOM5 primary cells. Group Cardiovascular Digestive Fibroblast Immune Muscle and bone Neurosystem Organlining Outerlayer Respiratory Stem cells Samples CNhs275, CNhs292, CNhs234, CNhs235, CNhs257, CNhs3552, CNhs3553 CNhs847, CNhs5, CNhs54, CNhs335, CNhs37, CNhs95, CNhs27, CNhs267, CNhs268, CNhs269, CNhs293, CNhs234, CNhs2349, CNhs2494, CNhs2626, CNhs28, CNhs28, CNhs282 CNhs848, CNhs866, CNhs867, CNhs874, CNhs878, CNhs52, CNhs65, CNhs74, CNhs82, CNhs39, CNhs322, CNhs337, CNhs339, CNhs35, CNhs352, CNhs353, CNhs354, CNhs378, CNhs379, CNhs92, CNhs97, CNhs99, CNhs9, CNhs92, CNhs93, CNhs94, CNhs952, CNhs953, CNhs96, CNhs962, CNhs97, CNhs98, CNhs982, CNhs996, CNhs26, CNhs2, CNhs23, CNhs227, CNhs228, CNhs238, CNhs239, CNhs252, CNhs255, CNhs257, CNhs259, CNhs26, CNhs265, CNhs295, CNhs28, CNhs2344, CNhs2493, CNhs2498, CNhs2499 CNhs843, CNhs852, CNhs853, CNhs854, CNhs855, CNhs857, CNhs858, CNhs859, CNhs86, CNhs862, CNhs865, CNhs62, CNhs73, CNhs334, CNhs897, CNhs899, CNhs9, CNhs94, CNhs95, CNhs96, CNhs954, CNhs955, CNhs956, CNhs957, CNhs959, CNhs997, CNhs998, CNhs999, CNhs2, CNhs2, CNhs23, CNhs29, CNhs222, CNhs2343, CNhs2352, CNhs2354, CNhs2566, CNhs2575, CNhs2592, CNhs2593, CNhs2594, CNhs395, CNhs322, CNhs323, CNhs324, CNhs325, CNhs326, CNhs327, CNhs328, CNhs325, CNhs326, CNhs3223, CNhs3224, CNhs3468, CNhs348, CNhs3484, CNhs349, CNhs352, CNhs353, CNhs3535, CNhs3536, CNhs3537, CNhs3538, CNhs3539, CNhs354, CNhs354, CNhs3547, CNhs3548, CNhs3549, CNhs38, CNhs382, CNhs383, CNhs384 CNhs838, CNhs839, CNhs863, CNhs868, CNhs869, CNhs87, CNhs877, CNhs883, CNhs83, CNhs84, CNhs85, CNhs86, CNhs87, CNhs88, CNhs9, CNhs9, CNhs35, CNhs39, CNhs3, CNhs32, CNhs324, CNhs329, CNhs372, CNhs373, CNhs385, CNhs9, CNhs98, CNhs92, CNhs923, CNhs927, CNhs963, CNhs964, CNhs965, CNhs973, CNhs976, CNhs98, CNhs987, CNhs988, CNhs989, CNhs99, CNhs99, CNhs992, CNhs24, CNhs27, CNhs28, CNhs25, CNhs22, CNhs22, CNhs235, CNhs236, CNhs243, CNhs244, CNhs245, CNhs246, CNhs248, CNhs249, CNhs25, CNhs253, CNhs256, CNhs26, CNhs28, CNhs2569, CNhs2597, CNhs2639, CNhs264, CNhs264, CNhs273, CNhs2894 CNhs864, CNhs32, CNhs96, CNhs25, CNhs28, CNhs27, CNhs2338, CNhs2726, CNhs385 CNhs837, CNhs842, CNhs85, CNhs85, CNhs87, CNhs872, CNhs875, CNhs882, CNhs884, CNhs6, CNhs77, CNhs79, CNhs92, CNhs33, CNhs37, CNhs323, CNhs325, CNhs33, CNhs33, CNhs332, CNhs333, CNhs336, CNhs338, CNhs34, CNhs375, CNhs376, CNhs377, CNhs382, CNhs383, CNhs386, CNhs896, CNhs93, CNhs925, CNhs926, CNhs966, CNhs967, CNhs972, CNhs975, CNhs977, CNhs978, CNhs993, CNhs29, CNhs2, CNhs22, CNhs24, CNhs26, CNhs222, CNhs223, CNhs224, CNhs226, CNhs232, CNhs233, CNhs237, CNhs25, CNhs254, CNhs258, CNhs262, CNhs274, CNhs279, CNhs284, CNhs286, CNhs287, CNhs288, CNhs22, CNhs22, CNhs223, CNhs224, CNhs2342, CNhs2347, CNhs2348, CNhs2495, CNhs2496, CNhs2497, CNhs2568, CNhs257, CNhs2572, CNhs2574, CNhs2589, CNhs2596, CNhs2624, CNhs2728, CNhs2732, CNhs2733, CNhs38, CNhs355, CNhs355, CNhs386, CNhs387, CNhs388, CNhs389 CNhs879, CNhs64, CNhs38, CNhs979, CNhs23, CNhs23, CNhs2339, CNhs25 CNhs328 CNhs844, CNhs845, CNhs57, CNhs63, CNhs34, CNhs344, CNhs345, CNhs347, CNhs349, CNhs35, CNhs384, CNhs2, CNhs24, CNhs25, CNhs225, CNhs226, CNhs227, CNhs2363, CNhs2364, CNhs2365, CNhs2366, CNhs2367, CNhs2368, CNhs2369, CNhs237, CNhs237, CNhs2379, CNhs238, S CNhs2492, CNhs252, CNhs253, CNhs254, CNhs256, CNhs273, CNhs2922, CNhs398

12 Group Cardiovascular Digestive Endocrine Excretion Immune Muscle and bone Neurosystem Organlining Reproductive Respiratory Supplementary Table 8: Sub-grouping of the FANTOM5 tissues. Samples CNhs62, CNhs653, CNhs75, CNhs76, CNhs67, CNhs672, CNhs673, CNhs757, CNhs758, CNhs76, CNhs76, CNhs789, CNhs79, CNhs948, CNhs949, CNhs2844, CNhs2855, CNhs2856, CNhs2857 CNhs69, CNhs62, CNhs624, CNhs63, CNhs677, CNhs768, CNhs77, CNhs773, CNhs777, CNhs78, CNhs794, CNhs798, CNhs2842, CNhs2848, CNhs2849, CNhs2852, CNhs2853 CNhs633, CNhs634, CNhs65, CNhs756, CNhs769 CNhs66, CNhs622, CNhs652 CNhs63, CNhs65, CNhs654, CNhs788 CNhs629, CNhs755, CNhs776, CNhs779, CNhs3454 CNhs67, CNhs636, CNhs637, CNhs638, CNhs64, CNhs64, CNhs642, CNhs643, CNhs644, CNhs645, CNhs646, CNhs647, CNhs649, CNhs762, CNhs764, CNhs772, CNhs78, CNhs782, CNhs784, CNhs787, CNhs795, CNhs796, CNhs797, CNhs2227, CNhs2228, CNhs2229, CNhs23, CNhs23, CNhs232, CNhs234, CNhs235, CNhs236, CNhs237, CNhs238, CNhs239, CNhs232, CNhs232, CNhs2322, CNhs2323, CNhs2324, CNhs26, CNhs26, CNhs2996, CNhs2997, CNhs3449, CNhs3793, CNhs3794, CNhs3795, CNhs3796, CNhs3797, CNhs3798, CNhs3799, CNhs38, CNhs38, CNhs382, CNhs384, CNhs385, CNhs387, CNhs388, CNhs389 CNhs65, CNhs648, CNhs774, CNhs284 CNhs68, CNhs626, CNhs627, CNhs628, CNhs632, CNhs676, CNhs763, CNhs765, CNhs2846, CNhs2847, CNhs285, CNhs285, CNhs2854, CNhs2998 CNhs625, CNhs635, CNhs68, CNhs766, CNhs77, CNhs786, CNhs2858 S

13 Supplementary Table 9: List of TCGA HCC samples involved in our analysis. Tumor Control AR-A AR-A AT-A AT-A AU-A AU-A AW-A AW-A AX-A AX-A AY-A AY-A AZ-A AZ-A A3-A A3-A A4-A A4-A A6-A A6-A A8-A A8-A A9-A A9-A AA-A AA-A AB-A AB-A AC-A AC-A AD-A AD-A A2J-A A2J-A AEB-A AEB-A AEC-A AEC-A AEE-A AEE-A AEH-A AEH-A AEI-A AEI-A AEJ-A AEJ-A AEL-A AEL-A A26-A A26-A A23B-A A23B-A A26S-A A26S-A A2HT-A A2HT-A A2L6-A A2L6-A A2QR-A A2QR-A A39V-A A39V-A A39W-A A39W-A A39X-A A39X-A A39Z-A A39Z-A A3A-A A3A-A A3A2-A A3A2-A A3A3-A A3A3-A A3EP-A A3EP-A S2

14 Supplementary Table : List of liver-related FANTOM5 samples used in our HCC analysis. Sample Name Sample type Status CNhs624 Liver, adult, pool Tissues Normal CNhs27 Hepatoma cell line:li-7 Cell lines Cancer CNhs742 Hepatoblastoma cell line:huh-6 Cell lines Cancer CNhs798 Liver, fetal, pool Tissues Normal CNhs933 Pleomorphic hepatocellular carcinoma cell line:snu-387 Cell lines Cancer CNhs2328 Hepatocellular carcinoma cell line: HepG2 ENCODE, biol rep Cell lines Cancer CNhs233 Hepatocellular carcinoma cell line: HepG2 ENCODE, biol rep3 Cell lines Cancer CNhs234 Hepatocyte, donor Primary cells Normal CNhs2349 Hepatocyte, donor2 Primary cells Normal CNhs2626 Hepatocyte, donor3 primary cells Normal CNhs2329 Hepatocellular carcinoma cell line: HepG2 ENCODE, biol rep2 Cell lines Cancer Supplementary Table : List of genes with differential expression between HCC tumor and matched control samples, and corresponding inverse differential methylation at their enhancers. Enhancer methylation difference is defined as the average beta value of all the CpG sites at the enhancer in the tumor samples minus the corresponding average beta value in the control samples. Gene expression log fold change is defined as log2 of the ratio between the average expression level in the tumor samples over the average expression level in the control samples. Gene TSS Enhancer region Enhancer methylation Gene expression Difference Corrected p-value Log 2 fold change Corrected p-value CAP2 chr6: chr6: E E-26 DUSP chr5: chr5: E E-4 MUC3 chr3: chr3: E E-32 MUC3 chr3: chr3: E E-32 MYBPHL chr: chr: E E-7 PSRC chr: chr: E E-7 RBM24 chr6: chr6: E E-4 TERT chr5:29562 chr5: E E-26 TERT chr5:29562 chr5: E E-26 Supplementary Table 2: Number of interacting (I) and non-interacting (N) enhancer-tss pairs used in the cross-validation and across-sample tests that involved the FANTOM5 data set. Training Random targets Random contacts Random pairs sample I N I/N I N I/N I N I/N CD4+ T cells 94 9, , ,859. K562 3,33 32,64. 3,33 32,64. 3,33 32,64. MCF , , ,54 Supplementary Table 3: Number of interacting (I) and non-interacting (N) enhancer-tss pairs used in the cross-validation and across-sample tests that involved the ENCODE+Roadmap 48-sample data set. Training Random targets Random contacts Random enhancers Random pairs sample I N I/N I N I/N I N I/N I N I/N CD4+ T cells,36 7,5,36 7,5,36 6,84,36 6,96 GM2878,55 7,33,55 7,35,55 5,949,55 7,22 HepG2 85 5, , , ,662 K562 5,68,795 5,68 9,82 5,68 98,294 5,68 3,87 S3

15 Supplementary Table 4: Number of interacting (I) and non-interacting (N) enhancer-tss pairs used in the cross-validation and across-sample tests that involved the ENCODE+Roadmap 27-sample data set. Training Random targets Random contacts Random enhancers Random pairs sample I N I/N I N I/N I N I/N I N I/N CD4+ T cells,69 7,245,69 7,245,69 7,37,69 7,86 GM2878,43 7,56,43 7,56,43 5,86,43 6,873 HepG , , , ,572 IMR9 22,73 7,476 22,73 7,476 22,73 7,262 22,73 5,77 K562 5,43 99,692 5,43 9,789 5,43 96,34 5,43,937 Supplementary Table 5: Cloning primer list Enhancer Primer Sequence 5-3 PSRC Forward TTTGGATCCTGCTGTCTCCTCCTTTCTCAGGCACA Reverse TTTGGATCCCAATGCTTGCTGATTGAGCTGGAGGA RBM24 Forward TTTGGATCCCAATAATCTATGGCCCTGTGGAGAAA Reverse TTTGGATCCTCTGATCTCAGCTATTGGGTGATGTA TERT Forward TTTGGATCCTCACAGCCAGAAGAACTCTCCCTGCA Reverse TTTGGATCCTATCACAAATGTTAAAATGTAACTCC Supplementary Table 6: CRISPR/Cas9 primer list hpsrc-enh sgrna-#-f-upstream ACACCATTCCACCTGCGTTCCCTCGG hpsrc-enh sgrna-#-r-upstream AAAACCGAGGGAACGCAGGTGGAATG hpsrc-enh sgrna-#8-f-upstream ACACCACAGTGTGAGGCATCCCAAGG hpsrc-enh sgrna-#8-r-upstream AAAACCTTGGGATGCCTCACACTGTG hpsrc-enh sgrna-#-f-downstream ACACCTACAGTGCACACGGTCCGTTG hpsrc-enh sgrna-#-r-downstream AAAACAACGGACCGTGTGCACTGTAG hpsrc-enh sgrna-#3-f-downstream ACACCTCGCGACCTTGAGATTATCAG hpsrc-enh sgrna-#3-r-downstream AAAACTGATAATCTCAAGGTCGCGAG hrbm24-enh sgrna-#-f-upstream ACACCGCTGTGTTGTCTAGGGTGACG hrbm24-enh sgrna-#-r-upstream AAAACGTCACCCTAGACAACACAGCG hrbm24-enh sgrna-#2-f-upstream ACACCATGGAGCTCACTGTTAGGCAG hrbm24-enh sgrna-#2-r-upstream AAAACTGCCTAACAGTGAGCTCCATG hrbm24-enh sgrna-#2-f-downstream ACACCTTGACCTTACTGAACTAGGCG hrbm24-enh sgrna-#2-r-downstream AAAACGCCTAGTTCAGTAAGGTCAAG hrbm24-enh sgrna-#9-f-downstream ACACCCAAAAACACACTTTGACCTGG hrbm24-enh sgrna-#9-r-downstream AAAACCAGGTCAAAGTGTGTTTTTGG htert-enh sgrna-#-f-upstream ACACCACATCTGCCGGGGGCGCTGAG htert-enh sgrna-#-r-upstream AAAACTCAGCGCCCCCGGCAGATGTG htert-enh sgrna-#7-f-upstream ACACCAACCTTGCTGAAGGGGTGCCG htert-enh sgrna-#7-r-upstream AAAACGGCACCCCTTCAGCAAGGTTG htert-enh sgrna-#2-f-downstream ACACCCACGCTGGCTCCAATAAACCG htert-enh sgrna-#2-r-downstream AAAACGGTTTATTGGAGCCAGCGTGG htert-enh sgrna-#7-f-downstream ACACCTCCCCCCAGCAGCTGTCTGCG htert-enh sgrna-#7-r-downstream AAAACGCAGACAGCTGCTGGGGGGAG PRSC-F CCAGGATGAAGACAGGAAATGCTTGG PRSC-R CTGAGGTGCAGTCTAAGACAGGCAC RBM24-F TTTGAGCAGCAGCTTGAACGACAGC RBM24-R TGACCTGATAACAAGCCCAGTGTGC TERT-F ACTTCAAACGGGCTTCAAGGGAGGC TERT-R TAGCAATCACGCCGGAGACGTTTCT S4

16 Supplementary Table 7: qrt-pcr primer list Gene Primer Sequence 5-3 PSRC Forward CTCGAAAAGGGCTTCCAAGACC Reverse TGACAGGAAGATTTAGTCGCTGG RBM24 Forward GTGAACCTGGCATACTTAGGAGC Reverse GCACAAAAGCCTGCGGATAGAC TERT Forward GCCGATTGTGAACATGGACTACG Reverse GCTCGTAGTTGAGCACGCTGAA Supplementary Table 8: Bisulfite sequencing primer list Functional element Primer Sequence 5-3 PSRC enhancer # Forward TTGAGTGTAGGTTTGGAGATTGATTA Reverse CTTCCAACTACCAAAAATAAACTATAAC PSRC enhancer #2 Forward AGTTGGAAGAATTTTATTTTTGGTATATTT Reverse ACCCAAATAAAAACATAAACCACAC PSRC promoter Forward GGTTAGTAGTTAGATTTAGGGTGTAAG Reverse AAATAACTTAAAAAATACAAAACCCTA RBM24 enhancer Forward GATATGTTGTTGTTTGGAGAAGATAG Reverse ATTTTTAACCAAACTTTAACCTTACTA RBM24 promoter Forward TAGAGTAGGGGTGAAAGGAGAGA Reverse AAACAACAACCCCAAACCTC TERT enhancer Forward GGGGATTTAGGGATTTATAGTTAGAAGAA Reverse TAAACACCCTCCCCCCAACA TERT promoter Forward ATGATGTGGAGGTTTTGGGAATA Reverse TAACTCCATTTCCCACCCTTTCT S5

17 AUC AUC AUC Supplementary figures. Highest Medium-high Medium-low Lowest Mean of four classes Highest Medium-high Medium-low Lowest Mean of four classes. E P B E+P E+B P+B E+P+B (a) (b). B/P E+B/P (c) Supplementary Figure : Classification accuracy of the statistical models when each enhancer-target pair is considered separately, based on FANTOM5 enhancers and ChIA-PET-defined active and inactive enhancer-target pairs in K562. Each bar represents the one-against-all accuracy of one expression level class, or the average of these values from the four classes. (a) Accuracy of the models involving all enhancer features together, or one type of features at a time. (b) Accuracy of the models based on enhancer features (E), promoter features (P), gene body features (B), or their combinations. (c) Accuracy of the models involving single promoter (P) or gene body (B) features, or all enhancer features plus single promoter (E+P) or gene body (E+B) features. In all panels, error bars show the standard deviations in five repeated rounds with different random training and testing sets. S6

18 AUC AUC AUC AUC. Highest Medium-high Medium-low Lowest Mean of four classes. Highest Medium-high Medium-low Lowest Mean of four classes (a) (b). Highest Medium-high Medium-low Lowest Mean of four classes. Highest Medium-high Medium-low Lowest Mean of four classes (c) (d) Supplementary Figure 2: Classification accuracy of the statistical models when each enhancer-target pair is considered separately, involving all enhancer features together, or one type of features at a time. Each bar represents the one-against-all accuracy of one expression level class, or the average of these values from the four classes. The active enhancers were predicted (a) by ChromHMM in K562, (b) by FANTOM5 in K562, (c) by CSI-ANN in MCF-7 and (d) by FANTOM5 in MCF-7. Error bars show the standard deviations in five repeated rounds with different random training and testing sets. S7

19 AUC AUC AUC AUC Highest Medium-high Medium-low Lowest Mean of four classes Highest Medium-high Medium-low Lowest Mean of four classes.. E P B E+P E+B P+B E+P+B E P B E+P E+B P+B E+P+B (a) (b) Highest Medium-high Medium-low Lowest Mean of four classes Highest Medium-high Medium-low Lowest Mean of four classes.. E P B E+P E+B P+B E+P+B E P B E+P E+B P+B E+P+B (c) (d) Supplementary Figure 3: Classification accuracy of the statistical models when each enhancer-target pair is considered separately, involving enhancer features (E), promoter features (P), gene body features (B), or their combinations. Each bar represents the one-against-all accuracy of one expression level class, or the average of these values from the four classes. The active enhancers were predicted (a) by ChromHMM in K562, (b) by FANTOM5 in K562, (c) by CSI-ANN in MCF-7 and (d) by FANTOM5 in MCF-7. Error bars show the standard deviations in five repeated rounds with different random training and testing sets. S8

20 Median percentage error Median percentage error Median percentage error Median percentage error 7% 7% 6% 6% 5% 5% 4% 4% 3% 3% 2% 2% % % % % (a) (b) 7% 7% 6% 6% 5% 5% 4% 4% 3% 3% 2% 2% % % % All enhancer H3K27ac H3K27me3 Enhancer DNase I DNA % All enhancer H3K27ac H3K27me3 Enhancer DNase I DNA features RNA methylation features RNA methylation (c) (d) Supplementary Figure 4: Regression errors of the linear regression models when each enhancer-target pair is considered separately, involving all enhancer features together, or one type of features at a time. The active enhancers were predicted (a) by ChromHMM in K562, (b) by FANTOM5 in K562, (c) by CSI-ANN in MCF-7 and (d) by FANTOM5 in MCF-7. Error bars show the standard deviations in ten repeated rounds with different random training and testing sets. S9

21 Median percentage error Median percentage error Median percentage error Median percentage error 7% 7% 6% 6% 5% 5% 4% 4% 3% 3% 2% 2% % % % E P B E+P E+B P+B E+P+B % E P B E+P E+B P+B E+P+B (a) (b) 7% 7% 6% 6% 5% 5% 4% 4% 3% 3% 2% 2% % % % E P B E+P E+B P+B E+P+B % E P B E+P E+B P+B E+P+B (c) (d) Supplementary Figure 5: Regression errors of the linear regression models when each enhancer-target pair is considered separately, involving enhancer features (E), promoter features (P), gene body features (B), or their combinations. The active enhancers were predicted (a) by ChromHMM in K562, (b) by FANTOM5 in K562, (c) by CSI-ANN in MCF-7 and (d) by FANTOM5 in MCF-7. Error bars show the standard deviations in ten repeated rounds with different random training and testing sets. S2

22 Median percentage error Considering each enhancer independently Considering multiple enhancers jointly 3% 25% 2% 5% % 5% % ENCODE+Roadmap ENCODE+Roadmap ENCODE+Roadmap ENCODE+Roadmap FANTOM5 27, imp-imp 48, imp-imp 48, imp-non 48, non-non Supplementary Figure 6: Regression errors of the statistical models with and without considering multiple enhancers potentially regulating a TSS at the same time. Each bar group corresponds to the results based on one data set. Each data set from ENCODE+Roadmap is marked by ) the number of samples involved, 2) whether the enhancer features involve (imp) or not involve (non) data imputation, and 2) whether the expression data involve or not involve data imputation. For example, Roadmap 48, imp-non is the data set involving 48 samples, where the enhancer features involve some imputed data while the expression data do not involve imputation. Error bars show the standard deviations in ten repeated rounds with different random training and testing sets. S2

23 AUPR AUROC AUPR AUROC AUPR AUROC AUPR AUROC.. CD4+ T cells CD4+ T cells GM2878 eqtl GM2878 eqtl HepG2 eqtl HepG2 eqtl K562 ChIA-PET CD4+ T cells CD4+ T cells GM2878 eqtl GM2878 eqtl HepG2 eqtl HepG2 eqtl K562 ChIA-PET ChIA-PET ChIA-PET ChIA-PET ChIA-PET (a) Random targets.. CD4+ T cells CD4+ T cells GM2878 eqtl GM2878 eqtl HepG2 eqtl HepG2 eqtl K562 ChIA-PET CD4+ T cells CD4+ T cells GM2878 eqtl GM2878 eqtl HepG2 eqtl HepG2 eqtl K562 ChIA-PET ChIA-PET ChIA-PET ChIA-PET ChIA-PET (b) Random contacts.. CD4+ T cells CD4+ T cells GM2878 eqtl GM2878 eqtl HepG2 eqtl HepG2 eqtl K562 ChIA-PET CD4+ T cells CD4+ T cells GM2878 eqtl GM2878 eqtl HepG2 eqtl HepG2 eqtl K562 ChIA-PET ChIA-PET ChIA-PET ChIA-PET ChIA-PET (c) Random enhancers.2. CD4+ T cells CD4+ T cells GM2878 eqtl GM2878 eqtl HepG2 eqtl HepG2 eqtl K562 ChIA-PET CD4+ T cells CD4+ T cells GM2878 eqtl GM2878 eqtl HepG2 eqtl HepG2 eqtl K562 ChIA-PET ChIA-PET ChIA-PET ChIA-PET ChIA-PET (d) Random pairs Supplementary Figure 7: Consistency of the enhancer-target pairs inferred by non-imputed data from 48 ENCODE+Roadmap samples, using ChIA-PET and eqtl data as reference. The four panels correspond to four different ways to draw background pairs, namely (a) random targets, (b) random contacts, (c) random enhancers and (d) random pairs. Each panel consists of two sub-figures that correspond to two different consistency measures. Distance only shows the results when the genomic distance between an enhancer and a TSS was used as the only feature to train a Random Forest model for predicting their interaction; Nearest gene shows the results when each enhancer was predicted to regulate the nearest gene only; Background shows the expected result for a purely random predictor. stands for cross-validation in the specified sample, while stands for across-sample prediction, where the second part of JEME was trained in K562 and the final model was tested on the specified sample. Error bars show the standard deviations of repeated runs based on different random training and testing sets. S22

24 AUPR AUROC AUPR AUROC AUPR AUROC AUPR AUROC.2. CD4+ T cells CD4+ T cells GM2878 GM2878 HepG2 eqtl HepG2 eqtl IMR9 Hi-C IMR9 Hi-C K562 ChIA- CD4+ T cells CD4+ T cells GM2878 GM2878 HepG2 eqtl HepG2 eqtl IMR9 Hi-C IMR9 Hi-C K562 ChIA- ChIA-PET ChIA-PET eqtl eqtl PET ChIA-PET ChIA-PET eqtl eqtl PET (a) Random targets.. CD4+ T cells CD4+ T cells GM2878 GM2878 HepG2 eqtl HepG2 eqtl IMR9 Hi-C IMR9 Hi-C K562 ChIA- CD4+ T cells CD4+ T cells GM2878 GM2878 HepG2 eqtl HepG2 eqtl IMR9 Hi-C IMR9 Hi-C K562 ChIA- ChIA-PET ChIA-PET eqtl eqtl PET ChIA-PET ChIA-PET eqtl eqtl PET (b) Random contacts.. CD4+ T cells CD4+ T cells GM2878 GM2878 HepG2 eqtl HepG2 eqtl IMR9 Hi-C IMR9 Hi-C K562 ChIA- CD4+ T cells CD4+ T cells GM2878 GM2878 HepG2 eqtl HepG2 eqtl IMR9 Hi-C IMR9 Hi-C K562 ChIA- ChIA-PET ChIA-PET eqtl eqtl PET ChIA-PET ChIA-PET eqtl eqtl PET (c) Random enhancers.2. CD4+ T cells CD4+ T cells GM2878 GM2878 HepG2 eqtl HepG2 eqtl IMR9 Hi-C IMR9 Hi-C K562 ChIA- CD4+ T cells CD4+ T cells GM2878 GM2878 HepG2 eqtl HepG2 eqtl IMR9 Hi-C IMR9 Hi-C K562 ChIA- ChIA-PET ChIA-PET eqtl eqtl PET ChIA-PET ChIA-PET eqtl eqtl PET (d) Random pairs Supplementary Figure 8: Consistency of the enhancer-target pairs inferred by imputed data from 27 ENCODE+Roadmap samples, using ChIA-PET, eqtl and Hi-C data as reference. The four panels correspond to four different ways to draw background pairs, namely (a) random targets, (b) random contacts, (c) random enhancers and (d) random pairs. Each panel consists of two sub-figures that correspond to two different consistency measures. Distance only shows the results when the genomic distance between an enhancer and a TSS was used as the only feature to train a Random Forest model for predicting their interaction; Nearest gene shows the results when each enhancer was predicted to regulate the nearest gene only; Background shows the expected result for a purely random predictor. stands for cross-validation in the specified sample, while stands for across-sample prediction, where the second part of JEME was trained in K562 and the final model was tested on the specified sample. Error bars show the standard deviations of repeated runs based on different random training and testing sets. S23

25 AUPR AUROC AUPR AUROC AUPR AUROC.2. CD4+ T cells ChIA-PET CD4+ T cells ChIA-PET K562 ChIA-PET MCF-7 ChIA-PET MCF-7 ChIA-PET CD4+ T cells ChIA-PET CD4+ T cells ChIA-PET K562 ChIA-PET MCF-7 ChIA-PET MCF-7 ChIA-PET (a) Random targets.2. CD4+ T cells ChIA-PET CD4+ T cells ChIA-PET K562 ChIA-PET MCF-7 ChIA-PET MCF-7 ChIA-PET CD4+ T cells ChIA-PET CD4+ T cells ChIA-PET K562 ChIA-PET MCF-7 ChIA-PET MCF-7 ChIA-PET (b) Random contacts.2. CD4+ T cells ChIA-PET CD4+ T cells ChIA-PET K562 ChIA-PET MCF-7 ChIA-PET MCF-7 ChIA-PET CD4+ T cells ChIA-PET CD4+ T cells ChIA-PET K562 ChIA-PET MCF-7 ChIA-PET MCF-7 ChIA-PET (c) Random pairs Supplementary Figure 9: Consistency of the enhancer-target pairs inferred from FANTOM5 samples, using ChIA-PET data as reference. The three panels correspond to three different ways to draw background pairs applicable to FANTOM5 data, namely (a) random targets, (b) random contacts and (c) random pairs. As explained in Supplementary Methods, due to the desired positive-to-background ratios, the background sets for random enhancers could not be formed. Each panel consists of two sub-figures that correspond to two different consistency measures. Distance only shows the results when the genomic distance between an enhancer and a TSS was used as the only feature to train a Random Forest model for predicting their interaction; Nearest gene shows the results when each enhancer was predicted to regulate the nearest gene only; Background shows the expected result for a purely random predictor. stands for cross-validation in the specified sample, while stands for across-sample prediction, where the second part of JEME was trained in K562 and the final model was tested on the specified sample. Error bars show the standard deviations of repeated runs based on different random training and testing sets. S24

26 Relative AUPR.2 JEME (4 data types) TargetFinder (4 data types) TargetFinder (8 data types) TargetFinder (6 data types) TargetFinder (all data types) Cross-validation Across-sample Supplementary Figure : Relative AUPR of JEME and TargetFinder based on cross-validation results in GM2878 and across-sample results trained in K562 and tested in GM2878. Relative AUPR of each method is defined as its AUPR divided by the AUPR of JEME. The number of data types involved in predicting enhancer targets (not including the data for predicting the enhancers) is shown. S25

27 Number of enhancers Number of samples Number of samples >48 Number of samples Number of samples Number of samples 35 (a) (b) (c) 3 Regulated TSSs 4 Regulating enhancers Active enhancer-target pairs Regulating enhancers per TSS (d) (e) (f) 45 Target TSSs per enhancer Fraction of enhancers regulating closest TSSs Fraction of enhancer-target pairs involving closest TSS Number of samples in which an enhancer regulates a TSS Median distance (kb). Supplementary Figure : Basic properties of enhancer-target networks. (a) Number of TSSs regulated by at least one enhancer, number of enhancers regulating at least one target TSS, and number of enhancertarget pairs in each sample. (b) Average number of regulating enhancers per TSS and (c) average number of target TSSs per enhancer in each sample. (d) Number of samples in which each enhancer regulates a TSS. (e) Median distance between an enhancer and a target TSS that it regulates in each sample. (f) Fraction of enhancers regulating the closest TSS and fraction of enhancer-target pairs involving an enhancer regulating the closest TSS in each sample. S26

28 Preadipocyte perirenal donor Chondrocyte re diff donor3 Fibroblast Gingival donor3 Neural stem cells donor2 chorionic membrane cells donor Blood vessel endothelial cell T cell Mesenchymal cell Vascular associated smooth muscle cell Monocyte Preadipocyte Respiratory epithelial cell Kidney epithelial cell Skin fibroblast Mesenchymal stem cells adipose donor Mesothelial Cells donor Smooth Muscle Cells Aortic donor Smooth Muscle Cells Umbilical artery donor Endothelial Cells Aortic donor Endothelial Cells Umbilical vein donor Mesenchymal Stem Cells Whartons Jelly donor Smooth Muscle Cells Tracheal donor Fibroblast Aortic Adventitial donor Fibroblast Pulmonary Artery donor Preadipocyte visceral donor Alveolar Epithelial Cells donor Renal Cortical Epithelial Cells donor Renal Proximal Tubular Epithelial Cell donor Renal Epithelial Cells donor Osteoblast differentiated donor Renal Mesangial Cells donor Fibroblast Lymphatic donor Adipocyte subcutaneous donor2 Preadipocyte subcutaneous donor2 Fibroblast Aortic Adventitial donor3 Fibroblast Dermal donor2 Chondrocyte de diff donor2 Hair Follicle Dermal Papilla Cells donor2 Smooth Muscle Cells Aortic donor2 Fibroblast Cardiac donor2 Smooth Muscle Cells Internal Thoracic Artery donor2 Smooth Muscle Cells Subclavian Artery donor2 Smooth Muscle Cells Uterine donor3 Smooth Muscle Cells Aortic donor3 Smooth Muscle Cells Aortic donor Smooth Muscle Cells Carotid donor Smooth Muscle Cells Subclavian Artery donor Smooth Muscle Cells Coronary Artery donor Smooth Muscle Cells Brachiocephalic donor Smooth Muscle Cells Umbilical Artery donor Adipocyte breast donor Preadipocyte breast donor Adipocyte omental donor Multipotent Cord Blood Unrestricted Somatic Stem Cells donor Endothelial Cells Vein donor2 Endothelial Cells Aortic donor2 Endothelial Cells Umbilical vein donor2 Endothelial Cells Thoracic donor Endothelial Cells Lymphatic donor3 Endothelial Cells Umbilical vein donor3 Mesothelial Cells donor3 Fibroblast Dermal donor3 Preadipocyte visceral donor3 Preadipocyte subcutaneous donor3 Renal Proximal Tubular Epithelial Cell donor2 Smooth Muscle Cells Umbilical Artery donor2 Fibroblast Cardiac donor4 Preadipocyte breast donor2 Smooth Muscle Cells Brain Vascular donor2 Preadipocyte omental donor Fibroblast Periodontal Ligament donor5 (PL3) Endothelial Cells Lymphatic donor Endothelial Cells Microvascular donor2 Endothelial Cells Lymphatic donor2 Endothelial Cells Microvascular donor Endothelial Cells Artery donor2 Endothelial Cells Artery donor3 Mesenchymal Stem Cells bone marrow donor2 Renal Epithelial Cells donor2 Alveolar Epithelial Cells donor2 Hair Follicle Dermal Papilla Cells donor3 Smooth Muscle Cells Coronary Artery donor3 Osteoblast donor3 Fibroblast Cardiac donor3 Smooth Muscle Cells Internal Thoracic Artery donor3 Hepatic Stellate Cells (lipocyte) donor2 Smooth Muscle Cells Bronchial donor2 Smooth Muscle Cells Subclavian Artery donor3 Meningeal Cells donor2 Smooth Muscle Cells Umbilical Vein donor2 Smooth Muscle Cells Umbilical Vein donor Cardiac Myocyte donor3 mesenchymal precursor cell bone marrow donor3 Fibroblast Dermal donor mesenchymal precursor cell adipose donor3 mesenchymal precursor cell bone marrow donor2 mesenchymal precursor cell cardiac donor mesenchymal precursor cell cardiac donor3 mesenchymal precursor cell cardiac donor4 mesenchymal precursor cell adipose donor mesenchymal precursor cell adipose donor2 mesenchymal precursor cell bone marrow donor mesenchymal precursor cell cardiac donor2 Fibroblast Cardiac donor Renal Cortical Epithelial Cells donor2 Renal Epithelial Cells donor3 Renal Proximal Tubular Epithelial Cell donor3 Renal Mesangial Cells donor3 tenocyte donor2 tenocyte donor tenocyte donor3 Smooth Muscle Cells Carotid donor3 Smooth Muscle Cells Brachiocephalic donor3 Smooth Muscle Cells Umbilical Artery donor3 Smooth Muscle Cells Brain Vascular donor3 Preadipocyte omental donor3 Endothelial Cells Microvascular donor3 Renal Glomerular Endothelial Cells donor2 Endothelial Cells Aortic donor3 Endothelial Cells Vein donor3 Keratocytes donor2 Lens Epithelial Cells donor3 Fibroblast Periodontal Ligament donor4 (PL29) Adipocyte omental donor3 Adipocyte subcutaneous donor3 Chondrocyte de diff donor3 Prostate Stromal Cells donor2 Astrocyte cerebral cortex donor Mesenchymal Stem Cells umbilical donor Retinal Pigment Epithelial Cells donor Renal Glomerular Endothelial Cells donor Endothelial Cells Thoracic donor2 Astrocyte cerebral cortex donor2 Ciliary Epithelial Cells donor2 Ciliary Epithelial Cells donor Smooth Muscle Cells Pulmonary Artery donor2 Myoblast donor Mesenchymal Stem Cells adipose donor Prostate Stromal Cells donor Mesenchymal stem cells hepatic donor Retinal Pigment Epithelial Cells donor Fibroblast Conjunctival donor Hepatic Sinusoidal Endothelial Cells donor Pericytes donor2 Endothelial Cells Aortic donor Endothelial Cells Artery donor Endothelial Cells Vein donor Olfactory epithelial cells donor4 Olfactory epithelial cells donor Hepatic Sinusoidal Endothelial Cells donor2 Renal Glomerular Endothelial Cells donor3 Renal Glomerular Endothelial Cells donor4 Olfactory epithelial cells donor2 Olfactory epithelial cells donor3 Lens Epithelial Cells donor2 Astrocyte cerebral cortex donor3 Retinal Pigment Epithelial Cells donor3 Myoblast donor2 Ciliary Epithelial Cells donor3 Iris Pigment Epithelial Cells donor Myoblast donor3 Smooth Muscle Cells Colonic donor2 Mesenchymal Stem Cells amniotic membrane donor Osteoblast differentiated donor3 chorionic membrane cells donor2 Mesenchymal Stem Cells hepatic donor2 Smooth Muscle Cells Colonic donor3 Smooth Muscle Cells Prostate donor Fibroblast Gingival donor5 (GFH3) Smooth Muscle Cells Prostate donor2 Hair Follicle Outer Root Sheath Cells donor Hair Follicle Outer Root Sheath Cells donor2 Smooth Muscle Cells Tracheal donor3 Prostate Stromal Cells donor3 Pancreatic stromal cells donor Fibroblast Periodontal Ligament donor6 (PLH3) Mesenchymal stem cells umbilical donor Fibroblast Choroid Plexus donor2 Cardiac Myocyte donor2 Fibroblast skin walker warburg donor Fibroblast skin normal donor2 Fibroblast skin spinal muscular atrophy donor3 Adipocyte omental donor2 Preadipocyte omental donor2 Fibroblast skin normal donor Chondrocyte de diff donor Fibroblast skin spinal muscular atrophy donor2 Synoviocyte donor3 Fibroblast skin dystrophia myotonica donor Fibroblast skin spinal muscular atrophy donor Synoviocyte donor2 Fibroblast skin dystrophia myotonica donor3 Adipocyte perirenal donor Fibroblast skin dystrophia myotonica donor2 Fibroblast Periodontal Ligament donor3 Fibroblast Gingival donor Chondrocyte re diff donor2 Preadipocyte visceral donor2 Fibroblast Cardiac donor5 Mesenchymal Stem Cells amniotic membrane donor2 Fibroblast Cardiac donor6 Meningeal Cells donor Smooth Muscle Cells Coronary Artery donor2 Meningeal Cells donor3 Multipotent Cord Blood Unrestricted Somatic Stem Cells donor2 Fibroblast Dermal donor4 Hair Follicle Dermal Papilla Cells donor Fibroblast Dermal donor5 Fibroblast Gingival donor4 (GFH2) Fibroblast Dermal donor6 Mesenchymal Stem Cells adipose donor3 Osteoblast differentiated donor2 Osteoblast donor2 Fibroblast Periodontal Ligament donor Fibroblast Gingival donor2 Lens Epithelial Cells donor Adipocyte subcutaneous donor Mesenchymal Stem Cells bone marrow donor Fibroblast Periodontal Ligament donor2 Cardiac Myocyte donor Reticulocytes biol rep2 Reticulocytes biol rep Sertoli Cells donor Skeletal Muscle Satellite Cells donor3 Skeletal Muscle Cells donor6 Skeletal Muscle Satellite Cells donor Skeletal Muscle Cells donor4 Mesenchymal Stem Cells bone marrow donor3 nasal epithelial cells donor2 amniotic membrane cells donor2 Pericytes donor Fibroblast Choroid Plexus donor Skeletal Muscle Cells donor Trabecular Meshwork Cells donor Skeletal Muscle Satellite Cells donor2 Skeletal Muscle Cells donor5 CD4+CD6 Monocytes donor CD4+CD6 Monocytes donor2 CD4+CD6 Monocytes donor3 CD4+CD6+ Monocytes donor3 CD4+CD6+ Monocytes donor CD4+CD6+ Monocytes donor2 CD4+ Monocytes donor3 CD4+ Monocytes donor2 CD4+ Monocytes donor Dendritic Cells plasmacytoid donor Neutrophils donor Smooth Muscle Cells Brain Vascular donor Smooth Muscle Cells Colonic donor Astrocyte cerebellum donor Neutrophils donor3 Skeletal muscle cells differentiated into Myotubes multinucleated donor Trabecular Meshwork Cells donor3 CD4 CD6+ Monocytes donor3 CD4 CD6+ Monocytes donor2 CD4+ monocytes mock treated donor3 CD4+ monocytes mock treated donor CD4+ monocytes mock treated donor2 Mast cell stimulated donor Melanocyte light donor Intestinal epithelial cells (polarized) donor CD4+ monocyte derived endothelial progenitor cells donor Prostate Epithelial Cells (polarized) donor Smooth Muscle Cells Bronchial donor Smooth Muscle Cells Esophageal donor Hepatic Stellate Cells (lipocyte) donor Keratocytes donor Astrocyte cerebellum donor2 Astrocyte cerebellum donor3 Basophils donor3 Neutrophils donor2 Fibroblast Lymphatic donor3 migratory langerhans cells donor3 migratory langerhans cells donor2 migratory langerhans cells donor immature langerhans cells donor Macrophage monocyte derived donor3 Melanocyte light donor2 Melanocyte light donor3 Neurons donor2 immature langerhans cells donor2 Mast cell donor2 Mast cell donor4 Mast cell donor3 Mast cell donor Macrophage monocyte derived donor2 CD4+ monocyte derived endothelial progenitor cells donor3 CD4+ monocyte derived endothelial progenitor cells donor2 Macrophage monocyte derived donor Dendritic Cells monocyte immature derived donor tech rep2 Dendritic Cells monocyte immature derived donor tech rep Dendritic Cells monocyte immature derived donor3 Natural Killer Cells donor CD8+ T Cells donor Urothelial cells donor Sebocyte donor CD4+ T Cells donor Neural stem cells donor Keratinocyte epidermal donor Gingival epithelial cells donor (GEA) Keratinocyte oral donor Small Airway Epithelial Cells donor Mammary Epithelial Cell donor Placental Epithelial Cells donor Urothelial Cells donor Tracheal Epithelial Cells donor Corneal Epithelial Cells donor Esophageal Epithelial Cells donor Sebocyte donor2 Keratinocyte epidermal donor2 Mammary Epithelial Cell donor2 Amniotic Epithelial Cells donor Placental Epithelial Cells donor2 Urothelial Cells donor2 Mesenchymal Stem Cells umbilical donor3 Natural Killer Cells donor3 Natural Killer Cells donor2 CD8+ T Cells donor2 CD8+ T Cells donor3 Placental Epithelial Cells donor3 CD4+ T Cells donor2 CD4+ T Cells donor3 Prostate Epithelial Cells donor2 Gingival epithelial cells donor3 (GEA5) Gingival epithelial cells donor2 (GEA4) Tracheal Epithelial Cells donor3 Tracheal Epithelial Cells donor2 Bronchial Epithelial Cell donor5 Bronchial Epithelial Cell donor4 Mammary Epithelial Cell donor3 Bronchial Epithelial Cell donor6 Keratinocyte epidermal donor3 Small Airway Epithelial Cells donor3 Prostate Epithelial Cells donor3 Small Airway Epithelial Cells donor2 Corneal Epithelial Cells donor3 Urothelial Cells donor3 Amniotic Epithelial Cells donor3 amniotic membrane cells donor amniotic membrane cells donor3 chorionic membrane cells donor chorionic membrane cells donor3 Melanocyte dark donor3 CD4+CD25 CD45RA+ naive conventional T cells donor CD4+CD25 CD45RA+ naive conventional T cells donor2 CD4+CD25 CD45RA+ naive conventional T cells donor3 CD4+CD25+CD45RA memory regulatory T cells donor2 CD4+CD25+CD45RA memory regulatory T cells donor CD4+CD25+CD45RA memory regulatory T cells donor3 CD4+CD25 CD45RA memory conventional T cells donor3 CD4+CD25+CD45RA+ naive regulatory T cells donor3 salivary acinar cells donor CD9+ B Cells donor3 CD9+ B Cells donor CD9+ B Cells donor2 Hepatocyte donor3 Hepatocyte donor Hepatocyte donor2 Neurons donor nasal epithelial cells donor tech rep salivary acinar cells donor2 salivary acinar cells donor3 CD4+CD25 CD45RA+ naive conventional T cells expanded donor3 CD4+CD25 CD45RA+ naive conventional T cells expanded donor2 CD4+CD25 CD45RA memory conventional T cells expanded donor CD4+CD25 CD45RA+ naive conventional T cells expanded donor CD4+CD25+CD45RA memory regulatory T cells expanded donor3 CD4+CD25+CD45RA memory regulatory T cells expanded donor2 CD4+CD25+CD45RA memory regulatory T cells expanded donor CD4+CD25+CD45RA+ naive regulatory T cells expanded donor Neurons donor3 Mallassez derived cells donor3 Mallassez derived cells donor2 Endothelial Cells Vein donor2 Endothelial Cells Aortic donor2 Endothelial Cells Aortic donor Endothelial Cells Thoracic donor Endothelial Cells Umbilical vein donor2 Endothelial Cells Umbilical vein donor Endothelial Cells Umbilical vein donor3 Small Airway Epithelial Cells donor Small Airway Epithelial Cells donor2 Small Airway Epithelial Cells donor3 Tracheal Epithelial Cells donor2 Tracheal Epithelial Cells donor Alveolar Epithelial Cells donor Renal Cortical Epithelial Cells donor Renal Proximal Tubular Epithelial Cell donor Renal Epithelial Cells donor Renal Mesangial Cells donor Endothelial Cells Microvascular donor2 Endothelial Cells Microvascular donor Endothelial Cells Aortic donor3 Endothelial Cells Artery donor2 Endothelial Cells Artery donor3 Smooth Muscle Cells Brain Vascular donor2 Smooth Muscle Cells Brain Vascular donor Mesenchymal Stem Cells umbilical donor Endothelial Cells Thoracic donor2 Endothelial Cells Microvascular donor3 Renal Glomerular Endothelial Cells donor2 Endothelial Cells Vein donor3 Tracheal Epithelial Cells donor3 Bronchial Epithelial Cell donor6 Bronchial Epithelial Cell donor4 Bronchial Epithelial Cell donor5 Renal Proximal Tubular Epithelial Cell donor2 Renal Epithelial Cells donor2 Alveolar Epithelial Cells donor2 Renal Proximal Tubular Epithelial Cell donor3 Renal Mesangial Cells donor3 Mesenchymal Stem Cells bone marrow donor2 Smooth Muscle Cells Brain Vascular donor3 Mesenchymal Stem Cells umbilical donor3 Mesenchymal Stem Cells Whartons Jelly donor Renal Glomerular Endothelial Cells donor Endothelial Cells Aortic donor Endothelial Cells Artery donor Endothelial Cells Vein donor Renal Cortical Epithelial Cells donor2 Renal Epithelial Cells donor3 Fibroblast Dermal donor3 Smooth Muscle Cells Umbilical Vein donor2 Renal Glomerular Endothelial Cells donor3 Mesenchymal stem cells adipose donor Smooth Muscle Cells Bronchial donor2 Renal Glomerular Endothelial Cells donor4 mesenchymal precursor cell bone marrow donor3 Fibroblast Dermal donor Smooth Muscle Cells Umbilical Vein donor Fibroblast Dermal donor2 Preadipocyte visceral donor Smooth Muscle Cells Pulmonary Artery donor2 Preadipocyte omental donor2 Preadipocyte visceral donor2 Smooth Muscle Cells Subclavian Artery donor2 Preadipocyte perirenal donor Preadipocyte visceral donor3 Preadipocyte subcutaneous donor2 Mesenchymal stem cells hepatic donor Mesenchymal Stem Cells hepatic donor2 Preadipocyte subcutaneous donor3 mesenchymal precursor cell adipose donor3 Smooth Muscle Cells Coronary Artery donor2 Mesenchymal Stem Cells adipose donor Mesenchymal stem cells umbilical donor mesenchymal precursor cell cardiac donor mesenchymal precursor cell bone marrow donor2 mesenchymal precursor cell cardiac donor3 mesenchymal precursor cell cardiac donor4 amniotic membrane cells donor mesenchymal precursor cell adipose donor mesenchymal precursor cell adipose donor2 amniotic membrane cells donor3 mesenchymal precursor cell bone marrow donor mesenchymal precursor cell cardiac donor2 chorionic membrane cells donor3 Multipotent Cord Blood Unrestricted Somatic Stem Cells donor amniotic membrane cells donor2 Fibroblast Dermal donor4 Fibroblast Dermal donor5 Mesenchymal Stem Cells adipose donor3 CD4+CD6 Monocytes donor Fibroblast Dermal donor6 CD4+CD6 Monocytes donor2 CD4+CD6 Monocytes donor3 Mesenchymal Stem Cells bone marrow donor CD4+CD6+ Monocytes donor3 Preadipocyte breast donor2 Preadipocyte omental donor Preadipocyte breast donor Preadipocyte omental donor3 Mesenchymal Stem Cells bone marrow donor3 CD4+CD6+ Monocytes donor CD4+CD6+ Monocytes donor2 CD4 CD6+ Monocytes donor3 CD4 CD6+ Monocytes donor2 CD4+ monocytes mock treated donor3 CD4+ monocytes mock treated donor CD4+ Monocytes donor CD4+ monocyte derived endothelial progenitor cells donor Fibroblast skin spinal muscular atrophy donor Fibroblast skin normal donor Fibroblast skin walker warburg donor CD4+ monocytes mock treated donor2 CD4+ Monocytes donor3 CD4+ monocyte derived endothelial progenitor cells donor3 CD4+ monocyte derived endothelial progenitor cells donor2 CD4+ T Cells donor CD8+ T Cells donor Smooth Muscle Cells Umbilical Artery donor Smooth Muscle Cells Brachiocephalic donor Smooth Muscle Cells Coronary Artery donor Smooth Muscle Cells Umbilical artery donor Smooth Muscle Cells Subclavian Artery donor Smooth Muscle Cells Carotid donor Smooth Muscle Cells Aortic donor Smooth Muscle Cells Aortic donor Smooth Muscle Cells Aortic donor3 Smooth Muscle Cells Aortic donor2 Mesenchymal Stem Cells amniotic membrane donor Fibroblast skin dystrophia myotonica donor2 Fibroblast skin dystrophia myotonica donor Fibroblast skin spinal muscular atrophy donor2 Fibroblast skin normal donor2 CD4+ Monocytes donor2 Fibroblast skin spinal muscular atrophy donor3 CD4+CD25 CD45RA+ naive conventional T cells donor CD4+CD25 CD45RA+ naive conventional T cells donor2 CD4+CD25 CD45RA+ naive conventional T cells donor3 Fibroblast skin dystrophia myotonica donor3 CD4+CD25+CD45RA+ naive regulatory T cells donor3 CD4+CD25+CD45RA memory regulatory T cells donor2 CD4+CD25+CD45RA memory regulatory T cells donor CD4+CD25+CD45RA memory regulatory T cells donor3 CD4+ T Cells donor2 CD4+ T Cells donor3 CD8+ T Cells donor2 CD8+ T Cells donor3 Multipotent Cord Blood Unrestricted Somatic Stem Cells donor2 Mesenchymal Stem Cells amniotic membrane donor2 Smooth Muscle Cells Umbilical Artery donor3 Smooth Muscle Cells Brachiocephalic donor3 Smooth Muscle Cells Umbilical Artery donor2 Smooth Muscle Cells Carotid donor3 Smooth Muscle Cells Subclavian Artery donor3 Smooth Muscle Cells Internal Thoracic Artery donor3 Smooth Muscle Cells Coronary Artery donor3 Smooth Muscle Cells Internal Thoracic Artery donor2 chorionic membrane cells donor2 CD4+CD25 CD45RA memory conventional T cells donor3 CD4+CD25 CD45RA+ naive conventional T cells expanded donor3 CD4+CD25 CD45RA+ naive conventional T cells expanded donor2 CD4+CD25 CD45RA memory conventional T cells expanded donor CD4+CD25 CD45RA+ naive conventional T cells expanded donor CD4+CD25+CD45RA memory regulatory T cells expanded donor3 CD4+CD25+CD45RA memory regulatory T cells expanded donor2 CD4+CD25+CD45RA memory regulatory T cells expanded donor CD4+CD25+CD45RA+ naive regulatory T cells expanded donor (a) (b) Brain Heart Blood Lung Large intestine medulla oblongata adult pool temporal lobe adult pool postcentral gyrus adult pool parietal lobe adult pool paracentral gyrus adult pool occipital pole adult pool nucleus accumbens adult pool insula adult pool frontal lobe adult pool olfactory region adult amygdala adult donor252 hippocampus adult donor252 parietal lobe adult donor252 occipital cortex adult donor252 medial temporal gyrus adult donor252 middle temporal gyrus donor252 pons adult pool brain adult pool colon adult pool small intestine adult pool placenta adult pool skeletal muscle adult pool heart adult pool spleen fetal pool skin fetal donor testis adult pool spleen adult pool lung adult pool adipose tissue adult pool dura mater adult donor thymus fetal pool thymus adult pool trachea adult pool esophagus adult pool liver adult pool corpus callosum adult pool optic nerve donor pancreas adult donor salivary gland adult pool heart fetal pool heart adult diseased donor heart adult diseased post infarction donor tonsil adult pool lung fetal donor Whole blood (ribopure) donor9325 donation2 Whole blood (ribopure) donor9325 donation Whole blood (ribopure) donor962 donation2 Whole blood (ribopure) donor962 donation Whole blood (ribopure) donor939 donation2 kidney fetal pool kidney adult pool thyroid adult pool retina adult pool uterus adult pool cervix adult pool prostate adult pool ovary adult pool bladder adult pool smooth muscle adult pool aorta adult pool blood adult pool diencephalon adult eye fetal donor spinal cord adult donor96 locus coeruleus adult donor96 spinal cord adult donor252 substantia nigra adult donor252 thalamus adult donor252 globus pallidus adult donor252 medulla oblongata adult donor252 locus coeruleus adult donor252 occipital cortex adult donor96 thalamus adult donor96 hippocampus adult donor96 parietal lobe adult donor96 medulla oblongata adult donor96 globus pallidus adult donor96 uterus fetal donor skeletal muscle fetal donor medial temporal gyrus adult donor96 penis adult stomach fetal donor thyroid fetal donor diaphragm fetal donor heart pulmonic valve adult caudate nucleus adult donor252 putamen adult donor96 occipital lobe adult donor throat fetal donor trachea fetal donor cerebral meninges adult parotid gland adult medial frontal gyrus adult donor96 brain adult donor submaxillary gland adult amygdala adult donor96 epididymis adult ductus deferens adult spinal cord fetal donor seminal vesicle adult cerebellum adult donor252 colon fetal donor cerebellum adult pool gall bladder adult colon adult donor cerebellum adult donor96 caudate nucleus adult donor96 pituitary gland adult donor96 small intestine fetal donor pituitary gland adult donor252 duodenum fetal donor tech rep duodenum fetal donor tech rep2 temporal lobe fetal donor tech rep2 occipital lobe fetal donor temporal lobe fetal donor tech rep parietal lobe fetal donor tongue fetal donor brain fetal pool pineal gland adult donor96 left ventricle adult donor pineal gland adult donor252 rectum fetal donor testis adult pool2 vein adult lung right lower lobe adult donor appendix adult throat adult tongue adult lymph node adult donor vagina adult Whole blood (ribopure) donor962 donation3 Whole blood (ribopure) donor939 donation3 liver fetal pool skeletal muscle soleus muscle donor left atrium adult donor umbilical cord fetal donor heart tricuspid valve adult heart mitral valve adult pons adult pool brain adult pool occipital pole adult pool temporal lobe adult pool postcentral gyrus adult pool nucleus accumbens adult pool parietal lobe adult pool paracentral gyrus adult pool occipital lobe adult donor amygdala adult donor252 hippocampus adult donor252 parietal lobe adult donor252 occipital cortex adult donor252 medial temporal gyrus adult donor252 middle temporal gyrus donor252 medial temporal gyrus adult donor96 insula adult pool frontal lobe adult pool caudate nucleus adult donor252 putamen adult donor96 diencephalon adult locus coeruleus adult donor96 medulla oblongata adult pool corpus callosum adult pool medial frontal gyrus adult donor96 brain adult donor thalamus adult donor252 substantia nigra adult donor252 globus pallidus adult donor252 medulla oblongata adult donor252 cerebellum adult donor252 amygdala adult donor96 locus coeruleus adult donor252 cerebellum adult pool occipital cortex adult donor96 cerebellum adult donor96 thalamus adult donor96 hippocampus adult donor96 caudate nucleus adult donor96 parietal lobe adult donor96 pituitary gland adult donor96 medulla oblongata adult donor96 pituitary gland adult donor252 globus pallidus adult donor96 appendix adult pineal gland adult donor96 pineal gland adult donor252 colon adult pool duodenum fetal donor tech rep2 colon fetal donor occipital lobe fetal donor temporal lobe fetal donor tech rep trachea fetal donor colon adult donor parietal lobe fetal donor brain fetal pool temporal lobe fetal donor tech rep2 lung adult pool trachea adult pool duodenum fetal donor tech rep rectum fetal donor Whole blood (ribopure) donor9325 donation2 lung fetal donor lung right lower lobe adult donor Whole blood (ribopure) donor9325 donation Whole blood (ribopure) donor962 donation2 Whole blood (ribopure) donor962 donation Whole blood (ribopure) donor939 donation2 Whole blood (ribopure) donor962 donation3 Whole blood (ribopure) donor939 donation3 blood adult pool heart fetal pool heart adult pool heart adult diseased donor left ventricle adult donor heart adult diseased post infarction donor heart mitral valve adult heart pulmonic valve adult left atrium adult donor heart tricuspid valve adult optic nerve donor (c) (d) Supplementary Figure 2: Clustering of samples using the inferred enhancer-target network of each sample as its signature for computing similarity values among samples. The clustering involves (a) All FANTOM5 cell samples; (b) FANTOM5 cell samples of the 9 largest facets; (c) All FANTOM5 tissue samples; and (d) FANTOM5 tissue samples of the 5 largest facets. Samples are colored based on the sample facets provided by FANTOM5. S27

29 NHLF Lung Fibroblast Primary Cells Blood T cell Brain Digestive Epithelial ESC ES derived HSC B cell H Derived Mesenchymal Stem Cells Cortex derived primary cultured neurospheres Ganglion Eminence derived primary cultured neurospheres Breast variant Human Mammary Epithelial Cells (vhmec) Foreskin Keratinocyte Primary Cells skin2 Foreskin Keratinocyte Primary Cells skin3 Foreskin Melanocyte Primary Cells skin Foreskin Melanocyte Primary Cells skin3 Brain Dorsolateral Prefrontal Cortex Brain Anterior Caudate Brain Angular Gyrus Brain Inferior Temporal Lobe Brain Substantia Nigra Brain Cingulate Gyrus Brain Hippocampus Middle Fetal Brain Male Fetal Brain Female Brain Germinal Matrix Liver Colonic Mucosa HepG2 Hepatocellular Carcinoma Cell Line Sigmoid Colon Esophagus Gastric Pancreas Small Intestine Rectal Mucosa Donor 29 Rectal Mucosa Donor 3 Stomach Mucosa Duodenum Mucosa Fetal Intestine Large Placenta Fetal Intestine Small H BMP4 Derived Trophoblast Cultured Cells Breast Myoepithelial Primary Cells Lung Placenta Amnion Fetal Adrenal Gland Bone Marrow Derived Cultured Mesenchymal Stem Cells A549 EtOH.2pct Lung Carcinoma Cell Line HeLa S3 Cervical Carcinoma Cell Line HMEC Mammary Epithelial Primary Cells NHEK Epidermal Keratinocyte Primary Cells K562 Leukemia Cells Spleen Fetal Stomach Fetal Muscle Trunk Fetal Muscle Leg GM2878 Lymphoblastoid Cells Primary hematopoietic stem cells HUVEC Umbilical Vein Endothelial Primary Cells Osteoblast Primary Cells Muscle Satellite Cultured Cells Primary hematopoietic stem cells short term culture Primary hematopoietic stem cells G CSF mobilized Female Primary neutrophils from peripheral blood Primary hematopoietic stem cells G CSF mobilized Male NHDF Ad Adult Dermal Fibroblast Primary Cells NH A Astrocytes Primary Cells Adipose Derived Mesenchymal Stem Cell Cultured Cells Monocytes CD4+ RO746 Primary Cells Primary monocytes from peripheral blood Primary mononuclear cells from peripheral blood (a) Mesenchymal Stem Cell Derived Chondrocyte Cultured Cells Fetal Thymus Thymus IMR9 fetal lung fibroblasts Cell Line HSMM cell derived Skeletal Muscle Myotubes Cells Dnd4 TCell Leukemia Cell Line HSMM Skeletal Muscle Myoblasts Cells Adipose Nuclei Primary B cells from cord blood Mesenchymal Stem Cell Derived Adipocyte Cultured Cells Primary T cells from cord blood Primary B cells from peripheral blood Foreskin Fibroblast Primary Cells skin2 Foreskin Fibroblast Primary Cells skin hesc Derived CD56+ Mesoderm Cultured Cells Fetal Kidney Ovary Aorta Primary T cells from peripheral blood hesc Derived CD56+ Ectoderm Cultured Cells H Derived Neuronal Progenitor Cultured Cells H9 Derived Neuron Cultured Cells H9 Derived Neuronal Progenitor Cultured Cells ES UCSF4 Cells H Cells ips 8 Cells HUES48 Cells HUES64 Cells ips 2b Cells ips 5b Cells HUES6 Cells ES I3 Cells H9 Cells ES WA7 Cells Primary T helper memory cells from peripheral blood 2 ips DF 6.9 Cells hesc Derived CD84+ Endoderm Cultured Cells H BMP4 Derived Mesendoderm Cultured Cells ips DF 9. Cells Primary T helper naive cells from peripheral blood Primary T helper cells from peripheral blood Primary T helper cells PMA I stimulated Primary T helper naive cells from peripheral blood Primary T helper memory cells from peripheral blood Primary T helper 7 cells PMA I stimulated Primary Natural Killer cells from peripheral blood Primary T regulatory cells from peripheral blood Primary T cells effector/memory enriched from peripheral blood Colon Smooth Muscle Duodenum Smooth Muscle Fetal Heart Psoas Muscle Pancreatic Islets Rectal Smooth Muscle Stomach Smooth Muscle Fetal Lung Right Ventricle Left Ventricle Right Atrium Skeletal Muscle Female Skeletal Muscle Male Primary T CD8+ naive cells from peripheral blood Primary T CD8+ memory cells from peripheral blood Primary T cells from peripheral blood Primary T helper memory cells from peripheral blood 2 Primary T helper 7 cells PMA I stimulated Primary T helper memory cells from peripheral blood Primary T helper naive cells from peripheral blood Primary T helper cells PMA I stimulated Primary T helper cells from peripheral blood Primary T helper naive cells from peripheral blood Primary T cells effector/memory enriched from peripheral blood Primary T regulatory cells from peripheral blood Primary Natural Killer cells from peripheral blood Primary T CD8+ memory cells from peripheral blood Primary B cells from peripheral blood Primary T cells from cord blood H Derived Mesenchymal Stem Cells H BMP4 Derived Trophoblast Cultured Cells Primary B cells from cord blood Breast Myoepithelial Primary Cells Primary neutrophils from peripheral blood Breast variant Human Mammary Epithelial Cells (vhmec) Primary monocytes from peripheral blood H9 Cells ES WA7 Cells hesc Derived CD84+ Endoderm Cultured Cells H BMP4 Derived Mesendoderm Cultured Cells hesc Derived CD56+ Mesoderm Cultured Cells Primary hematopoietic stem cells short term culture Primary hematopoietic stem cells Primary T CD8+ naive cells from peripheral blood Foreskin Melanocyte Primary Cells skin Brain Dorsolateral Prefrontal Cortex Foreskin Fibroblast Primary Cells skin Foreskin Fibroblast Primary Cells skin2 Primary mononuclear cells from peripheral blood Primary hematopoietic stem cells G CSF mobilized Male Primary hematopoietic stem cells G CSF mobilized Female Brain Anterior Caudate Gastric Brain Angular Gyrus Brain Inferior Temporal Lobe Brain Substantia Nigra Brain Cingulate Gyrus Brain Hippocampus Middle Brain Germinal Matrix Fetal Brain Male Esophagus Foreskin Keratinocyte Primary Cells skin2 Foreskin Keratinocyte Primary Cells skin3 Duodenum Mucosa HUES48 Cells H Cells HUES6 Cells ES I3 Cells HUES64 Cells Fetal Brain Female Fetal Intestine Large (b) Fetal Stomach Fetal Intestine Small Stomach Mucosa Rectal Mucosa Donor 3 Rectal Mucosa Donor 29 Small Intestine Sigmoid Colon Colonic Mucosa Foreskin Melanocyte Primary Cells skin3 hesc Derived CD56+ Ectoderm Cultured Cells H Derived Neuronal Progenitor Cultured Cells H9 Derived Neuron Cultured Cells H9 Derived Neuronal Progenitor Cultured Cells ES UCSF4 Cells Supplementary Figure 3: Clustering of samples using the inferred enhancer-target network of each sample as its signature for computing similarity values among samples. The clustering involves (a) All ENCODE+Roadmap samples; and (b) ENCODE+Roadmap samples of the 7 largest groups. Samples are colored based on the sample groups provided by Roadmap Epigenomics. S28

30 Supplementary Figure 4: A schematic diagram illustrating the workflow used for identifying potential HCC genes based on their differential expression and inverse differential methylation at their enhancers. S29

31 (a) (b) chr:9,282,6 PSRC exon enh chr:9,842,2 CpG site methylation MIHA + 5-aza-dC PLC5 + 5-aza-dC unchanged < 2% reduction > 2% reduction < 2% induction chr6:7,28,5 RBM24 exon enh chr6:7,32,3 Hep3B + 5-aza-dC PLC5 + 5-aza-dC chr5:,82,7 enh TERT exon chr5:,3, MIHA + 5-aza-dC BEL + 5-aza-dC Supplementary Figure 5: Effects of 5-aza-dC treatment. (a) A schematic diagram showing the changes of methylation levels at different gene promoter and enhancer regions after 5-aza-dC treatment in different liver cell lines. (b) The extent of gene reactivation upon demethylation treatment by 5-aza-dC was examined in 7 human liver cell lines. Data represent the mean (plus/minus standard deviation, SD) of at least three independent experiments, each performed in triplicate, and are presented relative to control. *:p<.5; **:p<.; ***:p<. S3

32 F-score F-score. Cutoff of JEME's predicted score (a). Cutoff of JEME's predicted score (b) Supplementary Figure 6: Training F-score of JEME predictions at different cutoff values of the predicted enhancer-tss interaction scores for (a) FANTOM5 and (b) ENCODE+Roadmap based on K562 ChIA- PET data. S3

33 CRISPR/Cas9 KO PSRC enh RBM24 enh TERT enh Ctrl KO Ctrl KO Ctrl KO 2, bp -, bp - 75 bp - 5 bp - 25 bp - bp - PCR from F to R Supplementary Figure 7: CRISPR/Cas9-mediated knockout (KO) of gene enhancers. A full-length gel photo corresponding to the cropped image in Figure 6d is shown. The molecular weight standard (DL2, Takara) is shown on the left. The band of < bp under RBM24 enh - KO is a non-specific PCR product. S32

Nature Genetics: doi: /ng Supplementary Figure 1

Nature Genetics: doi: /ng Supplementary Figure 1 Supplementary Figure 1 Expression deviation of the genes mapped to gene-wise recurrent mutations in the TCGA breast cancer cohort (top) and the TCGA lung cancer cohort (bottom). For each gene (each pair

More information

Accessing and Using ENCODE Data Dr. Peggy J. Farnham

Accessing and Using ENCODE Data Dr. Peggy J. Farnham 1 William M Keck Professor of Biochemistry Keck School of Medicine University of Southern California How many human genes are encoded in our 3x10 9 bp? C. elegans (worm) 959 cells and 1x10 8 bp 20,000

More information

MIR retrotransposon sequences provide insulators to the human genome

MIR retrotransposon sequences provide insulators to the human genome Supplementary Information: MIR retrotransposon sequences provide insulators to the human genome Jianrong Wang, Cristina Vicente-García, Davide Seruggia, Eduardo Moltó, Ana Fernandez- Miñán, Ana Neto, Elbert

More information

Nature Structural & Molecular Biology: doi: /nsmb.2419

Nature Structural & Molecular Biology: doi: /nsmb.2419 Supplementary Figure 1 Mapped sequence reads and nucleosome occupancies. (a) Distribution of sequencing reads on the mouse reference genome for chromosome 14 as an example. The number of reads in a 1 Mb

More information

Supplemental Figure S1. Tertiles of FKBP5 promoter methylation and internal regulatory region

Supplemental Figure S1. Tertiles of FKBP5 promoter methylation and internal regulatory region Supplemental Figure S1. Tertiles of FKBP5 promoter methylation and internal regulatory region methylation in relation to PSS and fetal coupling. A, PSS values for participants whose placentas showed low,

More information

Supplementary Figure S1. Gene expression analysis of epidermal marker genes and TP63.

Supplementary Figure S1. Gene expression analysis of epidermal marker genes and TP63. Supplementary Figure Legends Supplementary Figure S1. Gene expression analysis of epidermal marker genes and TP63. A. Screenshot of the UCSC genome browser from normalized RNAPII and RNA-seq ChIP-seq data

More information

Supplementary Figures

Supplementary Figures Supplementary Figures Supplementary Figure 1. Heatmap of GO terms for differentially expressed genes. The terms were hierarchically clustered using the GO term enrichment beta. Darker red, higher positive

More information

ChromHMM Tutorial. Jason Ernst Assistant Professor University of California, Los Angeles

ChromHMM Tutorial. Jason Ernst Assistant Professor University of California, Los Angeles ChromHMM Tutorial Jason Ernst Assistant Professor University of California, Los Angeles Talk Outline Chromatin states analysis and ChromHMM Accessing chromatin state annotations for ENCODE2 and Roadmap

More information

Chromatin marks identify critical cell-types for fine-mapping complex trait variants

Chromatin marks identify critical cell-types for fine-mapping complex trait variants Chromatin marks identify critical cell-types for fine-mapping complex trait variants Gosia Trynka 1-4 *, Cynthia Sandor 1-4 *, Buhm Han 1-4, Han Xu 5, Barbara E Stranger 1,4#, X Shirley Liu 5, and Soumya

More information

Computational aspects of ChIP-seq. John Marioni Research Group Leader European Bioinformatics Institute European Molecular Biology Laboratory

Computational aspects of ChIP-seq. John Marioni Research Group Leader European Bioinformatics Institute European Molecular Biology Laboratory Computational aspects of ChIP-seq John Marioni Research Group Leader European Bioinformatics Institute European Molecular Biology Laboratory ChIP-seq Using highthroughput sequencing to investigate DNA

More information

Use Case 9: Coordinated Changes of Epigenomic Marks Across Tissue Types. Epigenome Informatics Workshop Bioinformatics Research Laboratory

Use Case 9: Coordinated Changes of Epigenomic Marks Across Tissue Types. Epigenome Informatics Workshop Bioinformatics Research Laboratory Use Case 9: Coordinated Changes of Epigenomic Marks Across Tissue Types Epigenome Informatics Workshop Bioinformatics Research Laboratory 1 Introduction Active or inactive states of transcription factor

More information

Histone Modifications Are Associated with Transcript Isoform Diversity in Normal and Cancer Cells

Histone Modifications Are Associated with Transcript Isoform Diversity in Normal and Cancer Cells Histone Modifications Are Associated with Transcript Isoform Diversity in Normal and Cancer Cells Ondrej Podlaha 1, Subhajyoti De 2,3,4, Mithat Gonen 5, Franziska Michor 1 * 1 Department of Biostatistics

More information

Broad H3K4me3 is associated with increased transcription elongation and enhancer activity at tumor suppressor genes

Broad H3K4me3 is associated with increased transcription elongation and enhancer activity at tumor suppressor genes Broad H3K4me3 is associated with increased transcription elongation and enhancer activity at tumor suppressor genes Kaifu Chen 1,2,3,4,5,10, Zhong Chen 6,10, Dayong Wu 6, Lili Zhang 7, Xueqiu Lin 1,2,8,

More information

Nature Neuroscience: doi: /nn Supplementary Figure 1. Behavioral training.

Nature Neuroscience: doi: /nn Supplementary Figure 1. Behavioral training. Supplementary Figure 1 Behavioral training. a, Mazes used for behavioral training. Asterisks indicate reward location. Only some example mazes are shown (for example, right choice and not left choice maze

More information

Supervised Learner for the Prediction of Hi-C Interaction Counts and Determination of Influential Features. Tyler Yue Lab

Supervised Learner for the Prediction of Hi-C Interaction Counts and Determination of Influential Features. Tyler Yue Lab Supervised Learner for the Prediction of Hi-C Interaction Counts and Determination of Influential Features Tyler Derr @ Yue Lab tsd5037@psu.edu Background Hi-C is a chromosome conformation capture (3C)

More information

Computational Analysis of UHT Sequences Histone modifications, CAGE, RNA-Seq

Computational Analysis of UHT Sequences Histone modifications, CAGE, RNA-Seq Computational Analysis of UHT Sequences Histone modifications, CAGE, RNA-Seq Philipp Bucher Wednesday January 21, 2009 SIB graduate school course EPFL, Lausanne ChIP-seq against histone variants: Biological

More information

7SK ChIRP-seq is specifically RNA dependent and conserved between mice and humans.

7SK ChIRP-seq is specifically RNA dependent and conserved between mice and humans. Supplementary Figure 1 7SK ChIRP-seq is specifically RNA dependent and conserved between mice and humans. Regions targeted by the Even and Odd ChIRP probes mapped to a secondary structure model 56 of the

More information

Discovery of Novel Human Gene Regulatory Modules from Gene Co-expression and

Discovery of Novel Human Gene Regulatory Modules from Gene Co-expression and Discovery of Novel Human Gene Regulatory Modules from Gene Co-expression and Promoter Motif Analysis Shisong Ma 1,2*, Michael Snyder 3, and Savithramma P Dinesh-Kumar 2* 1 School of Life Sciences, University

More information

Comparison of open chromatin regions between dentate granule cells and other tissues and neural cell types.

Comparison of open chromatin regions between dentate granule cells and other tissues and neural cell types. Supplementary Figure 1 Comparison of open chromatin regions between dentate granule cells and other tissues and neural cell types. (a) Pearson correlation heatmap among open chromatin profiles of different

More information

The Epigenome Tools 2: ChIP-Seq and Data Analysis

The Epigenome Tools 2: ChIP-Seq and Data Analysis The Epigenome Tools 2: ChIP-Seq and Data Analysis Chongzhi Zang zang@virginia.edu http://zanglab.com PHS5705: Public Health Genomics March 20, 2017 1 Outline Epigenome: basics review ChIP-seq overview

More information

Nature Genetics: doi: /ng Supplementary Figure 1. Assessment of sample purity and quality.

Nature Genetics: doi: /ng Supplementary Figure 1. Assessment of sample purity and quality. Supplementary Figure 1 Assessment of sample purity and quality. (a) Hematoxylin and eosin staining of formaldehyde-fixed, paraffin-embedded sections from a human testis biopsy collected concurrently with

More information

Supplementary Figures

Supplementary Figures Supplementary Figures Supplementary Figure 1. Pan-cancer analysis of global and local DNA methylation variation a) Variations in global DNA methylation are shown as measured by averaging the genome-wide

More information

Identification of Tissue Independent Cancer Driver Genes

Identification of Tissue Independent Cancer Driver Genes Identification of Tissue Independent Cancer Driver Genes Alexandros Manolakos, Idoia Ochoa, Kartik Venkat Supervisor: Olivier Gevaert Abstract Identification of genomic patterns in tumors is an important

More information

Nature Methods: doi: /nmeth.3115

Nature Methods: doi: /nmeth.3115 Supplementary Figure 1 Analysis of DNA methylation in a cancer cohort based on Infinium 450K data. RnBeads was used to rediscover a clinically distinct subgroup of glioblastoma patients characterized by

More information

ChIP-seq data analysis

ChIP-seq data analysis ChIP-seq data analysis Harri Lähdesmäki Department of Computer Science Aalto University November 24, 2017 Contents Background ChIP-seq protocol ChIP-seq data analysis Transcriptional regulation Transcriptional

More information

Yingying Wei George Wu Hongkai Ji

Yingying Wei George Wu Hongkai Ji Stat Biosci (2013) 5:156 178 DOI 10.1007/s12561-012-9066-5 Global Mapping of Transcription Factor Binding Sites by Sequencing Chromatin Surrogates: a Perspective on Experimental Design, Data Analysis,

More information

Supplemental Figures. Figure S1: 2-component Gaussian mixture model of Bourdeau et al. s fold-change distribution

Supplemental Figures. Figure S1: 2-component Gaussian mixture model of Bourdeau et al. s fold-change distribution Supplemental Figures Figure S1: 2-component Gaussian mixture model of Bourdeau et al. s fold-change distribution 1 All CTCF Unreponsive RefSeq Frequency 0 2000 4000 6000 0 500 1000 1500 2000 Block Size

More information

EPIGENOMICS PROFILING SERVICES

EPIGENOMICS PROFILING SERVICES EPIGENOMICS PROFILING SERVICES Chromatin analysis DNA methylation analysis RNA-seq analysis Diagenode helps you uncover the mysteries of epigenetics PAGE 3 Integrative epigenomics analysis DNA methylation

More information

Unit 1 Exploring and Understanding Data

Unit 1 Exploring and Understanding Data Unit 1 Exploring and Understanding Data Area Principle Bar Chart Boxplot Conditional Distribution Dotplot Empirical Rule Five Number Summary Frequency Distribution Frequency Polygon Histogram Interquartile

More information

Peak-calling for ChIP-seq and ATAC-seq

Peak-calling for ChIP-seq and ATAC-seq Peak-calling for ChIP-seq and ATAC-seq Shamith Samarajiwa CRUK Autumn School in Bioinformatics 2017 University of Cambridge Overview Peak-calling: identify enriched (signal) regions in ChIP-seq or ATAC-seq

More information

Supplemental Data. Integrating omics and alternative splicing i reveals insights i into grape response to high temperature

Supplemental Data. Integrating omics and alternative splicing i reveals insights i into grape response to high temperature Supplemental Data Integrating omics and alternative splicing i reveals insights i into grape response to high temperature Jianfu Jiang 1, Xinna Liu 1, Guotian Liu, Chonghuih Liu*, Shaohuah Li*, and Lijun

More information

Supplementary Figure 1. Efficiency of Mll4 deletion and its effect on T cell populations in the periphery. Nature Immunology: doi: /ni.

Supplementary Figure 1. Efficiency of Mll4 deletion and its effect on T cell populations in the periphery. Nature Immunology: doi: /ni. Supplementary Figure 1 Efficiency of Mll4 deletion and its effect on T cell populations in the periphery. Expression of Mll4 floxed alleles (16-19) in naive CD4 + T cells isolated from lymph nodes and

More information

Breast cancer. Risk factors you cannot change include: Treatment Plan Selection. Inferring Transcriptional Module from Breast Cancer Profile Data

Breast cancer. Risk factors you cannot change include: Treatment Plan Selection. Inferring Transcriptional Module from Breast Cancer Profile Data Breast cancer Inferring Transcriptional Module from Breast Cancer Profile Data Breast Cancer and Targeted Therapy Microarray Profile Data Inferring Transcriptional Module Methods CSC 177 Data Warehousing

More information

Supplemental Figure S1. Expression of Cirbp mrna in mouse tissues and NIH3T3 cells.

Supplemental Figure S1. Expression of Cirbp mrna in mouse tissues and NIH3T3 cells. SUPPLEMENTAL FIGURE AND TABLE LEGENDS Supplemental Figure S1. Expression of Cirbp mrna in mouse tissues and NIH3T3 cells. A) Cirbp mrna expression levels in various mouse tissues collected around the clock

More information

Heintzman, ND, Stuart, RK, Hon, G, Fu, Y, Ching, CW, Hawkins, RD, Barrera, LO, Van Calcar, S, Qu, C, Ching, KA, Wang, W, Weng, Z, Green, RD,

Heintzman, ND, Stuart, RK, Hon, G, Fu, Y, Ching, CW, Hawkins, RD, Barrera, LO, Van Calcar, S, Qu, C, Ching, KA, Wang, W, Weng, Z, Green, RD, Heintzman, ND, Stuart, RK, Hon, G, Fu, Y, Ching, CW, Hawkins, RD, Barrera, LO, Van Calcar, S, Qu, C, Ching, KA, Wang, W, Weng, Z, Green, RD, Crawford, GE, Ren, B (2007) Distinct and predictive chromatin

More information

Gene-microRNA network module analysis for ovarian cancer

Gene-microRNA network module analysis for ovarian cancer Gene-microRNA network module analysis for ovarian cancer Shuqin Zhang School of Mathematical Sciences Fudan University Oct. 4, 2016 Outline Introduction Materials and Methods Results Conclusions Introduction

More information

Raymond Auerbach PhD Candidate, Yale University Gerstein and Snyder Labs August 30, 2012

Raymond Auerbach PhD Candidate, Yale University Gerstein and Snyder Labs August 30, 2012 Elucidating Transcriptional Regulation at Multiple Scales Using High-Throughput Sequencing, Data Integration, and Computational Methods Raymond Auerbach PhD Candidate, Yale University Gerstein and Snyder

More information

The 16th KJC Bioinformatics Symposium Integrative analysis identifies potential DNA methylation biomarkers for pan-cancer diagnosis and prognosis

The 16th KJC Bioinformatics Symposium Integrative analysis identifies potential DNA methylation biomarkers for pan-cancer diagnosis and prognosis The 16th KJC Bioinformatics Symposium Integrative analysis identifies potential DNA methylation biomarkers for pan-cancer diagnosis and prognosis Tieliu Shi tlshi@bio.ecnu.edu.cn The Center for bioinformatics

More information

Unsupervised Identification of Isotope-Labeled Peptides

Unsupervised Identification of Isotope-Labeled Peptides Unsupervised Identification of Isotope-Labeled Peptides Joshua E Goldford 13 and Igor GL Libourel 124 1 Biotechnology institute, University of Minnesota, Saint Paul, MN 55108 2 Department of Plant Biology,

More information

Modeling Sentiment with Ridge Regression

Modeling Sentiment with Ridge Regression Modeling Sentiment with Ridge Regression Luke Segars 2/20/2012 The goal of this project was to generate a linear sentiment model for classifying Amazon book reviews according to their star rank. More generally,

More information

RASA: Robust Alternative Splicing Analysis for Human Transcriptome Arrays

RASA: Robust Alternative Splicing Analysis for Human Transcriptome Arrays Supplementary Materials RASA: Robust Alternative Splicing Analysis for Human Transcriptome Arrays Junhee Seok 1*, Weihong Xu 2, Ronald W. Davis 2, Wenzhong Xiao 2,3* 1 School of Electrical Engineering,

More information

Supporting Information Identification of Amino Acids with Sensitive Nanoporous MoS 2 : Towards Machine Learning-Based Prediction

Supporting Information Identification of Amino Acids with Sensitive Nanoporous MoS 2 : Towards Machine Learning-Based Prediction Supporting Information Identification of Amino Acids with Sensitive Nanoporous MoS 2 : Towards Machine Learning-Based Prediction Amir Barati Farimani, Mohammad Heiranian, Narayana R. Aluru 1 Department

More information

CTCF-Mediated Functional Chromatin Interactome in Pluripotent Cells

CTCF-Mediated Functional Chromatin Interactome in Pluripotent Cells SUPPLEMENTARY INFORMATION CTCF-Mediated Functional Chromatin Interactome in Pluripotent Cells Lusy Handoko 1,*, Han Xu 1,*, Guoliang Li 1,*, Chew Yee Ngan 1, Elaine Chew 1, Marie Schnapp 1, Charlie Wah

More information

Transcript-indexed ATAC-seq for immune profiling

Transcript-indexed ATAC-seq for immune profiling Transcript-indexed ATAC-seq for immune profiling Technical Journal Club 22 nd of May 2018 Christina Müller Nature Methods, Vol.10 No.12, 2013 Nature Biotechnology, Vol.32 No.7, 2014 Nature Medicine, Vol.24,

More information

ChIP-seq analysis. J. van Helden, M. Defrance, C. Herrmann, D. Puthier, N. Servant, M. Thomas-Chollier, O.Sand

ChIP-seq analysis. J. van Helden, M. Defrance, C. Herrmann, D. Puthier, N. Servant, M. Thomas-Chollier, O.Sand ChIP-seq analysis J. van Helden, M. Defrance, C. Herrmann, D. Puthier, N. Servant, M. Thomas-Chollier, O.Sand Tuesday : quick introduction to ChIP-seq and peak-calling (Presentation + Practical session)

More information

Processing, integrating and analysing chromatin immunoprecipitation followed by sequencing (ChIP-seq) data

Processing, integrating and analysing chromatin immunoprecipitation followed by sequencing (ChIP-seq) data Processing, integrating and analysing chromatin immunoprecipitation followed by sequencing (ChIP-seq) data Bioinformatics methods, models and applications to disease Alex Essebier ChIP-seq experiment To

More information

SUPPLEMENTARY FIGURES: Supplementary Figure 1

SUPPLEMENTARY FIGURES: Supplementary Figure 1 SUPPLEMENTARY FIGURES: Supplementary Figure 1 Supplementary Figure 1. Glioblastoma 5hmC quantified by paired BS and oxbs treated DNA hybridized to Infinium DNA methylation arrays. Workflow depicts analytic

More information

List of Figures. List of Tables. Preface to the Second Edition. Preface to the First Edition

List of Figures. List of Tables. Preface to the Second Edition. Preface to the First Edition List of Figures List of Tables Preface to the Second Edition Preface to the First Edition xv xxv xxix xxxi 1 What Is R? 1 1.1 Introduction to R................................ 1 1.2 Downloading and Installing

More information

Lentiviral Delivery of Combinatorial mirna Expression Constructs Provides Efficient Target Gene Repression.

Lentiviral Delivery of Combinatorial mirna Expression Constructs Provides Efficient Target Gene Repression. Supplementary Figure 1 Lentiviral Delivery of Combinatorial mirna Expression Constructs Provides Efficient Target Gene Repression. a, Design for lentiviral combinatorial mirna expression and sensor constructs.

More information

SUPPLEMENTAL INFORMATION

SUPPLEMENTAL INFORMATION SUPPLEMENTAL INFORMATION GO term analysis of differentially methylated SUMIs. GO term analysis of the 458 SUMIs with the largest differential methylation between human and chimp shows that they are more

More information

SUPPLEMENTARY APPENDIX

SUPPLEMENTARY APPENDIX SUPPLEMENTARY APPENDIX 1) Supplemental Figure 1. Histopathologic Characteristics of the Tumors in the Discovery Cohort 2) Supplemental Figure 2. Incorporation of Normal Epidermal Melanocytic Signature

More information

CSE 255 Assignment 9

CSE 255 Assignment 9 CSE 255 Assignment 9 Alexander Asplund, William Fedus September 25, 2015 1 Introduction In this paper we train a logistic regression function for two forms of link prediction among a set of 244 suspected

More information

Exercises: Differential Methylation

Exercises: Differential Methylation Exercises: Differential Methylation Version 2018-04 Exercises: Differential Methylation 2 Licence This manual is 2014-18, Simon Andrews. This manual is distributed under the creative commons Attribution-Non-Commercial-Share

More information

Sudin Bhattacharya Institute for Integrative Toxicology

Sudin Bhattacharya Institute for Integrative Toxicology Beyond the AHRE: the Role of Epigenomics in Gene Regulation by the AHR (or, Varied Applications of Computational Modeling in Toxicology and Ingredient Safety) Sudin Bhattacharya Institute for Integrative

More information

Predicting clinical outcomes in neuroblastoma with genomic data integration

Predicting clinical outcomes in neuroblastoma with genomic data integration Predicting clinical outcomes in neuroblastoma with genomic data integration Ilyes Baali, 1 Alp Emre Acar 1, Tunde Aderinwale 2, Saber HafezQorani 3, Hilal Kazan 4 1 Department of Electric-Electronics Engineering,

More information

A Quick-Start Guide for rseqdiff

A Quick-Start Guide for rseqdiff A Quick-Start Guide for rseqdiff Yang Shi (email: shyboy@umich.edu) and Hui Jiang (email: jianghui@umich.edu) 09/05/2013 Introduction rseqdiff is an R package that can detect differential gene and isoform

More information

Supplementary Figure 1

Supplementary Figure 1 Supplementary Figure 1 Supplementary Fig. 1: Quality assessment of formalin-fixed paraffin-embedded (FFPE)-derived DNA and nuclei. (a) Multiplex PCR analysis of unrepaired and repaired bulk FFPE gdna from

More information

White Paper Estimating Complex Phenotype Prevalence Using Predictive Models

White Paper Estimating Complex Phenotype Prevalence Using Predictive Models White Paper 23-12 Estimating Complex Phenotype Prevalence Using Predictive Models Authors: Nicholas A. Furlotte Aaron Kleinman Robin Smith David Hinds Created: September 25 th, 2015 September 25th, 2015

More information

Colon cancer subtypes from gene expression data

Colon cancer subtypes from gene expression data Colon cancer subtypes from gene expression data Nathan Cunningham Giuseppe Di Benedetto Sherman Ip Leon Law Module 6: Applied Statistics 26th February 2016 Aim Replicate findings of Felipe De Sousa et

More information

EXPression ANalyzer and DisplayER

EXPression ANalyzer and DisplayER EXPression ANalyzer and DisplayER Tom Hait Aviv Steiner Igor Ulitsky Chaim Linhart Amos Tanay Seagull Shavit Rani Elkon Adi Maron-Katz Dorit Sagir Eyal David Roded Sharan Israel Steinfeld Yossi Shiloh

More information

Part [2.1]: Evaluation of Markers for Treatment Selection Linking Clinical and Statistical Goals

Part [2.1]: Evaluation of Markers for Treatment Selection Linking Clinical and Statistical Goals Part [2.1]: Evaluation of Markers for Treatment Selection Linking Clinical and Statistical Goals Patrick J. Heagerty Department of Biostatistics University of Washington 174 Biomarkers Session Outline

More information

Supplementary. properties of. network types. randomly sampled. subsets (75%

Supplementary. properties of. network types. randomly sampled. subsets (75% Supplementary Information Gene co-expression network analysis reveals common system-level prognostic genes across cancer types properties of Supplementary Figure 1 The robustness and overlap of prognostic

More information

High Throughput Sequence (HTS) data analysis. Lei Zhou

High Throughput Sequence (HTS) data analysis. Lei Zhou High Throughput Sequence (HTS) data analysis Lei Zhou (leizhou@ufl.edu) High Throughput Sequence (HTS) data analysis 1. Representation of HTS data. 2. Visualization of HTS data. 3. Discovering genomic

More information

Relationship between genomic features and distributions of RS1 and RS3 rearrangements in breast cancer genomes.

Relationship between genomic features and distributions of RS1 and RS3 rearrangements in breast cancer genomes. Supplementary Figure 1 Relationship between genomic features and distributions of RS1 and RS3 rearrangements in breast cancer genomes. (a,b) Values of coefficients associated with genomic features, separately

More information

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Midterm, 2016

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Midterm, 2016 UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Midterm, 2016 Exam policy: This exam allows one one-page, two-sided cheat sheet; No other materials. Time: 80 minutes. Be sure to write your name and

More information

DNA Sequence Bioinformatics Analysis with the Galaxy Platform

DNA Sequence Bioinformatics Analysis with the Galaxy Platform DNA Sequence Bioinformatics Analysis with the Galaxy Platform University of São Paulo, Brazil 28 July - 1 August 2014 Dave Clements Johns Hopkins University Robson Francisco de Souza University of São

More information

Inference of Isoforms from Short Sequence Reads

Inference of Isoforms from Short Sequence Reads Inference of Isoforms from Short Sequence Reads Tao Jiang Department of Computer Science and Engineering University of California, Riverside Tsinghua University Joint work with Jianxing Feng and Wei Li

More information

Cancer Informatics Lecture

Cancer Informatics Lecture Cancer Informatics Lecture Mayo-UIUC Computational Genomics Course June 22, 2018 Krishna Rani Kalari Ph.D. Associate Professor 2017 MFMER 3702274-1 Outline The Cancer Genome Atlas (TCGA) Genomic Data Commons

More information

Supplementary Materials

Supplementary Materials Supplementary Materials July 2, 2015 1 EEG-measures of consciousness Table 1 makes explicit the abbreviations of the EEG-measures. Their computation closely follows Sitt et al. (2014) (supplement). PE

More information

Variant Classification. Author: Mike Thiesen, Golden Helix, Inc.

Variant Classification. Author: Mike Thiesen, Golden Helix, Inc. Variant Classification Author: Mike Thiesen, Golden Helix, Inc. Overview Sequencing pipelines are able to identify rare variants not found in catalogs such as dbsnp. As a result, variants in these datasets

More information

cis-regulatory enrichment analysis in human, mouse and fly

cis-regulatory enrichment analysis in human, mouse and fly cis-regulatory enrichment analysis in human, mouse and fly Zeynep Kalender Atak, PhD Laboratory of Computational Biology VIB-KU Leuven Center for Brain & Disease Research Laboratory of Computational Biology

More information

T. R. Golub, D. K. Slonim & Others 1999

T. R. Golub, D. K. Slonim & Others 1999 T. R. Golub, D. K. Slonim & Others 1999 Big Picture in 1999 The Need for Cancer Classification Cancer classification very important for advances in cancer treatment. Cancers of Identical grade can have

More information

Assignment 5: Integrative epigenomics analysis

Assignment 5: Integrative epigenomics analysis Assignment 5: Integrative epigenomics analysis Due date: Friday, 2/24 10am. Note: no late assignments will be accepted. Introduction CpG islands (CGIs) are important regulatory regions in the genome. What

More information

The Insulator Binding Protein CTCF Positions 20 Nucleosomes around Its Binding Sites across the Human Genome

The Insulator Binding Protein CTCF Positions 20 Nucleosomes around Its Binding Sites across the Human Genome The Insulator Binding Protein CTCF Positions 20 Nucleosomes around Its Binding Sites across the Human Genome Yutao Fu 1, Manisha Sinha 2,3, Craig L. Peterson 3, Zhiping Weng 1,4,5 * 1 Bioinformatics Program,

More information

Computational Identification and Prediction of Tissue-Specific Alternative Splicing in H. Sapiens. Eric Van Nostrand CS229 Final Project

Computational Identification and Prediction of Tissue-Specific Alternative Splicing in H. Sapiens. Eric Van Nostrand CS229 Final Project Computational Identification and Prediction of Tissue-Specific Alternative Splicing in H. Sapiens. Eric Van Nostrand CS229 Final Project Introduction RNA splicing is a critical step in eukaryotic gene

More information

Systematic exploration of autonomous modules in noisy microrna-target networks for testing the generality of the cerna hypothesis

Systematic exploration of autonomous modules in noisy microrna-target networks for testing the generality of the cerna hypothesis RESEARCH Systematic exploration of autonomous modules in noisy microrna-target networks for testing the generality of the cerna hypothesis Danny Kit-Sang Yip 1, Iris K. Pang 2 and Kevin Y. Yip 1,3,4* *

More information

Pre-mRNA Secondary Structure Prediction Aids Splice Site Recognition

Pre-mRNA Secondary Structure Prediction Aids Splice Site Recognition Pre-mRNA Secondary Structure Prediction Aids Splice Site Recognition Donald J. Patterson, Ken Yasuhara, Walter L. Ruzzo January 3-7, 2002 Pacific Symposium on Biocomputing University of Washington Computational

More information

GeneOverlap: An R package to test and visualize

GeneOverlap: An R package to test and visualize GeneOverlap: An R package to test and visualize gene overlaps Li Shen Contact: li.shen@mssm.edu or shenli.sam@gmail.com Icahn School of Medicine at Mount Sinai New York, New York http://shenlab-sinai.github.io/shenlab-sinai/

More information

Probability-Based Protein Identification for Post-Translational Modifications and Amino Acid Variants Using Peptide Mass Fingerprint Data

Probability-Based Protein Identification for Post-Translational Modifications and Amino Acid Variants Using Peptide Mass Fingerprint Data Probability-Based Protein Identification for Post-Translational Modifications and Amino Acid Variants Using Peptide Mass Fingerprint Data Tong WW, McComb ME, Perlman DH, Huang H, O Connor PB, Costello

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION doi:.38/nature8975 SUPPLEMENTAL TEXT Unique association of HOTAIR with patient outcome To determine whether the expression of other HOX lincrnas in addition to HOTAIR can predict patient outcome, we measured

More information

Protein Structure & Function. University, Indianapolis, USA 3 Department of Molecular Medicine, University of South Florida, Tampa, USA

Protein Structure & Function. University, Indianapolis, USA 3 Department of Molecular Medicine, University of South Florida, Tampa, USA Protein Structure & Function Supplement for article entitled MoRFpred, a computational tool for sequence-based prediction and characterization of short disorder-to-order transitioning binding regions in

More information

User Guide. Association analysis. Input

User Guide. Association analysis. Input User Guide TFEA.ChIP is a tool to estimate transcription factor enrichment in a set of differentially expressed genes using data from ChIP-Seq experiments performed in different tissues and conditions.

More information

A Statistical Framework for Classification of Tumor Type from microrna Data

A Statistical Framework for Classification of Tumor Type from microrna Data DEGREE PROJECT IN MATHEMATICS, SECOND CYCLE, 30 CREDITS STOCKHOLM, SWEDEN 2016 A Statistical Framework for Classification of Tumor Type from microrna Data JOSEFINE RÖHSS KTH ROYAL INSTITUTE OF TECHNOLOGY

More information

Table of content. -Supplementary methods. -Figure S1. -Figure S2. -Figure S3. -Table legend

Table of content. -Supplementary methods. -Figure S1. -Figure S2. -Figure S3. -Table legend Table of content -Supplementary methods -Figure S1 -Figure S2 -Figure S3 -Table legend Supplementary methods Yeast two-hybrid bait basal transactivation test Because bait constructs sometimes self-transactivate

More information

MODULE 3: TRANSCRIPTION PART II

MODULE 3: TRANSCRIPTION PART II MODULE 3: TRANSCRIPTION PART II Lesson Plan: Title S. CATHERINE SILVER KEY, CHIYEDZA SMALL Transcription Part II: What happens to the initial (premrna) transcript made by RNA pol II? Objectives Explain

More information

Introduction to Discrimination in Microarray Data Analysis

Introduction to Discrimination in Microarray Data Analysis Introduction to Discrimination in Microarray Data Analysis Jane Fridlyand CBMB University of California, San Francisco Genentech Hall Auditorium, Mission Bay, UCSF October 23, 2004 1 Case Study: Van t

More information

Below, we included the point-to-point response to the comments of both reviewers.

Below, we included the point-to-point response to the comments of both reviewers. To the Editor and Reviewers: We would like to thank the editor and reviewers for careful reading, and constructive suggestions for our manuscript. According to comments from both reviewers, we have comprehensively

More information

Epigenetics. Jenny van Dongen Vrije Universiteit (VU) Amsterdam Boulder, Friday march 10, 2017

Epigenetics. Jenny van Dongen Vrije Universiteit (VU) Amsterdam Boulder, Friday march 10, 2017 Epigenetics Jenny van Dongen Vrije Universiteit (VU) Amsterdam j.van.dongen@vu.nl Boulder, Friday march 10, 2017 Epigenetics Epigenetics= The study of molecular mechanisms that influence the activity of

More information

Review: Genome assembly Reads

Review: Genome assembly Reads Assembly validation Review: Genome assembly Reads Contigs Scaffolds Chromosome Review: Mate pair data Overlap-Layout-Consensus AMOS project: A Modular Open Source assembler Importing data to an AMOS bank

More information

a) List of KMTs targeted in the shrna screen. The official symbol, KMT designation,

a) List of KMTs targeted in the shrna screen. The official symbol, KMT designation, Supplementary Information Supplementary Figures Supplementary Figure 1. a) List of KMTs targeted in the shrna screen. The official symbol, KMT designation, gene ID and specifities are provided. Those highlighted

More information

Evaluating Classifiers for Disease Gene Discovery

Evaluating Classifiers for Disease Gene Discovery Evaluating Classifiers for Disease Gene Discovery Kino Coursey Lon Turnbull khc0021@unt.edu lt0013@unt.edu Abstract Identification of genes involved in human hereditary disease is an important bioinfomatics

More information

MODULE 4: SPLICING. Removal of introns from messenger RNA by splicing

MODULE 4: SPLICING. Removal of introns from messenger RNA by splicing Last update: 05/10/2017 MODULE 4: SPLICING Lesson Plan: Title MEG LAAKSO Removal of introns from messenger RNA by splicing Objectives Identify splice donor and acceptor sites that are best supported by

More information

Supplementary Materials Extracting a Cellular Hierarchy from High-dimensional Cytometry Data with SPADE

Supplementary Materials Extracting a Cellular Hierarchy from High-dimensional Cytometry Data with SPADE Supplementary Materials Extracting a Cellular Hierarchy from High-dimensional Cytometry Data with SPADE Peng Qiu1,4, Erin F. Simonds2, Sean C. Bendall2, Kenneth D. Gibbs Jr.2, Robert V. Bruggner2, Michael

More information

Expanded View Figures

Expanded View Figures Solip Park & Ben Lehner Epistasis is cancer type specific Molecular Systems Biology Expanded View Figures A B G C D E F H Figure EV1. Epistatic interactions detected in a pan-cancer analysis and saturation

More information

High-Throughput Sequencing Course

High-Throughput Sequencing Course High-Throughput Sequencing Course Introduction Biostatistics and Bioinformatics Summer 2017 From Raw Unaligned Reads To Aligned Reads To Counts Differential Expression Differential Expression 3 2 1 0 1

More information

Chapter 1. Introduction

Chapter 1. Introduction Chapter 1 Introduction 1.1 Motivation and Goals The increasing availability and decreasing cost of high-throughput (HT) technologies coupled with the availability of computational tools and data form a

More information

Nature Structural & Molecular Biology: doi: /nsmb Supplementary Figure 1

Nature Structural & Molecular Biology: doi: /nsmb Supplementary Figure 1 Supplementary Figure 1 Frequency of alternative-cassette-exon engagement with the ribosome is consistent across data from multiple human cell types and from mouse stem cells. Box plots showing AS frequency

More information

Supplementary Figure 1 IL-27 IL

Supplementary Figure 1 IL-27 IL Tim-3 Supplementary Figure 1 Tc0 49.5 0.6 Tc1 63.5 0.84 Un 49.8 0.16 35.5 0.16 10 4 61.2 5.53 10 3 64.5 5.66 10 2 10 1 10 0 31 2.22 10 0 10 1 10 2 10 3 10 4 IL-10 28.2 1.69 IL-27 Supplementary Figure 1.

More information

Nature Structural & Molecular Biology: doi: /nsmb Supplementary Figure 1

Nature Structural & Molecular Biology: doi: /nsmb Supplementary Figure 1 Supplementary Figure 1 Effect of HSP90 inhibition on expression of endogenous retroviruses. (a) Inducible shrna-mediated Hsp90 silencing in mouse ESCs. Immunoblots of total cell extract expressing the

More information

Gene expression analysis. Roadmap. Microarray technology: how it work Applications: what can we do with it Preprocessing: Classification Clustering

Gene expression analysis. Roadmap. Microarray technology: how it work Applications: what can we do with it Preprocessing: Classification Clustering Gene expression analysis Roadmap Microarray technology: how it work Applications: what can we do with it Preprocessing: Image processing Data normalization Classification Clustering Biclustering 1 Gene

More information