Nature Genetics: doi: /ng Supplementary Figure 1. Sample selection procedure and TL ratio across cancer.


Supplementary Figure 1 Sample selection procedure and TL ratio across cancer. (a) Flowchart of sample selection. After exclusion of unsuitable samples, 18,430 samples remained. Tumor and normal samples were subsequently paired, and extra pairs per patient were dropped, yielding a paired set of n = 8,953 pairs. Further sample selection based on available data is presented in Figure 1a. The sample selection, tumor/normal pairing procedure and replicate analysis are explained in depth in the supplementary methods. (b) Boxplot of the log T/N TL ratio, log(tumor TL/normal TL), across cancer types for all n = 8,953 pairs in the paired set. Boxes colored in blue indicate that the median TL ratio for that cancer type is greater than 1 (log ratio greater than 0), and thus more than 50% of samples show TL elongation. Numbers and percentages at the top and bottom whiskers represent cancer cases with TL longer and shorter than the matched normal, respectively.
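
The quantity plotted in panel (b) is a per-pair log ratio summarized per cancer type. The sketch below shows one way to compute that summary, assuming TL values arrive as positive numbers in hypothetical (cancer_type, tumor_tl, normal_tl) tuples. It uses the natural log; the base does not change the sign of the ratio, so it does not change which boxes would be colored blue.

```python
import math
from collections import defaultdict
from statistics import median


def summarize_tl_ratio(pairs):
    """Summarize log(tumor TL / normal TL) per cancer type.

    pairs: iterable of (cancer_type, tumor_tl, normal_tl) tuples
    (a hypothetical input layout; TL values must be positive).
    """
    by_type = defaultdict(list)
    for cancer_type, tumor_tl, normal_tl in pairs:
        by_type[cancer_type].append(math.log(tumor_tl / normal_tl))

    summary = {}
    for cancer_type, ratios in by_type.items():
        med = median(ratios)
        summary[cancer_type] = {
            "n": len(ratios),
            "median_log_ratio": med,
            # a median log ratio above 0 means the median T/N ratio exceeds 1
            "median_above_zero": med > 0,
            # counts of pairs with tumor TL longer / shorter than normal TL
            "n_longer": sum(r > 0 for r in ratios),
            "n_shorter": sum(r < 0 for r in ratios),
        }
    return summary


if __name__ == "__main__":
    demo = [("A", 2.0, 1.0), ("A", 3.0, 1.0), ("A", 0.5, 1.0), ("B", 1.0, 2.0)]
    for cancer_type, stats in summarize_tl_ratio(demo).items():
        print(cancer_type, stats)
```

Percentages for the whisker labels follow by dividing n_longer and n_shorter by n.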

Supplementary Figure 2 Benchmark of TL and TL ratio across centers and sequencing methods. (a) Boxplots showing log(TL) across centers and sequencing methods. Each point indicates the median for a single cancer type. BI, Broad Institute sequencing center; BCM, Baylor College of Medicine sequencing center; WUGSC, Washington University sequencing center; HMS-RK, low-pass sequencing. (b) Boxplots showing the log(tumor TL/normal TL) ratio across centers and sequencing methods. Each point indicates the median for a single cancer type. (c) Comparison of n = 3,883 log(TL) replicates. Sequencing center was held constant for comparisons across sequencing methods and varied for comparisons within a sequencing method. Replicate pairs that did not meet these criteria were dropped. (d) Comparison of n = 1,861 T/N TL ratio replicates. Sequencing center was held constant for comparisons across sequencing methods and varied for comparisons within a sequencing method. Replicate pairs that did not meet these criteria were dropped.

Supplementary Figure 3 Telomere length in matching normal tissue. (a) Mean TL estimates from a linear mixed model for each tissue type. Error bars indicate 95% confidence intervals. Estimates were adjusted for age, gender and sequencing center. (b) Pairwise comparison of normal TL between tissue types. The Tukey-Kramer adjustment was used for all pairwise comparisons. (c) Scatter plots of age versus normal TL for each normal tissue type with n > 10.

Supplementary Figure 4 TERT alterations (promoter mutations, amplifications and structural variants) across cancer types. (a) Barplot of TERTp mutations by tumor type in samples from the extended set with known TERTp mutation status (n = 1,581). (b) Barplot of TERT amplifications by tumor type in all samples from the extended set (n = 6,835). (c) Barplot of TERT and TERTp structural variants by tumor type in samples from the extended set with structural variant calls (n = 792). (d) Boxplot of H3K27ac, H3K27me3 and H3K27me1 levels from the Roadmap Epigenomics Consortium at the locations of the proximal and distal breakpoints of TERTp structural variants (n = 17). Each plot represents a comparison of the distal and proximal breakpoints in a single sample.

Supplementary Figure 5 Further characterization of TERT alterations in cancer. (a) Circos plot of TERT fusion partners. Only segments of each chromosome are shown. Segment coordinates in kb are indicated in purple. The 5′ fusion partner is shown in blue and the 3′ fusion partner in orange. (b) Pairwise comparison of TERT promoter beta values (cg ) in tumor and normal samples. P-values were calculated using a two-sided Mann-Whitney U test. (c) Boxplot of TL ratio in groups of TERT alterations. TL ratio in each TERT-altered group was compared to the TERT wt group using a two-sided t-test. TERTp meth, promoter methylation; TERTp mut, promoter hotspot mutation; TERT amp, copy number amplification; TERTp sv, promoter structural variation; TERT sv, gene body structural variation; TERT wt, cases without detectable evidence in any of the aforementioned groups. (d) Boxplot of TERC expression in groups of TERC alterations. TERC expression in each TERC-altered group was compared to the TERC wt group using a two-sided t-test. ***P < ; **P < 0.001; *P < 0.05; N.S., not significant.

Supplementary Figure 6 Abundance of the minus-beta splice variant. (a) Example of TERT isoform abundance. The figure shows three samples with abundant full-length transcripts (red), three samples with mixed full-length and minus-beta transcripts (orange) and three samples with predominantly minus-beta transcripts (green). Full-length and minus-beta exon models are shown for reference. The MISO ψ (psi, percent spliced in) is shown for each sample and indicates the percentage of full-length transcripts relative to minus-beta transcripts. (b) Histogram of the percentage of full-length transcripts relative to minus-beta transcripts in n = 1,201 samples. Samples with less than 25% full-length transcripts (more than 75% minus-beta transcripts) are shown in green, samples with between 25% and 75% full-length transcripts in orange, and samples with more than 75% full-length transcripts in red. Full-length transcripts are significantly enriched relative to minus-beta transcripts (one-sample t-test, μ = 50%; P < ).
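
The binning in panel (b) and the one-sample test against 50% can be sketched as follows. This assumes psi is expressed as a percentage (0 to 100). How values falling exactly on 25% or 75% were assigned is not stated, so placing them in the mixed bin is an assumption. Only the t statistic is computed here, to stay within the standard library.

```python
import math
from statistics import mean, stdev


def psi_category(psi_percent):
    """Bin a MISO psi value (percent full-length transcripts) as in panel (b).

    Values of exactly 25 or 75 go to the mixed bin here (an assumption;
    the caption does not say how boundaries were handled).
    """
    if psi_percent < 25:
        return "minus-beta dominant"  # green
    if psi_percent <= 75:
        return "mixed"  # orange
    return "full-length dominant"  # red


def one_sample_t(values, mu=50.0):
    """t statistic for H0: mean psi equals mu (50% full-length)."""
    n = len(values)
    return (mean(values) - mu) / (stdev(values) / math.sqrt(n))
```

If SciPy is available, `scipy.stats.ttest_1samp(values, 50)` returns the same statistic together with a two-sided P-value.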

Supplementary Figure 7 Inferring telomerase activity using a gene expression signature. (a) The predicted telomerase activity is correlated with experimentally determined telomerase enzymatic activity in 11 urothelial cancer cell lines, with borderline significance (P = 0.07, Spearman correlation). (b) Telomerase signature score in tumor and normal samples in TCGA cohorts. In all cancer types except KICH and THCA, tumor scores are significantly higher than those of normal samples (P < 0.001, t-test). (c) Distribution of telomerase activity score across TERT alteration categories. Scores in all TERT-aberrant groups are significantly higher than in the TERT wt group (P < 0.01). (d) Telomerase activity score in 31 cancer types. The x-axis represents mean TERT expression (TPM) and the y-axis the median telomerase score. The size of each dot is proportional to the percentage of TERT-expressing samples in the corresponding cancer type. Both axes are on a log2 scale for better visualization.

Supplementary Figure 8 TL genomic associations, ATRX and TERRA. (a) Scatterplot showing gene-TL ratio associations using the extended set (n = 6,835). Results are grouped by disease. P-values were calculated using a two-sided t-test and adjusted for multiple testing using FDR. Up to five negatively (left) or positively (right) associated genes with an FDR < 0.25 are listed in each plot. (b) Histogram of DNA breakpoints in ATRX in the core set (n = 473 samples). Bars are colored according to breakpoint detection method. (c) Circos plot of ATRX fusion partners. Only segments of each chromosome are shown. Segment coordinates in kb are indicated. The 5′ fusion partner is shown in blue and the 3′ fusion partner in orange.
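
Panel (a) states only that P-values were "adjusted for multiple testing using FDR". The sketch below assumes the Benjamini-Hochberg step-up procedure, which is the FDR method named for the telomere length expression analysis in the supplementary methods. It then keeps genes below the 0.25 cutoff. The gene-to-P-value dictionary is a hypothetical input layout.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values (q-values) in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # walk from the largest P-value down so q-values stay monotone and <= 1
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_genes(gene_pvalues, fdr=0.25):
    """Genes whose adjusted P-value is below the FDR cutoff (0.25 in panel a)."""
    genes = list(gene_pvalues)
    q = benjamini_hochberg([gene_pvalues[g] for g in genes])
    return [g for g, qv in zip(genes, q) if qv < fdr]
```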

Supplementary Figure 9 Classification of tumors on the basis of TERT expression and ATRX or DAXX genomic alterations. (a) Comparison of ATRX variant classification between samples classified as TERT expr and samples classified as ATRX/DAXX alt. Silent mutations, DAXX alterations, deletions and structural variants were omitted. Truncating variants consist of frameshift, nonsense and splice site variants. Non-truncating mutations consist of missense mutations and in-frame indels. The P-value was calculated using Fisher's exact test. (b) Distribution of telomerase signature score across TERT/ATRX/DAXX groups in the core set. A small group, TERT alt -TERT expr -ATRX/DAXX alt, has only 2 cases and was therefore excluded from the comparison. All P-values are derived from comparisons with the ATRX/DAXX alt group (blue). Red indicates TERT-expressing groups and purple the double wild-type group. The group in black is TERT alt but without expression. (c) Number of copy number segments by TERT expr -ATRX/DAXX alt group. Numbers of segments were compared between groups using two-sided t-tests. (d) Mutational burden by TERT expr -ATRX/DAXX alt group. Mutational burden was compared between groups using two-sided t-tests. (e) Survival differences between TERT expr -ATRX/DAXX alt groups by cancer type. The number of patients included, the number of events (deaths) and univariable log-rank P-values are indicated in the bottom left corner for each tumor type. Hazard ratios and 95% confidence intervals comparing survival in the double wild-type group (green) and the ATRX/DAXX alt group (blue) relative to the TERT expr group are shown in the top right corner for each tumor type. Groups with fewer than six samples were omitted. ***P < ; **P < 0.001; *P < 0.05; N.S., not significant.

Data Generation

Raw sequencing data collection
Pre-aligned whole-genome (WGS), low-pass whole-genome (LPS) and exome (WXS) BAM files (n=24,049 BAM files; Table S1B; ~720 TB) were downloaded from CGHub for analysis. These BAM files were processed through a pipeline for telomere length calling, and WGS and LPS BAM files were additionally processed for structural variant calling and TERT promoter pileup. Mutation calling was not performed; instead, WXS-based mutation calls were downloaded and compiled as described elsewhere in this document. Pre-aligned RNA sequencing BAM files (n=12,160 BAM files; ~83 TB) were downloaded from CGHub and processed through a pipeline for TERRA expression estimation. Unaligned RNA sequencing BAM files were downloaded from CGHub and processed through the Pipeline for RNA-seq Data Analysis (PRADA) as previously described 1,2. The processed data were used for fusion transcript detection and TERT isoform detection.

Clinical data collection
Clinical data (~237 MB) were downloaded from Firehose (stdata 2016_01_28) using the firehose_get tool. Clinical data elements comprise histology, grade, gender, age at diagnosis, race/ethnicity, smoking history, vital status and overall survival. Overall survival was defined as the time from surgical diagnosis until death. Patients still alive at the time of this study were censored at the time of last follow-up.

Mutation data collection
Mutation annotation format (MAF) files for the different cancer types were obtained from the DCC, the publication supplements or the active working group (~3 GB). Because the MAF files came from different sources and were initially generated by different sequencing centers, their structures differed slightly. Columns holding equivalent data were therefore renamed consistently across files, and the resulting files were combined into a single pan-cancer MAF file.
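
Once the per-cancer MAF files are merged, two frequency rules are applied: a per-sample hypermutation flag (median plus three times the median absolute deviation within each tumor type) and per-gene labels based on a 20% hotspot or truncating share. Both are detailed in the paragraph that follows, and the sketch below shows one way to compute them. The input layouts are hypothetical. The unit of "mutation frequency" is not stated, so any per-sample count or rate works here. The MAD below is unscaled, whereas R's mad() applies a 1.4826 factor by default, and which variant was used is an assumption.

```python
from collections import defaultdict
from statistics import median


def flag_hypermutated(samples, k=3.0):
    """Flag samples whose mutation frequency exceeds median + k * MAD of their tumor type.

    samples: dict sample_id -> (tumor_type, mutation_frequency)  (hypothetical layout)
    MAD is the unscaled median absolute deviation (assumption, see lead-in).
    """
    by_type = defaultdict(list)
    for tumor_type, freq in samples.values():
        by_type[tumor_type].append(freq)

    cutoffs = {}
    for tumor_type, freqs in by_type.items():
        med = median(freqs)
        mad = median(abs(f - med) for f in freqs)
        cutoffs[tumor_type] = med + k * mad

    return {sid: freq > cutoffs[tt] for sid, (tt, freq) in samples.items()}


def label_gene(n_mutations, n_hotspot, n_truncating, threshold=0.20):
    """Apply the >20% hotspot / >20% truncating labels to one gene."""
    labels = []
    if n_mutations == 0:
        return labels
    if n_hotspot / n_mutations > threshold:
        labels.append("putative oncogene")
    if n_truncating / n_mutations > threshold:
        labels.append("putative tumor suppressor gene")
    return labels
```

The text does not say how a gene exceeding both thresholds was labeled, so this sketch simply returns both labels.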
No attempts were made to lift over older hg18 calls to hg19 or to re-annotate the MAF files using uniform gene annotations. Replicate samples and samples based on whole-genome amplified DNA were excluded. Intronic and blacklisted variants were subsequently dropped, as were variants with a variant allele fraction < 0.1 when this information was available. Samples were annotated as hypermutated if their mutation frequency exceeded the median mutation frequency for the respective tumor type plus three times the median absolute deviation. For each gene we calculated the mutation frequency, the proportion of hotspot mutations and the proportion of truncating mutations. To avoid incorrect hotspot frequencies, only hg19-aligned mutation calls were included in this analysis. Genes in which more than 20% of mutations were hotspot mutations were annotated as putative oncogenes, whereas genes in which more than 20% of mutations were truncating were annotated as putative tumor suppressor genes 3.

mRNA expression data collection
Illumina HiSeq UNC-aligned (RNAseq V2) gene-level, isoform-level and exon-level expression data (~85 GB) were downloaded from Broad Firehose (stdata 2016_01_28) using the firehose_get tool. Aliquots from all tumor types were combined and matched against the TCGA annotation, and aliquots with redactions were removed. Gene expression matrices were created for TPM data and for raw RSEM read counts. TPM data were median-centered by cancer type to alleviate tissue-specific expression effects. Prior to median-centering, genes with variability below 9, or with a TPM value of 2 or less in at least 95% of samples, were filtered out.

DNA methylation data collection
Illumina Infinium HumanMethylation450 DNA methylation data (~177 GB) were downloaded from Broad Firehose (stdata 2016_01_28) using the firehose_get tool. Aliquots from all tumor types were combined and matched against the TCGA annotation, and aliquots with relevant redactions were removed. Probes with any NA values were removed.

DNA copy number data collection
SNP6 segmentation files (~860 MB) were downloaded from Broad Firehose (stdata 2016_01_28) using the firehose_get tool. Aliquots from all tumor types were merged and matched against the TCGA annotation, and aliquots with relevant redactions were removed. GISTIC 2.0 was then applied to the combined segmentation file to identify significantly recurrent focal and broad copy number changes 4. Events with a Q-value < 0.10 were considered significant. For each statistically significant peak, GISTIC 2.0 reports a narrow focal peak and a wider surrounding peak. Breakpoints in ATRX were manually curated to evaluate suspected structural variants. The ATRX region of interest (ROI) was defined as the ATRX gene body according to UCSC (chrX: ). ATRX was deemed broken when one or more copy number segments partially overlapped the ATRX ROI or lay entirely within it.
When two or more copy number segments were involved, we required a difference in segment mean of at least 0.3; when only one segment was involved, we required a minimum absolute amplitude of 0.3.

Sample Selection
We included all purified-DNA, hg19-aligned exome, whole-genome and low-pass sequencing samples available on CGHub 5 from the following 32 TCGA cohorts: ACC, BLCA, BRCA, CESC, CHOL, COAD, DLBC, ESCA, GBM, HNSC, KICH, KIRC, KIRP, LAML, LGG, LIHC, LUAD, LUSC, OV, PAAD, PCPG, PRAD, READ, SARC, SKCM, STAD, TGCT, THCA, THYM, UCEC, UCS and UVM. An exception was made for LAML, for which no purified-DNA exome sequencing samples were available; amplified DNA samples were included instead. A full list of all N=24,049 samples can be found in Supplementary Table 1. A flowchart of the sample selection procedure can be found in Supplementary Figure 1a.

All N=24,049 downloaded samples were then annotated using the TCGA sample annotation downloaded on May 12, 2016 from the TCGA portal. Samples annotated as having poor DNA quality or failing a QC step were excluded. We also dropped samples with fewer than 20 million reads and samples for which we failed to compute a telomere length. We next removed technical replicates (N=4,973), although we kept all replicates for the replicate analysis described elsewhere in this document. We then dropped samples from patients who were later marked as having received systemic treatment prior to sample collection. Similarly, we dropped samples from patients who, after revision of the pathological diagnosis, no longer met the inclusion criteria for the intended study. To keep only one tumor sample per patient, metastatic and recurrent samples were dropped. An exception was made for melanoma, where the only sample was often a metastatic tumor; in this case the metastatic tumor was kept. Rarely, patients had additional primary tumor samples or buccal cell or bone marrow normal controls; because these samples were very rare, they were dropped from our analysis. Some patients had a tumor-normal pair profiled on one platform/center but an unpaired sample of a different type profiled on another platform. To limit each patient to a single center and platform, the unpaired partner in these instances was also dropped (N=114). After exclusion, an unpaired set of N=18,430 samples remained. To match tumor and normal samples from the same patient, we first removed all samples without a possible paired partner (n=222). Only pairs in which both tumor and normal were profiled by the same sequencing center and method were considered. Next, all possible tumor-normal pairs were constructed (N=9,275); these consisted of solid tumor vs. blood normal, solid tumor vs. solid normal, metastatic tumor vs. blood normal, metastatic tumor vs. solid normal and blood tumor vs. solid control.
Several patients had more than one tumor-normal pair, and we therefore dropped N=322 second pairs, preferring blood normal over solid normal when both were available. The final paired set consisted of N=8,953 tumor/normal pairs. We then took the set of 473 T/N pairs with available whole-genome sequencing-based TL, RNA sequencing (gene expression, gene fusion and TERRA expression), DNA methylation profiles, DNA copy number profiles, exome sequencing-based mutation calls, whole-genome or targeted sequencing-based TERT promoter mutation calls and whole-genome sequencing-based structural variant calls, and labeled this the core set. To maximize statistical power, we constructed an extended set (n=6,835) that included all core set samples but additionally contained cases with overlapping low-pass or exome sequencing-based TL, exome sequencing-based mutation calls, DNA copy number profiles and RNA sequencing (gene expression, gene fusion and TERRA expression). Each sample in the extended set was annotated based on TERT expression and somatic alterations in DAXX and/or ATRX. Samples were classified as TERT expressed when we detected 2 or more raw aligned, RSEM-quantified TERT reads. The remaining samples were classified as ATRX/DAXX altered if we detected ATRX or DAXX mutations, deletions or structural variants, and as double wild-type otherwise.
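
The pairing and grouping rules above can be sketched as two small functions. The first keeps at most one tumor/normal pair per patient, requires matching center and method, and prefers a blood normal. The second assigns the three-way TERT/ATRX/DAXX label. The Sample record, its field names and the tissue labels are hypothetical. The text does not say how remaining ties were broken, so this sketch keeps the first candidate in input order. The earlier QC, replicate and treatment exclusions are assumed to have been applied already.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Sample:
    # Hypothetical record layout; field names are illustrative only.
    patient: str
    barcode: str
    kind: str    # "tumor" or "normal"
    tissue: str  # "blood" or "solid"
    center: str
    method: str  # "WGS", "WXS" or "LPS"


def pick_pairs(samples):
    """Keep at most one tumor/normal pair per patient."""
    by_patient = {}
    for s in samples:
        by_patient.setdefault(s.patient, []).append(s)

    chosen = {}
    for patient, group in by_patient.items():
        tumors = [s for s in group if s.kind == "tumor"]
        normals = [s for s in group if s.kind == "normal"]
        # only pairs profiled by the same center with the same method
        candidates = [
            (t, n) for t, n in product(tumors, normals)
            if t.center == n.center and t.method == n.method
        ]
        if not candidates:
            continue  # no possible paired partner
        # stable sort: blood normals first, otherwise keep input order
        candidates.sort(key=lambda tn: tn[1].tissue != "blood")
        chosen[patient] = candidates[0]
    return chosen


def classify_tmm(tert_reads, atrx_daxx_altered, min_reads=2):
    """Three-way group label for a sample in the extended set."""
    if tert_reads >= min_reads:
        return "TERT expressed"
    if atrx_daxx_altered:
        return "ATRX/DAXX altered"
    return "double wild-type"
```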

In 3% of tumors (n=210/6,835) we found co-occurring ATRX/DAXX alterations and TERT expression. However, these co-occurring ATRX mutations tended to be non-truncating events (missense, in-frame indel), whereas non-co-occurring ATRX mutations were more often loss-of-function truncating mutations (e.g., frameshift, nonsense; Fisher's exact test OR 14.15, 95% CI , P < ). Moreover, co-occurring ATRX mutations had a lower variant allele fraction (0.29±0.15) than non-co-occurring ATRX mutations (0.36±0.12; two-sided t-test P<0.0001). Lastly, samples with co-occurring ATRX/DAXX mutations and TERT expression more often showed a hypermutator phenotype (Fisher's exact test OR 0.12, 95% CI , P<0.0001).

Telomere length technical replicate analysis
During sample selection we identified 4,973 technical replicates, which were subsequently dropped. Because each replicate was linked to at least one other aliquot of the same sample that was not dropped, the replicates were pulled back in, resulting in 9,058 samples. Most samples had one replicate (N=3,373 samples), some had two (N=592), and two samples had as many as eight replicates. We then constructed all possible unique pairwise combinations of replicates from a single sample: a sample with one replicate forms only one combination (A vs. B), a sample with two replicates forms three combinations (A vs. B, A vs. C and B vs. C), and each of the two samples with eight replicates forms 36 combinations. This resulted in N=6,152 possible combinations for the 9,058 samples. To identify replicate tumor/normal pairs, we repeated the sample selection procedure without dropping replicate samples before matching tumors to controls. After pairing, we identified n=3,554 tumor/normal pairs with replicates.
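
The enumeration of replicate combinations can be sketched with itertools.combinations. A sample with k replicates has k + 1 aliquots and yields (k + 1)k/2 unordered pairs, giving 1, 3 and 36 combinations for one, two and eight replicates. The dictionary layout and aliquot names are hypothetical.

```python
from itertools import combinations


def replicate_combinations(aliquots_by_sample):
    """All unique aliquot pairs within each sample.

    aliquots_by_sample: dict sample_id -> list of aliquot ids
    (the retained aliquot plus its replicates; hypothetical layout).
    A sample with k replicates has k + 1 aliquots and yields
    (k + 1) * k / 2 combinations: 1 -> 1, 2 -> 3, 8 -> 36.
    """
    combos = []
    for sample_id, aliquots in aliquots_by_sample.items():
        for a, b in combinations(sorted(aliquots), 2):
            combos.append((sample_id, a, b))
    return combos
```

The same arithmetic reproduces the tumor/normal pair count reported for the replicate pairs: 1,555 pairs with one replicate partner and 148 with two give 1,555 × 1 + 148 × 3 = 1,999 combinations.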
These tumor/normal pairs were then combined in the same way as the replicate samples, except that pairs mostly had one (N=1,555 pairs) and at most two (N=148 pairs) replicate partners. This resulted in N=1,999 possible combinations for N=3,554 tumor/normal pairs. To simplify interpretation of the replicate analysis, we then applied the following filters to both samples and pairs:
1. Only replicates originating from a single vial of DNA were considered (replicate samples only).
2. Replicates from the same center and same sequencing method were excluded.
3. Replicates from different centers and different sequencing methods were also excluded.
4. Multiple replicates from the same sample were excluded.
This filtering resulted in a total of N=3,883 unpaired replicates and N=1,861 replicate tumor/normal pairs. Inter- and intra-sequencing-method replicate correlations varied, with the highest correlations for whole-genome sequencing-based replicates from different centers and for replicates comparing low-pass and whole-genome sequencing. Replicates comparing whole-genome with whole-exome sequencing, and low-pass with whole-exome sequencing, showed significant but poor correlations, suggesting that whole-genome and low-pass based TL estimates are more robust than whole-exome based estimates. Previously published comparisons between whole-genome and whole-exome based TL measurements 6,7 have demonstrated better correlations than found in our dataset, possibly because those datasets were better controlled and selected. In general, tumor/normal pair replicates showed better correlations than individual samples.

Telomere length expression associations
This analysis was performed using median-centered gene expression and median-centered absolute TL in normal and tumor samples separately, as well as using median-centered gene expression and TL ratio. A Spearman correlation and P-value were calculated for each gene across all samples and within diseases. Genes were annotated for chromosome, arm and distance to the telomere in bp. In the disease-specific analysis using the TL ratio, we ranked genes by the sum, across diseases, of the disease-specific -log10 P-values multiplied by the direction of the effect. The top and bottom 500 genes in this ranking were used to obtain pathway associations in MSigDB 8 using the Reactome 9 database for genes associated with longer and shorter telomeres, respectively. In the overall analysis using absolute median-centered TL in normal and tumor samples, we computed the Benjamini-Hochberg false discovery rate to adjust P-values for multiple testing. Genes with an FDR < 0.10 were dropped. This analysis was limited to genes on chromosome arms of at least 10^8 bp (100 Mb), namely 1p, 1q, 2q, 3q, 4q, 5q, 6q and 8q. Loess regression was used to model the correlation coefficient (ρ) as a function of distance to the telomere in tumor samples, and the median expression in tumor and normal samples.

Telomerase activity signature
We propose a gene signature to predict telomerase activity from gene expression data. In principle, the aggregated expression of the signature genes should correlate with telomerase activity and distinguish cancers with high telomerase activity from those with low activity.
To define such a signature, we first examined the NCI60 cell lines, hoping to find both telomerase-positive and telomerase-negative lines. A literature search for each line showed that all but one line are telomerase positive; the only exception is the breast cancer line Hs578T (Irving et al., 2004). The lack of telomerase-negative lines therefore precludes the use of NCI60 for this purpose. We then queried the public Gene Expression Omnibus (GEO) 10 and identified two published datasets 11,12. The GSE14533 dataset, also referred to as the LWK dataset, encompasses 18 ALT samples (8 cell lines and 10 cell lines) and 16 telomerase samples (8 cell lines and 8 tumors). All tumors are liposarcomas of unknown histology. The GSE20559 dataset, referred to as the Doyle dataset, encompasses 4 telomerase-positive and 4 ALT samples. All 8 samples in the Doyle set are dedifferentiated liposarcomas, so the comparison avoids potential confounding by tumor histology. We downloaded and re-processed both datasets. For the Doyle set, we performed signal extraction and normalization using the aroma package. For the LWK set, we normalized its two subseries by z-score transformation to reduce potential batch effects. We compared ALT and telomerase-active samples with Significance Analysis of Microarrays (SAM) as implemented in the R package siggenes. Using a cutoff of FDR 0.05 and fold change (FC) 1.5, we identified 666 genes up-regulated in the telomerase-positive samples of the LWK dataset (named the LWK_TELUP signature). Similarly, using a cutoff of P 0.05 and FC 1.5, we found 225 genes up-regulated in the four telomerase-positive tumors (named Doyle_DD_TELUP). We did not use FDR to filter the Doyle dataset because no gene passed FDR 0.1, likely because of the very small sample size (n=8); this is consistent with the original report by the group (Doyle et al., 2012). In addition, we obtained a 14-gene signature from Doyle et al., which was reported to differentiate telomerase-positive tumors from ALT tumors 12. We named this signature Doyle_DD_G14. We reasoned that, despite the defined phenotypes of the samples (ALT vs. telomerase), the resulting signatures might contain signals irrelevant to telomerase activity. Embryonic stem cells (ESCs) are another lineage with positive telomerase activity, so intersecting the signatures with genes up-regulated in ESCs should reduce irrelevant signals. We intersected the aforementioned gene signatures with a list of 420 genes up-regulated in ESCs curated by an independent study 13 and found that only a few genes remained. We therefore further loosened the criteria, using FC 1.5 for both the LWK and Doyle datasets. The intersection yielded a 43-gene signature for the Doyle dataset (named Doyle_DD_TELUP_ESC) and a 41-gene signature for the LWK dataset (named LWK_TELUP_ESC). In parallel with these data-derived signatures, we searched the MSigDB resource and included three telomerase-related gene sets: BIOCARTA_TEL_PATHWAY from BioCarta, REACTOME_TELOMERE_MAINTENANCE from Reactome, and positive regulation of telomere maintenance via telomerase (GO: ) from Gene Ontology. In total we had 8 gene signatures for testing. To test these signatures, we applied them to a set of urothelial cancer cell lines using single-sample gene set enrichment analysis 14.
This method has been used in our previous publications to characterize parenchymal cells of the brain in the four expression subtypes of glioblastoma 14 and steroidogenic enzymes in adrenocortical carcinoma 6. Telomerase enzymatic activity of 23 urothelial cancer cell lines has recently been measured and correlated with TERT expression (by RT-PCR) 15. These data are particularly useful because telomerase activity was quantified as continuous values rather than as dichotomous positive or negative calls. Of the 23 cell lines, 11 are included in the Cancer Cell Line Encyclopedia (CCLE) 16 and have RNA-seq data available. We applied the 8 signatures to these 11 lines and correlated the resulting signature scores with the experimentally determined telomerase enzymatic activity, kindly provided by the authors. The Doyle_DD_TELUP_ESC signature outperformed the other signatures, with a correlation coefficient of ρ = 0.58 (P = 0.07, Spearman correlation) despite the small sample size (n = 11) (Supplementary Figure 7a), suggesting that it predicts telomerase enzymatic activity with reasonable accuracy. None of TERT, TERC, DKC1, TERF1 or TERF2 is a member of this signature. Notably, the Doyle dataset is based on the Affymetrix HG-U133 Plus 2.0 platform, whereas the CCLE expression data are based on RNA sequencing. In the array-based NCI60 dataset, the telomerase-negative Hs578T line had the lowest score (1703 vs. an average of 2993).

Supplementary References

1. Torres-Garcia, W. et al. PRADA: pipeline for RNA sequencing data analysis. Bioinformatics 30, (2014).
2. Yoshihara, K. et al. The landscape and therapeutic relevance of cancer-associated transcript fusions. Oncogene 34, (2015).
3. Vogelstein, B. et al. Cancer genome landscapes. Science 339, (2013).
4. Mermel, C.H. et al. GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers. Genome Biol 12, R41 (2011).
5. Wilks, C. et al. The Cancer Genomics Hub (CGHub): overcoming cancer through the power of torrential data. Database (Oxford) 2014 (2014).
6. Zheng, S. et al. Comprehensive Pan-Genomic Characterization of Adrenocortical Carcinoma. Cancer Cell 29, (2016).
7. Ding, Z. et al. Estimating telomere length from whole genome sequence data. Nucleic Acids Res 42, e75 (2014).
8. Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A 102, (2005).
9. Milacic, M. et al. Annotating cancer variants and anti-cancer therapeutics in reactome. Cancers (Basel) 4, (2012).
10. Barrett, T. et al. NCBI GEO: archive for functional genomics data sets--update. Nucleic Acids Res 41, D991-5 (2013).
11. Lafferty-Whyte, K. et al. A gene expression signature classifying telomerase and ALT immortalization reveals an hTERT regulatory network and suggests a mesenchymal stem cell origin for ALT. Oncogene 28, (2009).
12. Doyle, K.R. et al. Validating a gene expression signature proposed to differentiate liposarcomas that use different telomere maintenance mechanisms. Oncogene 31, 265-6; author reply (2012).
13. Ben-Porath, I. et al. An embryonic stem cell-like gene expression signature in poorly differentiated aggressive human tumors. Nat Genet 40, (2008).
14. Verhaak, R.G. et al. Integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1, EGFR, and NF1. Cancer Cell 17, (2010).
15. Borah, S. et al. Cancer. TERT promoter mutations and telomerase reactivation in urothelial cancer. Science 347, (2015).
16. Barretina, J. et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 483, (2012).


More information

Genetic alterations of histone lysine methyltransferases and their significance in breast cancer

Genetic alterations of histone lysine methyltransferases and their significance in breast cancer Genetic alterations of histone lysine methyltransferases and their significance in breast cancer Supplementary Materials and Methods Phylogenetic tree of the HMT superfamily The phylogeny outlined in the

More information

Exploring TCGA Pan-Cancer Data at the UCSC Cancer Genomics Browser

Exploring TCGA Pan-Cancer Data at the UCSC Cancer Genomics Browser Exploring TCGA Pan-Cancer Data at the UCSC Cancer Genomics Browser Melissa S. Cline 1*, Brian Craft 1, Teresa Swatloski 1, Mary Goldman 1, Singer Ma 1, David Haussler 1, Jingchun Zhu 1 1 Center for Biomolecular

More information

TCGA. The Cancer Genome Atlas

TCGA. The Cancer Genome Atlas TCGA The Cancer Genome Atlas TCGA: History and Goal History: Started in 2005 by the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI) with $110 Million to catalogue

More information

Analysis of Massively Parallel Sequencing Data Application of Illumina Sequencing to the Genetics of Human Cancers

Analysis of Massively Parallel Sequencing Data Application of Illumina Sequencing to the Genetics of Human Cancers Analysis of Massively Parallel Sequencing Data Application of Illumina Sequencing to the Genetics of Human Cancers Gordon Blackshields Senior Bioinformatician Source BioScience 1 To Cancer Genetics Studies

More information

Machine-Learning on Prediction of Inherited Genomic Susceptibility for 20 Major Cancers

Machine-Learning on Prediction of Inherited Genomic Susceptibility for 20 Major Cancers Machine-Learning on Prediction of Inherited Genomic Susceptibility for 20 Major Cancers Sung-Hou Kim University of California Berkeley, CA Global Bio Conference 2017 MFDS, Seoul, Korea June 28, 2017 Cancer

More information

Accessing and Using ENCODE Data Dr. Peggy J. Farnham

Accessing and Using ENCODE Data Dr. Peggy J. Farnham 1 William M Keck Professor of Biochemistry Keck School of Medicine University of Southern California How many human genes are encoded in our 3x10 9 bp? C. elegans (worm) 959 cells and 1x10 8 bp 20,000

More information

RNA SEQUENCING AND DATA ANALYSIS

RNA SEQUENCING AND DATA ANALYSIS RNA SEQUENCING AND DATA ANALYSIS Length of mrna transcripts in the human genome 5,000 5,000 4,000 3,000 2,000 4,000 1,000 0 0 200 400 600 800 3,000 2,000 1,000 0 0 2,000 4,000 6,000 8,000 10,000 Length

More information

SUPPLEMENTARY FIGURES: Supplementary Figure 1

SUPPLEMENTARY FIGURES: Supplementary Figure 1 SUPPLEMENTARY FIGURES: Supplementary Figure 1 Supplementary Figure 1. Glioblastoma 5hmC quantified by paired BS and oxbs treated DNA hybridized to Infinium DNA methylation arrays. Workflow depicts analytic

More information

LinkedOmics. A Web-based platform for analyzing cancer-associated multi-dimensional data. Manual. First edition 4 April 2017 Updated on July 3, 2017

LinkedOmics. A Web-based platform for analyzing cancer-associated multi-dimensional data. Manual. First edition 4 April 2017 Updated on July 3, 2017 LinkedOmics A Web-based platform for analyzing cancer-associated multi-dimensional data Manual First edition 4 April 2017 Updated on July 3, 2017 LinkedOmics is a publicly available portal (http://linkedomics.org/)

More information

Frequency(%) KRAS G12 KRAS G13 KRAS A146 KRAS Q61 KRAS K117N PIK3CA H1047 PIK3CA E545 PIK3CA E542K PIK3CA Q546. EGFR exon19 NFS-indel EGFR L858R

Frequency(%) KRAS G12 KRAS G13 KRAS A146 KRAS Q61 KRAS K117N PIK3CA H1047 PIK3CA E545 PIK3CA E542K PIK3CA Q546. EGFR exon19 NFS-indel EGFR L858R Frequency(%) 1 a b ALK FS-indel ALK R1Q HRAS Q61R HRAS G13R IDH R17K IDH R14Q MET exon14 SS-indel KIT D8Y KIT L76P KIT exon11 NFS-indel SMAD4 R361 IDH1 R13 CTNNB1 S37 CTNNB1 S4 AKT1 E17K ERBB D769H ERBB

More information

Supplementary Figure 1. Metabolic landscape of cancer discovery pipeline. RNAseq raw counts data of cancer and healthy tissue samples were downloaded

Supplementary Figure 1. Metabolic landscape of cancer discovery pipeline. RNAseq raw counts data of cancer and healthy tissue samples were downloaded Supplementary Figure 1. Metabolic landscape of cancer discovery pipeline. RNAseq raw counts data of cancer and healthy tissue samples were downloaded from TCGA and differentially expressed metabolic genes

More information

Nicholas Borcherding, Nicholas L. Bormann, Andrew Voigt, Weizhou Zhang 1-4

Nicholas Borcherding, Nicholas L. Bormann, Andrew Voigt, Weizhou Zhang 1-4 SOFTWARE TOOL ARTICLE TRGAted: A web tool for survival analysis using protein data in the Cancer Genome Atlas. [version 1; referees: 1 approved] Nicholas Borcherding, Nicholas L. Bormann, Andrew Voigt,

More information

Nature Genetics: doi: /ng Supplementary Figure 1. Somatic coding mutations identified by WES/WGS for 83 ATL cases.

Nature Genetics: doi: /ng Supplementary Figure 1. Somatic coding mutations identified by WES/WGS for 83 ATL cases. Supplementary Figure 1 Somatic coding mutations identified by WES/WGS for 83 ATL cases. (a) The percentage of targeted bases covered by at least 2, 10, 20 and 30 sequencing reads (top) and average read

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION doi:10.1038/nature10866 a b 1 2 3 4 5 6 7 Match No Match 1 2 3 4 5 6 7 Turcan et al. Supplementary Fig.1 Concepts mapping H3K27 targets in EF CBX8 targets in EF H3K27 targets in ES SUZ12 targets in ES

More information

Supplementary Figure 1. Copy Number Alterations TP53 Mutation Type. C-class TP53 WT. TP53 mut. Nature Genetics: doi: /ng.

Supplementary Figure 1. Copy Number Alterations TP53 Mutation Type. C-class TP53 WT. TP53 mut. Nature Genetics: doi: /ng. Supplementary Figure a Copy Number Alterations in M-class b TP53 Mutation Type Recurrent Copy Number Alterations 8 6 4 2 TP53 WT TP53 mut TP53-mutated samples (%) 7 6 5 4 3 2 Missense Truncating M-class

More information

SUPPLEMENTARY APPENDIX

SUPPLEMENTARY APPENDIX SUPPLEMENTARY APPENDIX 1) Supplemental Figure 1. Histopathologic Characteristics of the Tumors in the Discovery Cohort 2) Supplemental Figure 2. Incorporation of Normal Epidermal Melanocytic Signature

More information

Cancer Informatics Lecture

Cancer Informatics Lecture Cancer Informatics Lecture Mayo-UIUC Computational Genomics Course June 22, 2018 Krishna Rani Kalari Ph.D. Associate Professor 2017 MFMER 3702274-1 Outline The Cancer Genome Atlas (TCGA) Genomic Data Commons

More information

Patnaik SK, et al. MicroRNAs to accurately histotype NSCLC biopsies

Patnaik SK, et al. MicroRNAs to accurately histotype NSCLC biopsies Patnaik SK, et al. MicroRNAs to accurately histotype NSCLC biopsies. 2014. Supplemental Digital Content 1. Appendix 1. External data-sets used for associating microrna expression with lung squamous cell

More information

Expert-guided Visual Exploration (EVE) for patient stratification. Hamid Bolouri, Lue-Ping Zhao, Eric C. Holland

Expert-guided Visual Exploration (EVE) for patient stratification. Hamid Bolouri, Lue-Ping Zhao, Eric C. Holland Expert-guided Visual Exploration (EVE) for patient stratification Hamid Bolouri, Lue-Ping Zhao, Eric C. Holland Oncoscape.sttrcancer.org Paul Lisa Ken Jenny Desert Eric The challenge Given - patient clinical

More information

Distinct cellular functional profiles in pan-cancer expression analysis of cancers with alterations in oncogenes c-myc and n-myc

Distinct cellular functional profiles in pan-cancer expression analysis of cancers with alterations in oncogenes c-myc and n-myc Honors Theses Biology Spring 2018 Distinct cellular functional profiles in pan-cancer expression analysis of cancers with alterations in oncogenes c-myc and n-myc Anne B. Richardson Whitman College Penrose

More information

Identification of Tissue Independent Cancer Driver Genes

Identification of Tissue Independent Cancer Driver Genes Identification of Tissue Independent Cancer Driver Genes Alexandros Manolakos, Idoia Ochoa, Kartik Venkat Supervisor: Olivier Gevaert Abstract Identification of genomic patterns in tumors is an important

More information

TCGA-Assembler: Pipeline for TCGA Data Downloading, Assembling, and Processing. (Supplementary Methods)

TCGA-Assembler: Pipeline for TCGA Data Downloading, Assembling, and Processing. (Supplementary Methods) TCGA-Assembler: Pipeline for TCGA Data Downloading, Assembling, and Processing (Supplementary Methods) Yitan Zhu 1, Peng Qiu 2, Yuan Ji 1,3 * 1. Center for Biomedical Research Informatics, NorthShore University

More information

The Cancer Genome Atlas & International Cancer Genome Consortium

The Cancer Genome Atlas & International Cancer Genome Consortium The Cancer Genome Atlas & International Cancer Genome Consortium Session 3 Dr Jason Wong Prince of Wales Clinical School Introductory bioinformatics for human genomics workshop, UNSW 31 st July 2014 1

More information

RNA SEQUENCING AND DATA ANALYSIS

RNA SEQUENCING AND DATA ANALYSIS RNA SEQUENCING AND DATA ANALYSIS Download slides and package http://odin.mdacc.tmc.edu/~rverhaak/package.zip http://odin.mdacc.tmc.edu/~rverhaak/rna-seqlecture.zip Overview Introduction into the topic

More information

Nature Medicine: doi: /nm.3967

Nature Medicine: doi: /nm.3967 Supplementary Figure 1. Network clustering. (a) Clustering performance as a function of inflation factor. The grey curve shows the median weighted Silhouette widths for varying inflation factors (f [1.6,

More information

Relationship between genomic features and distributions of RS1 and RS3 rearrangements in breast cancer genomes.

Relationship between genomic features and distributions of RS1 and RS3 rearrangements in breast cancer genomes. Supplementary Figure 1 Relationship between genomic features and distributions of RS1 and RS3 rearrangements in breast cancer genomes. (a,b) Values of coefficients associated with genomic features, separately

More information

Comparison of open chromatin regions between dentate granule cells and other tissues and neural cell types.

Comparison of open chromatin regions between dentate granule cells and other tissues and neural cell types. Supplementary Figure 1 Comparison of open chromatin regions between dentate granule cells and other tissues and neural cell types. (a) Pearson correlation heatmap among open chromatin profiles of different

More information

ACE ImmunoID Biomarker Discovery Solutions ACE ImmunoID Platform for Tumor Immunogenomics

ACE ImmunoID Biomarker Discovery Solutions ACE ImmunoID Platform for Tumor Immunogenomics ACE ImmunoID Biomarker Discovery Solutions ACE ImmunoID Platform for Tumor Immunogenomics Precision Genomics for Immuno-Oncology Personalis, Inc. ACE ImmunoID When one biomarker doesn t tell the whole

More information

Supplementary Figure 1. Spitzoid Melanoma with PPFIBP1-MET fusion. (a) Histopathology (4x) shows a domed papule with melanocytes extending into the

Supplementary Figure 1. Spitzoid Melanoma with PPFIBP1-MET fusion. (a) Histopathology (4x) shows a domed papule with melanocytes extending into the Supplementary Figure 1. Spitzoid Melanoma with PPFIBP1-MET fusion. (a) Histopathology (4x) shows a domed papule with melanocytes extending into the deep dermis. (b) The melanocytes demonstrate abundant

More information

Nature Genetics: doi: /ng Supplementary Figure 1

Nature Genetics: doi: /ng Supplementary Figure 1 Supplementary Figure 1 Expression deviation of the genes mapped to gene-wise recurrent mutations in the TCGA breast cancer cohort (top) and the TCGA lung cancer cohort (bottom). For each gene (each pair

More information

Expanded View Figures

Expanded View Figures Solip Park & Ben Lehner Epistasis is cancer type specific Molecular Systems Biology Expanded View Figures A B G C D E F H Figure EV1. Epistatic interactions detected in a pan-cancer analysis and saturation

More information

OncoPPi Portal A Cancer Protein Interaction Network to Inform Therapeutic Strategies

OncoPPi Portal A Cancer Protein Interaction Network to Inform Therapeutic Strategies OncoPPi Portal A Cancer Protein Interaction Network to Inform Therapeutic Strategies 2017 Contents Datasets... 2 Protein-protein interaction dataset... 2 Set of known PPIs... 3 Domain-domain interactions...

More information

Genomic and Functional Approaches to Understanding Cancer Aneuploidy

Genomic and Functional Approaches to Understanding Cancer Aneuploidy Article Genomic and Functional Approaches to Understanding Cancer Aneuploidy Graphical Abstract Cancer-Type Specific Aneuploidy Patterns in TCGA Samples CRISPR Transfection and Selection Immortalized Cell

More information

NGS in tissue and liquid biopsy

NGS in tissue and liquid biopsy NGS in tissue and liquid biopsy Ana Vivancos, PhD Referencias So, why NGS in the clinics? 2000 Sanger Sequencing (1977-) 2016 NGS (2006-) ABIPrism (Applied Biosystems) Up to 2304 per day (96 sequences

More information

ncounter Assay Automated Process Immobilize and align reporter for image collecting and barcode counting ncounter Prep Station

ncounter Assay Automated Process Immobilize and align reporter for image collecting and barcode counting ncounter Prep Station ncounter Assay ncounter Prep Station Automated Process Hybridize Reporter to RNA Remove excess reporters Bind reporter to surface Immobilize and align reporter Image surface Count codes Immobilize and

More information

Discovery of Novel Human Gene Regulatory Modules from Gene Co-expression and

Discovery of Novel Human Gene Regulatory Modules from Gene Co-expression and Discovery of Novel Human Gene Regulatory Modules from Gene Co-expression and Promoter Motif Analysis Shisong Ma 1,2*, Michael Snyder 3, and Savithramma P Dinesh-Kumar 2* 1 School of Life Sciences, University

More information

Session 4 Rebecca Poulos

Session 4 Rebecca Poulos The Cancer Genome Atlas (TCGA) & International Cancer Genome Consortium (ICGC) Session 4 Rebecca Poulos Prince of Wales Clinical School Introductory bioinformatics for human genomics workshop, UNSW 28

More information

Detecting gene signature activation in breast cancer in an absolute, single-patient manner

Detecting gene signature activation in breast cancer in an absolute, single-patient manner Paquet et al. Breast Cancer Research (2017) 19:32 DOI 10.1186/s13058-017-0824-7 RESEARCH ARTICLE Detecting gene signature activation in breast cancer in an absolute, single-patient manner E. R. Paquet

More information

Broad H3K4me3 is associated with increased transcription elongation and enhancer activity at tumor suppressor genes

Broad H3K4me3 is associated with increased transcription elongation and enhancer activity at tumor suppressor genes Broad H3K4me3 is associated with increased transcription elongation and enhancer activity at tumor suppressor genes Kaifu Chen 1,2,3,4,5,10, Zhong Chen 6,10, Dayong Wu 6, Lili Zhang 7, Xueqiu Lin 1,2,8,

More information

LncRNA TUSC7 affects malignant tumor prognosis by regulating protein ubiquitination: a genome-wide analysis from 10,237 pancancer

LncRNA TUSC7 affects malignant tumor prognosis by regulating protein ubiquitination: a genome-wide analysis from 10,237 pancancer Original Article LncRNA TUSC7 affects malignant tumor prognosis by regulating protein ubiquitination: a genome-wide analysis from 10,237 pancancer patients Xiaoshun Shi 1 *, Yusong Chen 2,3 *, Allen M.

More information

Supplementary Figures

Supplementary Figures Supplementary Figures Supplementary Figure 1. Heatmap of GO terms for differentially expressed genes. The terms were hierarchically clustered using the GO term enrichment beta. Darker red, higher positive

More information

Supplementary Materials for

Supplementary Materials for www.sciencetranslationalmedicine.org/cgi/content/full/7/283/283ra54/dc1 Supplementary Materials for Clonal status of actionable driver events and the timing of mutational processes in cancer evolution

More information

RASA: Robust Alternative Splicing Analysis for Human Transcriptome Arrays

RASA: Robust Alternative Splicing Analysis for Human Transcriptome Arrays Supplementary Materials RASA: Robust Alternative Splicing Analysis for Human Transcriptome Arrays Junhee Seok 1*, Weihong Xu 2, Ronald W. Davis 2, Wenzhong Xiao 2,3* 1 School of Electrical Engineering,

More information

Supplemental Information. Molecular, Pathological, Radiological, and Immune. Profiling of Non-brainstem Pediatric High-Grade

Supplemental Information. Molecular, Pathological, Radiological, and Immune. Profiling of Non-brainstem Pediatric High-Grade Cancer Cell, Volume 33 Supplemental Information Molecular, Pathological, Radiological, and Immune Profiling of Non-brainstem Pediatric High-Grade Glioma from the HERBY Phase II Randomized Trial Alan Mackay,

More information

On the Reproducibility of TCGA Ovarian Cancer MicroRNA Profiles

On the Reproducibility of TCGA Ovarian Cancer MicroRNA Profiles On the Reproducibility of TCGA Ovarian Cancer MicroRNA Profiles Ying-Wooi Wan 1,2,4, Claire M. Mach 2,3, Genevera I. Allen 1,7,8, Matthew L. Anderson 2,4,5 *, Zhandong Liu 1,5,6,7 * 1 Departments of Pediatrics

More information

Session 4 Rebecca Poulos

Session 4 Rebecca Poulos The Cancer Genome Atlas (TCGA) & International Cancer Genome Consortium (ICGC) Session 4 Rebecca Poulos Prince of Wales Clinical School Introductory bioinformatics for human genomics workshop, UNSW 20

More information

Hands-On Ten The BRCA1 Gene and Protein

Hands-On Ten The BRCA1 Gene and Protein Hands-On Ten The BRCA1 Gene and Protein Objective: To review transcription, translation, reading frames, mutations, and reading files from GenBank, and to review some of the bioinformatics tools, such

More information

Abstract. Optimization strategy of Copy Number Variant calling using Multiplicom solutions APPLICATION NOTE. Introduction

Abstract. Optimization strategy of Copy Number Variant calling using Multiplicom solutions APPLICATION NOTE. Introduction Optimization strategy of Copy Number Variant calling using Multiplicom solutions Michael Vyverman, PhD; Laura Standaert, PhD and Wouter Bossuyt, PhD Abstract Copy number variations (CNVs) represent a significant

More information

S1 Appendix: Figs A G and Table A. b Normal Generalized Fraction 0.075

S1 Appendix: Figs A G and Table A. b Normal Generalized Fraction 0.075 Aiello & Alter (216) PLoS One vol. 11 no. 1 e164546 S1 Appendix A-1 S1 Appendix: Figs A G and Table A a Tumor Generalized Fraction b Normal Generalized Fraction.25.5.75.25.5.75 1 53 4 59 2 58 8 57 3 48

More information

Ahrim Youn 1,2, Kyung In Kim 2, Raul Rabadan 3,4, Benjamin Tycko 5, Yufeng Shen 3,4,6 and Shuang Wang 1*

Ahrim Youn 1,2, Kyung In Kim 2, Raul Rabadan 3,4, Benjamin Tycko 5, Yufeng Shen 3,4,6 and Shuang Wang 1* Youn et al. BMC Medical Genomics (2018) 11:98 https://doi.org/10.1186/s12920-018-0425-z RESEARCH ARTICLE Open Access A pan-cancer analysis of driver gene mutations, DNA methylation and gene expressions

More information

File Name: Supplementary Information Description: Supplementary Figures and Supplementary Tables. File Name: Peer Review File Description:

File Name: Supplementary Information Description: Supplementary Figures and Supplementary Tables. File Name: Peer Review File Description: File Name: Supplementary Information Description: Supplementary Figures and Supplementary Tables File Name: Peer Review File Description: Primer Name Sequence (5'-3') AT ( C) RT-PCR USP21 F 5'-TTCCCATGGCTCCTTCCACATGAT-3'

More information

Supplementary Tables. Supplementary Figures

Supplementary Tables. Supplementary Figures Supplementary Files for Zehir, Benayed et al. Mutational Landscape of Metastatic Cancer Revealed from Prospective Clinical Sequencing of 10,000 Patients Supplementary Tables Supplementary Table 1: Sample

More information

Expanded View Figures

Expanded View Figures Molecular Systems iology Tumor CNs reflect metabolic selection Nicholas Graham et al Expanded View Figures Human primary tumors CN CN characterization by unsupervised PC Human Signature Human Signature

More information

SUPPLEMENTAL INFORMATION

SUPPLEMENTAL INFORMATION SUPPLEMENTAL INFORMATION GO term analysis of differentially methylated SUMIs. GO term analysis of the 458 SUMIs with the largest differential methylation between human and chimp shows that they are more

More information

Nature Structural & Molecular Biology: doi: /nsmb.2419

Nature Structural & Molecular Biology: doi: /nsmb.2419 Supplementary Figure 1 Mapped sequence reads and nucleosome occupancies. (a) Distribution of sequencing reads on the mouse reference genome for chromosome 14 as an example. The number of reads in a 1 Mb

More information

AVENIO ctdna Analysis Kits The complete NGS liquid biopsy solution EMPOWER YOUR LAB

AVENIO ctdna Analysis Kits The complete NGS liquid biopsy solution EMPOWER YOUR LAB Analysis Kits The complete NGS liquid biopsy solution EMPOWER YOUR LAB Analysis Kits Next-generation performance in liquid biopsies 2 Accelerating clinical research From liquid biopsy to next-generation

More information

BWA alignment to reference transcriptome and genome. Convert transcriptome mappings back to genome space

BWA alignment to reference transcriptome and genome. Convert transcriptome mappings back to genome space Whole genome sequencing Whole exome sequencing BWA alignment to reference transcriptome and genome Convert transcriptome mappings back to genome space genomes Filter on MQ, distance, Cigar string Annotate

More information

Introduction to LOH and Allele Specific Copy Number User Forum

Introduction to LOH and Allele Specific Copy Number User Forum Introduction to LOH and Allele Specific Copy Number User Forum Jonathan Gerstenhaber Introduction to LOH and ASCN User Forum Contents 1. Loss of heterozygosity Analysis procedure Types of baselines 2.

More information

Clustered mutations of oncogenes and tumor suppressors.

Clustered mutations of oncogenes and tumor suppressors. Supplementary Figure 1 Clustered mutations of oncogenes and tumor suppressors. For each oncogene (red dots) and tumor suppressor (blue dots), the number of mutations found in an intramolecular cluster

More information

LncMAP: Pan-cancer atlas of long noncoding RNA-mediated transcriptional network perturbations

LncMAP: Pan-cancer atlas of long noncoding RNA-mediated transcriptional network perturbations Published online 9 January 2018 Nucleic Acids Research, 2018, Vol. 46, No. 3 1113 1123 doi: 10.1093/nar/gkx1311 LncMAP: Pan-cancer atlas of long noncoding RNA-mediated transcriptional network perturbations

More information

Supplementary Figure 1. Efficiency of Mll4 deletion and its effect on T cell populations in the periphery. Nature Immunology: doi: /ni.

Supplementary Figure 1. Efficiency of Mll4 deletion and its effect on T cell populations in the periphery. Nature Immunology: doi: /ni. Supplementary Figure 1 Efficiency of Mll4 deletion and its effect on T cell populations in the periphery. Expression of Mll4 floxed alleles (16-19) in naive CD4 + T cells isolated from lymph nodes and

More information

Supplementary Figure S1. Gene expression analysis of epidermal marker genes and TP63.

Supplementary Figure S1. Gene expression analysis of epidermal marker genes and TP63. Supplementary Figure Legends Supplementary Figure S1. Gene expression analysis of epidermal marker genes and TP63. A. Screenshot of the UCSC genome browser from normalized RNAPII and RNA-seq ChIP-seq data

More information

Introduction to Gene Sets Analysis

Introduction to Gene Sets Analysis Introduction to Svitlana Tyekucheva Dana-Farber Cancer Institute May 15, 2012 Introduction Various measurements: gene expression, copy number variation, methylation status, mutation profile, etc. Main

More information

Supplemental Information For: The genetics of splicing in neuroblastoma

Supplemental Information For: The genetics of splicing in neuroblastoma Supplemental Information For: The genetics of splicing in neuroblastoma Justin Chen, Christopher S. Hackett, Shile Zhang, Young K. Song, Robert J.A. Bell, Annette M. Molinaro, David A. Quigley, Allan Balmain,

More information

AVENIO family of NGS oncology assays ctdna and Tumor Tissue Analysis Kits

AVENIO family of NGS oncology assays ctdna and Tumor Tissue Analysis Kits AVENIO family of NGS oncology assays ctdna and Tumor Tissue Analysis Kits Accelerating clinical research Next-generation sequencing (NGS) has the ability to interrogate many different genes and detect

More information

Supplementary Figure 1: Comparison of acgh-based and expression-based CNA analysis of tumors from breast cancer GEMMs.

Supplementary Figure 1: Comparison of acgh-based and expression-based CNA analysis of tumors from breast cancer GEMMs. Supplementary Figure 1: Comparison of acgh-based and expression-based CNA analysis of tumors from breast cancer GEMMs. (a) CNA analysis of expression microarray data obtained from 15 tumors in the SV40Tag

More information

Computational Investigation of Homologous Recombination DNA Repair Deficiency in Sporadic Breast Cancer

Computational Investigation of Homologous Recombination DNA Repair Deficiency in Sporadic Breast Cancer University of Massachusetts Medical School escholarship@umms Open Access Articles Open Access Publications by UMMS Authors 11-16-2017 Computational Investigation of Homologous Recombination DNA Repair

More information

Supplementary Information Titles Journal: Nature Medicine

Supplementary Information Titles Journal: Nature Medicine Supplementary Information Titles Journal: Nature Medicine Article Title: Corresponding Author: Supplementary Item & Number Supplementary Fig.1 Fig.2 Fig.3 Fig.4 Fig.5 Fig.6 Fig.7 Fig.8 Fig.9 Fig. Fig.11

More information

DNA-seq Bioinformatics Analysis: Copy Number Variation

DNA-seq Bioinformatics Analysis: Copy Number Variation DNA-seq Bioinformatics Analysis: Copy Number Variation Elodie Girard elodie.girard@curie.fr U900 institut Curie, INSERM, Mines ParisTech, PSL Research University Paris, France NGS Applications 5C HiC DNA-seq

More information

Journal: Nature Methods

Journal: Nature Methods Journal: Nature Methods Article Title: Network-based stratification of tumor mutations Corresponding Author: Trey Ideker Supplementary Item Supplementary Figure 1 Supplementary Figure 2 Supplementary Figure

More information

HALLA KABAT * Outreach Program, mircore, 2929 Plymouth Rd. Ann Arbor, MI 48105, USA LEO TUNKLE *

HALLA KABAT * Outreach Program, mircore, 2929 Plymouth Rd. Ann Arbor, MI 48105, USA   LEO TUNKLE * CERNA SEARCH METHOD IDENTIFIED A MET-ACTIVATED SUBGROUP AMONG EGFR DNA AMPLIFIED LUNG ADENOCARCINOMA PATIENTS HALLA KABAT * Outreach Program, mircore, 2929 Plymouth Rd. Ann Arbor, MI 48105, USA Email:

More information

Single-strand DNA library preparation improves sequencing of formalin-fixed and paraffin-embedded (FFPE) cancer DNA

Single-strand DNA library preparation improves sequencing of formalin-fixed and paraffin-embedded (FFPE) cancer DNA www.impactjournals.com/oncotarget/ Oncotarget, Supplementary Materials 2016 Single-strand DNA library preparation improves sequencing of formalin-fixed and paraffin-embedded (FFPE) DNA Supplementary Materials

More information

ncounter Assay Automated Process Capture & Reporter Probes Bind reporter to surface Remove excess reporters Hybridize CodeSet to RNA

ncounter Assay Automated Process Capture & Reporter Probes Bind reporter to surface Remove excess reporters Hybridize CodeSet to RNA ncounter Assay Automated Process Hybridize CodeSet to RNA Remove excess reporters Bind reporter to surface Immobilize and align reporter Image surface Count codes mrna Capture & Reporter Probes slides

More information

Nature Medicine: doi: /nm.4439

Nature Medicine: doi: /nm.4439 Figure S1. Overview of the variant calling and verification process. This figure expands on Fig. 1c with details of verified variants identification in 547 additional validation samples. Somatic variants

More information

genomics for systems biology / ISB2020 RNA sequencing (RNA-seq)

genomics for systems biology / ISB2020 RNA sequencing (RNA-seq) RNA sequencing (RNA-seq) Module Outline MO 13-Mar-2017 RNA sequencing: Introduction 1 WE 15-Mar-2017 RNA sequencing: Introduction 2 MO 20-Mar-2017 Paper: PMID 25954002: Human genomics. The human transcriptome

More information

Supplementary Appendix

Supplementary Appendix Supplementary Appendix This appendix has been provided by the authors to give readers additional information about their work. Supplement to: Eckel-Passow JE, Lachance DH, Molinaro AM, et al. Glioma groups

More information

Nature Genetics: doi: /ng Supplementary Figure 1. Country distribution of GME samples and designation of geographical subregions.

Nature Genetics: doi: /ng Supplementary Figure 1. Country distribution of GME samples and designation of geographical subregions. Supplementary Figure 1 Country distribution of GME samples and designation of geographical subregions. GME samples collected across 20 countries and territories from the GME. Pie size corresponds to the

More information

Contents. 1.5 GOPredict is robust to changes in study sets... 5

Contents. 1.5 GOPredict is robust to changes in study sets... 5 Supplementary documentation for Data integration to prioritize drugs using genomics and curated data Riku Louhimo, Marko Laakso, Denis Belitskin, Juha Klefström, Rainer Lehtonen and Sampsa Hautaniemi Faculty

More information
