SUPPLEMENTARY METHODS

Data Contents and Sources
OASIS captured sample-level annotations and three omics data types: somatic mutation, copy number variation (CNV) and gene expression based on microarray or RNA-Seq data (Supplementary Table 1). Alterations were reported for individual genes and samples, or summarized for individual genes and cancer types. Gene-level copy number was calculated for each sample as a weighted average of the copy number values of all overlapping CNV segments. Gene-level expression was reported as log2 intensity for microarray data and as TPM (transcripts per million) for RNA-Seq data, calculated by the RSEM package 1. All mutations were consistently annotated using a custom pipeline based on the Variant Effect Predictor (VEP) 2, with additional information derived from comparisons with the dbSNP 3 and COSMIC 4 databases. Disease descriptions of tumor samples were manually curated to enforce a single controlled vocabulary across datasets (Supplementary Table 2). Differential and outlier gene expression analyses were performed on primary tumor datasets with expression data available for at least 20 normal samples. Most datasets consisted of samples of a single cancer origin; the exceptions are CCLE and GTEx, which span multiple cancer types and multiple normal tissues, respectively. OASIS 1.0 contained data on 12,108 tumor, 13,007 normal and 1,054 cell line samples across 55 cancer types from TCGA (The Cancer Genome Atlas), CCLE (Cancer Cell Line Encyclopedia) 5, GTEx (Genotype-Tissue Expression) 6 and four Pfizer-funded genomics studies of liver, gastric and breast cancers 7-10 (Supplementary Table 1). Analysis result files, including CNV, mutation and RNA-Seq expression data for TCGA and GTEx, were provided by OmicSoft Corporation's OncoLand data service. CCLE RNA-Seq data were downloaded from the UCSC Cancer Genomics Hub (CGHub) 11.
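The gene-level copy number described above can be sketched as an exon-overlap-weighted average of segment copy numbers. This is a minimal illustration, not the OASIS pipeline: the `Segment` type, the half-open genomic interval convention and the normalization by covered exon length (so that the weights sum to 1 even when segments leave part of a gene uncovered) are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    """A CNV segment on a half-open interval [start, end) (assumed convention)."""
    chrom: str
    start: int
    end: int
    copy_number: float


def _overlap(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Length of the intersection of two half-open intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def gene_copy_number(exons, segments) -> Optional[float]:
    """Weighted average of segment copy numbers, weighted by exon overlap.

    exons: list of (chrom, start, end) tuples for one gene.
    Returns None if no segment overlaps any exon.
    """
    weighted_sum = 0.0
    covered = 0
    for seg in segments:
        # Number of this gene's exon bases that fall inside the segment
        ov = sum(_overlap(s, e, seg.start, seg.end)
                 for c, s, e in exons if c == seg.chrom)
        weighted_sum += seg.copy_number * ov
        covered += ov
    if covered == 0:
        return None
    # Dividing by covered exon length makes the weights sum to 1
    return weighted_sum / covered


exons = [("7", 100, 200), ("7", 300, 400)]
segments = [Segment("7", 0, 250, 4.0), Segment("7", 250, 1000, 2.0)]
print(gene_copy_number(exons, segments))  # 3.0
```

Here each exon lies in a different segment, so the two segments receive equal weight and the gene-level value is the mean of 4.0 and 2.0.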
Three gene lists representing potential drug target classes (surfaceome, secretome and immunome) were obtained from publications and The Human Protein Atlas.

RNA-Seq Analysis
RNA-Seq data for the CCLE dataset were analyzed using the RSEM algorithm 1 to quantify gene expression. FASTQ files for each sample were extracted from BAM files using the SamToFastq tool in Picard. RSEM (v …) was applied to the paired-end FASTQ reads, using human RefSeq transcripts and the GRCh37 genome assembly as references. OASIS reports gene expression levels in TPM. For the TCGA dataset, TPM values were derived from the scaled_estimate values in the Firehose pipeline output as $\mathrm{TPM} = \text{scaled\_estimate} \times 10^{6}$.

Copy Number Analysis
CNV segments shorter than 1 Mb were defined as focal, and segments of 1 Mb or longer as broad. CNV calls were further classified into five categories based on the copy number (CN) value: Amplification ($\mathrm{CN} \ge 3.7$), Copy gain ($3.7 > \mathrm{CN} \ge 2.5$), Neutral ($2.5 > \mathrm{CN} > 1.5$), Copy loss ($1.5 \ge \mathrm{CN} > 1.2$) and Deletion ($\mathrm{CN} \le 1.2$). Gene-level CN for each sample was calculated as a weighted average of the CN values of all overlapping CNV segments:

$$\text{Gene CN} = \sum_{i} C_i E_i$$

where $C_i$ is the copy number of CNV segment $i$ and $E_i$ is the fraction of the gene's exon sequence overlapping segment $i$; the weights $E_i$ sum to 1 when the segments cover the gene's entire exon sequence. Genomic regions harboring recurrent and high-level CNVs in a cohort were identified using the GISTIC 2.0 algorithm 17.

Mutation Annotation
Mutations were annotated using the Ensembl gene set (release …) and the Variant Effect Predictor (VEP) 2. For each mutation, transcript-level annotations from VEP were filtered to retain the most deleterious annotation at the gene level, based on the functional effect of the mutation and its functional consequence as predicted by SIFT 19. Among transcripts with the same predicted functional state, the transcript with the longer coding sequence was selected. COSMIC (v.66) identifiers, together with counts of COSMIC samples with confirmed somatic status, were reported for mutations whose coordinates and mutant allele matched a COSMIC entry. The dbSNP (v.137) 3 identifier and allele frequency were also annotated for mutations whose coordinates matched a dbSNP entry. Pfam 20 protein domains overlapping each mutation were annotated as well. For each mutation, the number of affected samples in each cancer type and across all cell lines was reported as the tumor recurrence and model recurrence, respectively.

Differential Expression and Outlier Analysis
Microarray data were quantile normalized, and differential expression was analyzed using the limma package in Bioconductor 21. Differential expression analysis of RNA-Seq data was performed using the edgeR package in Bioconductor 22. The false discovery rate (FDR) was calculated using the Benjamini-Hochberg method 23. Outlier analysis of gene expression data used the likelihood ratio method 24, which identifies a change point in the sorted, standardized expression levels of tumor samples, using normal samples as a reference. Samples above the change point were defined as up-regulated outliers; down-regulated outliers were defined analogously. The p-value of the outlier statistic was calculated from a null distribution simulated from the standard Gaussian, and Benjamini-Hochberg FDRs were then calculated from these p-values. Outliers were defined only for genes with FDR < 0.05 in the outlier analysis.
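The Benjamini-Hochberg adjustment used for both the differential expression and the outlier FDRs can be sketched in a few lines. This is a plain-Python illustration of the standard step-up procedure (it should give the same values as R's `p.adjust(method = "BH")`), not the code used by OASIS; the gene names and p-values are hypothetical, and the final filter mirrors the FDR < 0.05 outlier criterion.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values (FDRs) in the original order."""
    m = len(pvalues)
    # Indices of the p-values in ascending order
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity and a cap of 1
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvals = {"GENE_A": 0.01, "GENE_B": 0.04, "GENE_C": 0.03, "GENE_D": 0.20}
fdr = dict(zip(pvals, benjamini_hochberg(list(pvals.values()))))
outlier_genes = [g for g, q in fdr.items() if q < 0.05]
print(outlier_genes)  # ['GENE_A']
```

Note that GENE_B (raw p = 0.04) does not pass once the adjustment accounts for the four tests performed.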
Druggability Score
The small-molecule druggability score of a gene indicates the potential for developing a small-molecule drug that functionally modulates the gene product. It was developed using a semi-supervised approach to protein domain classification. The method started by examining the molecular targets of all marketed and experimental drugs, as well as a large collection of drug-like tool compounds. Protein domains from these targets that bind small-molecule compounds with high affinity were then learned in a semi-automated fashion. By mapping these druggable protein domains genome-wide, the method classified all human proteins into six druggability categories 25: 1, unknown druggability; 2, has catalytic activity; 3, has possibly druggable protein domain(s); 4, has druggable protein domain(s); 5, has high-affinity drug-like compound(s); and 6, target of launched drug(s).

SUPPLEMENTARY NOTE
Overview of Web Portal
The OASIS web portal was developed on a custom version of the BioMart framework designed for oncogenomics data analysis 26,27. All functionality can be accessed through the menu bar at the top of the Home page (Supplementary Figure 1). Users can click on Data Summary to obtain an overview of all data, organized hierarchically by disease, dataset and data type (Figure 1A). Gene Search allows users to enter a gene identifier such as a HUGO gene symbol and retrieve a Gene report summarizing how frequently the gene is affected by genomic alterations across datasets and cancers (Supplementary Figure 2, Supplementary Table 3). For a specific alteration type, such as somatic mutation or copy number gain, the alteration frequency is calculated as the proportion of tumors in a cohort that harbor at least one alteration of that type affecting the gene of interest. Database Search provides an easy-to-use interface for building custom queries against the entire database (Figure 1B, Supplementary Figure 3A). Users can specify query criteria based on sample annotations, cancer alterations or the results of pre-computed analyses, such as differential expression in tumor relative to normal tissues. The query result, called an Alteration report, is returned in a pre-defined or user-specified tabular format and can be exported in a Microsoft Excel-compatible format (Supplementary Figure 3B, Supplementary Table 4). Programmatic access is also available through REST and SOAP services. The Analysis section provides two analytical tools, the Pan-cancer report and OASIS-print, for exploring complex patterns of alterations affecting multiple genes. Bar, Box, Scatter and Volcano plots facilitate exploratory analysis of the data at the gene and sample level (Figures 1D-G). All plots share a common layout (Supplementary Figure 4) and common functionality: zooming, in-plot search to locate a gene or sample by name, and multi-select to pick genes or samples ad hoc by drawing a rectangle around plot elements. Mousing over a plot element such as a sample opens a pop-up window with detailed information and links to other visualizations, including the Sample report, which summarizes all alterations identified in that sample. Multiple samples can also be selected at once to obtain their names, alteration details and estimated prevalence as a percentage of the cohort.
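The per-type alteration frequency reported in the Gene report can be sketched as follows. The record layout `(sample_id, gene, alteration_type)` and the sample and alteration labels are hypothetical. The point of the sketch is that a tumor carrying several alterations of the same type in the gene is counted once, and samples outside the cohort are ignored.

```python
def alteration_frequency(cohort_samples, alterations, gene, alteration_type):
    """Fraction of cohort tumors with >= 1 alteration of the given type in `gene`.

    alterations: iterable of (sample_id, gene, alteration_type) records.
    """
    cohort = set(cohort_samples)
    if not cohort:
        return 0.0
    # A set, so each altered tumor is counted only once
    altered = {
        sample
        for sample, g, a_type in alterations
        if g == gene and a_type == alteration_type and sample in cohort
    }
    return len(altered) / len(cohort)


cohort = ["T1", "T2", "T3", "T4"]
records = [
    ("T1", "MET", "mutation"),
    ("T1", "MET", "mutation"),   # second mutation in the same tumor: counted once
    ("T2", "MET", "copy_gain"),
    ("T3", "EGFR", "mutation"),
    ("X9", "MET", "mutation"),   # not in this cohort: ignored
]
print(alteration_frequency(cohort, records, "MET", "mutation"))  # 0.25
```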
While several web portals have been developed to facilitate analyses of publicly available cancer genomic datasets, OASIS is distinguished by the scale of its data collection, datasets not available elsewhere, pre-computed analysis results such as differential expression and druggability scores, and novel analytical tools such as the Pan-cancer report and Volcano plot (Supplementary Tables 5-6). OASIS enables analyses of one of the largest collections of multi-omics datasets on the web, with genomic-scale profiles of 26,169 samples (12,108 primary tumors, 13,007 normal samples and 1,054 cell lines), including RNA-Seq profiles of 12,056 samples. As a point of reference, the well-established cBioPortal 31 had compiled data on 21,401 samples as of August 2015, including RNA-Seq data on 10,459 samples. To our knowledge, OASIS is also the first cancer genomics portal to host RNA-Seq data from CCLE 5 and GTEx 33. A major strength of OASIS is the integrative analysis of RNA-Seq-derived gene expression data with genomic alterations and across multiple studies. Features such as the Box plot were designed to integrate RNA-Seq expression data from multiple sources, enabling comparisons of primary tumors with cell lines or of tumors with normal tissues. OASIS also addresses problems that arise in the drug discovery process. For example, OASIS-print, built on open-source code from cBioPortal, was substantially modified to support selecting cell line or xenograft models based on multi-gene expression and genomic characteristics, a common use case in oncology drug discovery.

Pan-cancer Report
The Pan-cancer report provides a graphical summary of alteration patterns affecting a list of genes and identifies genes frequently altered in one or multiple cancers (Figure 1C, Supplementary Figure 5). This visualization resembles a heatmap in which rows represent individual genes and columns are divided into two sections.
The Gene Info section provides gene-level annotation, including gene symbol, druggability score (Supplementary Methods), oncogene/tumor suppressor status as defined in the Cancer Gene Census 34, and protein classification (secretome, immunome, surfaceome). All other columns represent alteration frequencies grouped by primary tissue and dataset. The Summary columns report frequencies for individual alteration types summarized across all samples. Columns are color-coded by alteration type, and color gradients within columns represent alteration frequencies, with higher intensities corresponding to higher frequencies. Users can toggle between the default color-only heatmap view and a detailed view that displays the numerical values. Clicking any heatmap cell retrieves the corresponding Alteration report. This tool can help identify drug targets that are either frequently altered in one disease or altered at low prevalence across multiple cancers. For instance, the Pan-cancer report revealed that MET, a gene in the c-MET signaling pathway implicated in a variety of human malignancies 35, harbored high-level copy number alterations in ~2% of gastric cancer cases (Supplementary Figure 5).

Copy Number Analyses using Bar Plot and Scatter Plot
MET amplification is known to confer sensitivity to tyrosine kinase inhibitors in subsets of cancer patients 36. A Bar plot analysis of TCGA datasets confirmed that high-level MET amplification occurs in multiple cancers (Supplementary Figure 6). A Scatter plot analysis showed that MET amplification induced over-expression in a subset of gastric cancer and lung adenocarcinoma cases, with Pearson correlation coefficients of 0.83 and 0.65, respectively (Supplementary Figure 7A-C). Scatter plot analysis of the CCLE dataset suggested that gastric cancer cell lines such as SNU5 harbor both MET amplification and over-expression, and could therefore be used to test the effects of tyrosine kinase inhibition experimentally (Supplementary Figure 7D).

Expression Analyses with Box Plot
Box plots and Bar plots can be used to visualize gene expression values calculated from either microarray or RNA-Seq data. Gene expression values derived from RNA-Seq are comparable across datasets because RNA-Seq normalization is largely dataset-independent.
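The comparability argument rests on TPM being computed within each sample: expression is normalized by transcript length and then rescaled so that every sample sums to one million, without reference to the other samples in the dataset. The sketch below illustrates this with a simplified count-based TPM (RSEM itself uses effective lengths and expected read counts from its EM model) together with the scaled_estimate × 10^6 conversion described for TCGA in the RNA-Seq Analysis section. The inputs are toy values.

```python
def tpm_from_counts(counts, lengths):
    """Simplified TPM: length-normalize counts, then scale to sum to 1e6."""
    rates = [c / l for c, l in zip(counts, lengths)]
    total = sum(rates)
    return [r / total * 1e6 for r in rates]


def tpm_from_scaled_estimate(scaled_estimates):
    """Treat TCGA RSEM scaled_estimate values as transcript fractions: TPM = x * 1e6."""
    return [x * 1e6 for x in scaled_estimates]


sample = tpm_from_counts(counts=[10, 20, 30], lengths=[1000, 2000, 500])
print([round(v) for v in sample])  # [125000, 125000, 750000]
print(round(sum(sample)))          # 1000000
```

Because the denominator uses only the sample's own genes, adding or removing other samples from a dataset leaves each sample's TPM values unchanged, unlike cross-sample procedures such as quantile normalization.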
By default, Box plots show a side-by-side comparison of gene expression values in tumor vs. normal samples. For tumor datasets without normal samples, such as CCLE, users can integrate normal gene expression data into the analysis by selecting the GTEx dataset during plot configuration. Using the Box plot, we estimated that MET is over-expressed in ~1% of TCGA gastric cancer samples (Supplementary Figure 8A). To identify cell lines that over-express MET, we integrated CCLE and GTEx expression data in the same plot and sorted samples by tissue type to obtain side-by-side comparisons of tumor vs. normal samples. Zooming in further showed that MET was expressed at significantly higher levels in gastric cancer cell lines than in normal gastric tissues (Supplementary Figure 8B-C). Cancer cell lines may harbor multiple driver mutations, which can complicate experimental design and interpretation. To enable model selection based on multi-gene profiles, we developed an analytical tool called OASIS-print based on the open-source version of OncoPrint in cBioPortal 31. By combining graphical and tabular views, OASIS-print allowed us to visually identify four gastric cancer cell lines with MET amplification and wild-type EGFR and KRAS (Supplementary Figure 9).

Differential Expression Analysis with Volcano Plot
The Volcano plot provides an interactive visualization of differential expression between tumor and normal samples for a given dataset (Supplementary Methods). The query interface allows users to examine all genes in a dataset, or only genes from user-specified oncogenic pathways or gene signatures defined by MSigDB 37. Users can search for a particular gene by name and multi-select genes of interest within the Volcano plot. To learn more about a gene of interest, the pop-up menu can be used to retrieve the Gene report or to perform additional analyses, such as visualizing the gene's expression across individual samples with a Box plot or Bar plot. As shown in the Volcano plot, MET was over-expressed in tumor vs. normal tissues in the TCGA thyroid cancer cohort (Supplementary Figure 10A).

ADC Target Analysis
Antibody-drug conjugates (ADCs), such as the anti-HER2 agent Kadcyla, are a powerful class of anticancer drugs that target cell surface proteins over-expressed in tumors relative to normal tissues. To enable ADC target analysis, OASIS aggregates RNA-Seq data for 7,347 primary tumors (TCGA), 781 cancer cell lines (CCLE) and 3,231 normal tissues (GTEx). The Volcano plot can be used to identify all transmembrane genes over-expressed in TCGA breast cancer vs. normal tissues, including ERBB2 (Supplementary Figure 10B). Box plot analysis showed that ERBB2 was over-expressed in tumor vs. normal tissue in breast, gastric and non-small cell lung cancers (Supplementary Figure 11A). Integrative analysis of TCGA and GTEx expression data confirmed that ERBB2 expression was higher in tumors than in all normal tissues. Moreover, ERBB2 expression was higher in TCGA ovarian tumors than in normal tissues, even though ovarian normal tissue data were not available in TCGA (Supplementary Figure 11B). Scatter plot analysis indicated that ERBB2 over-expression was induced by copy number amplification (Supplementary Figure 11C). Finally, we used OASIS-print to identify cancer cell lines that are ERBB2-amplified but ER-negative with wild-type PIK3CA (Supplementary Figure 11D).

SUPPLEMENTARY REFERENCES
1. Li, B. & Dewey, C.N. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 12, 323 (2011).
2. McLaren, W. et al. Deriving the consequences of genomic variants with the Ensembl API and SNP Effect Predictor. Bioinformatics 26, (2010).
3. Sherry, S.T. et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res 29, (2001).
4. Forbes, S.A. et al. COSMIC: exploring the world's knowledge of somatic mutations in human cancer. Nucleic Acids Res (2014).
5. Barretina, J. et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 483, (2012).
6. GTEx Consortium. The Genotype-Tissue Expression (GTEx) project. Nat Genet 45, (2013).
7. Curtis, C. et al. The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature 486, (2012).
8. Kan, Z. et al. Whole-genome sequencing identifies recurrent mutations in hepatocellular carcinoma. Genome Res 23, (2013).
9. Wang, K. et al. Genomic landscape of copy number aberrations enables the identification of oncogenic drivers in hepatocellular carcinoma. Hepatology 58, (2013).
10. Wang, K. et al. Whole-genome sequencing and comprehensive molecular profiling identify new driver mutations in gastric cancer. Nat Genet 46, (2014).
11. Wilks, C. et al. The Cancer Genomics Hub (CGHub): overcoming cancer through the power of torrential data. Database (Oxford) 2014 (2014).
12. da Cunha, J.P. et al. Bioinformatics construction of the human cell surfaceome. Proc Natl Acad Sci U S A 106, (2009).
13. Brown, K.J. et al. The human secretome atlas initiative: implications in health and disease conditions. Biochim Biophys Acta 1834, (2013).
14. Almen, M.S., Nordstrom, K.J., Fredriksson, R. & Schioth, H.B. Mapping the human membrane proteome: a majority of the human membrane proteins can be classified according to function and evolutionary origin. BMC Biol 7, 50 (2009).
15. Ortutay, C., Siermala, M. & Vihinen, M. Molecular characterization of the immune system: emergence of proteins, processes, and domains. Immunogenetics 59, (2007).
16. Uhlen, M. et al. Towards a knowledge-based Human Protein Atlas. Nat Biotechnol 28, (2010).
17. Mermel, C.H. et al. GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers. Genome Biol 12, R41 (2011).
18. Flicek, P. et al. Ensembl. Nucleic Acids Res 42, D (2014).
19. Kumar, P., Henikoff, S. & Ng, P.C. Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nat Protoc 4, (2009).
20. Finn, R.D. et al. Pfam: the protein families database. Nucleic Acids Res 42, D (2014).
21. Smyth, G.K. Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol 3, Article3 (2004).
22. Robinson, M.D., McCarthy, D.J. & Smyth, G.K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, (2010).
23. Benjamini, Y. & Hochberg, Y. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society Series B (1995).
24. Hu, J. Cancer outlier detection based on likelihood ratio test. Bioinformatics 24, (2008).
25. Hopkins, A.L. & Groom, C.R. The druggable genome. Nat Rev Drug Discov 1, (2002).
26. Kasprzyk, A. BioMart: driving a paradigm change in biological data management. Database (Oxford) 2011, bar049 (2011).
27. Zhang, J. et al. International Cancer Genome Consortium Data Portal--a one-stop shop for cancer genomics data. Database (Oxford) 2011, bar026 (2011).
28. Zhu, J. et al. The UCSC Cancer Genomics Browser. Nat Methods 6, (2009).
29. Gundem, G. et al. IntOGen: integration and data mining of multidimensional oncogenomic data. Nat Methods 7, 92-3 (2010).
30. Ching, K.A. et al. Cell Index Database (CELLX): a web tool for cancer precision medicine. Pac Symp Biocomput, 10-9 (2015).
31. Cerami, E. et al. The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data. Cancer Discov 2, (2012).
32. Leiserson, M.D. et al. MAGI: visualization and collaborative annotation of genomic aberrations. Nat Methods 12, (2015).
33. GTEx Consortium. Human genomics. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science 348, (2015).
34. Futreal, P.A. et al. A census of human cancer genes. Nat Rev Cancer 4, (2004).
35. Organ, S.L. & Tsao, M.S. An overview of the c-MET signaling pathway. Ther Adv Med Oncol 3, S7-S19 (2011).
36. Smolen, G.A. et al. Amplification of MET may identify a subset of cancers with extreme sensitivity to the selective tyrosine kinase inhibitor PHA. Proc Natl Acad Sci U S A 103, (2006).
37. Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A 102, (2005).

Supplementary Table 1: Datasets in OASIS 1.0.
Alteration types: Mutation; Copy Number Alterations (Copy Number, Copy Number Peak); Expression (MicroArray Expression, RNA-Seq Expression, Differential Expression, Outlier Expression).

SOURCE | PROJECT NAME | TISSUE TYPE
PFIZER-HKU | Gastric cancer | Stomach
PFIZER-METABRIC | Breast cancer | Breast
PFIZER-ACRG | Hepatocellular carcinoma | Liver
PFIZER-SAMSUNG | Hepatocellular carcinoma | Liver
TCGA | Bladder urothelial carcinoma | Bladder
TCGA | Breast invasive carcinoma | Breast
TCGA | Cervical squamous cell carcinoma and endocervical adenocarcinoma | Cervical
TCGA | Colon adenocarcinoma | Colon
TCGA | Lymphoid neoplasm diffuse large B-cell lymphoma | Diffuse large B-cell lymphoma
TCGA | Glioblastoma multiforme | Glioblastoma multiforme
TCGA | Head and neck squamous cell carcinoma | Head and Neck
TCGA | Kidney chromophobe | Kidney
TCGA | Kidney renal clear cell carcinoma | Kidney
TCGA | Kidney renal papillary cell carcinoma | Kidney
TCGA | Acute myeloid leukemia | Acute myeloid leukemia
TCGA | Brain lower grade glioma | Brain lower grade glioma
TCGA | Liver hepatocellular carcinoma | Liver
TCGA | Lung adenocarcinoma | Lung
TCGA | Lung squamous cell carcinoma | Lung
TCGA | Ovarian serous cystadenocarcinoma | Ovary
TCGA | Pancreatic adenocarcinoma | Pancreas
TCGA | Prostate adenocarcinoma | Prostate
TCGA | Rectum adenocarcinoma | Rectum
TCGA | Sarcoma | Sarcoma
TCGA | Skin cutaneous melanoma | Skin
TCGA | Stomach adenocarcinoma | Stomach
TCGA | Thyroid carcinoma | Thyroid
TCGA | Uterine corpus endometrial carcinoma | Uterine
BROAD | Cancer Cell Line Encyclopedia (CCLE)* | Cell lines
BROAD | Genotype-Tissue Expression (GTEx)** | Normal tissue

* Mutation calls in CCLE are only made for a targeted list of genes.
** GTEx data correspond to normal samples only.

SUPPLEMENTARY PROTOCOL
This protocol contains step-by-step instructions for the following use cases: (1) Analysis of CDK4 as a potential drug target; (2) Antibody drug conjugate (ADC) target analysis; (3) Selecting cell lines based on multi-gene omics data; (4) Downloading OASIS-Genomics source code; (5) Programmatic Access API.

Use Case 1: Analysis of CDK4 as a potential drug target
(1) Pan-cancer report
This is a step-by-step guide to using the OASIS web portal to evaluate CDK4 as a drug target based on integrative analysis of cancer omics data. In the first step, we use the Pan-cancer report to retrieve the prevalence of alterations affecting a list of genes involved in cell cycle regulation. To access the Pan-cancer report, click on the Analysis option in the navigation menu, and then click on the Pan Cancer Report option in the drop-down menu to open its query interface. Three input options are available from the interface. The Cancer Types menu lists the datasets available for selection, grouped by tissue of origin. If no cancer type is selected, all of them are used in the query. To select multiple cancer types, Ctrl+click each one. The Genes text box accepts a comma-separated gene list. The Oncogenic Pathways menu provides a list of pre-defined pathways implicated in cancer. Users can specify a custom list of genes, choose an oncogenic pathway, or combine both. Select Breast, Colon, Gastric, Glioblastoma, Head and Neck, Liver, Lung, Melanoma, Ovarian, Prostate, Rectal, Renal, Uterine, Models and Cell lines from the Cancer Types menu (A). In the Genes text box (B), type CCND1, CDK4, CDK6, CDKN2A, E2F1, RB1 and click on Search (C).
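The gene input in this step combines a typed, comma-separated list with an optional pre-defined pathway. A minimal sketch of how such input might be normalized (trimming whitespace, dropping empty entries and duplicates while keeping the typed order) is shown below. The function name and the pathway gene list are hypothetical, and the portal's actual parsing rules are not described in this protocol. Symbols are deliberately not upper-cased, because some HUGO symbols (for example, open-reading-frame symbols containing "orf") include lower-case letters.

```python
def build_query_genes(gene_text, pathway_genes=()):
    """Merge a comma-separated gene list with pathway genes, preserving order."""
    seen = set()
    genes = []
    for token in gene_text.split(",") + list(pathway_genes):
        symbol = token.strip()
        # Skip empty entries (e.g. from ",,") and repeated symbols
        if symbol and symbol not in seen:
            seen.add(symbol)
            genes.append(symbol)
    return genes


# Gene list typed in this step; the pathway list is a hypothetical placeholder
typed = "CCND1, CDK4, CDK6, CDKN2A, E2F1, RB1"
print(build_query_genes(typed, pathway_genes=["RB1", "CDKN2B"]))
```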

The Pan-cancer report returns a table in which each row represents a gene and each column represents a single cancer type (top row) and dataset (second row). Alteration types are color-coded: green represents mutations (substitutions and indels), red represents copy number gains and blue represents copy losses. Rows can be sorted by the alteration frequency values in any column. Click on the red column header for Glioblastoma (TCGA) to sort the rows by the frequency of copy number gains in that dataset (A). Mouse over the cell in the CDK4 row to display the number and percentage of samples in that cohort harboring CDK4 amplification (B): 13.34% of the TCGA Glioblastoma cohort. From the Gene Info section, we can also see that CDK4 is a known oncogene and may be targeted by small-molecule drugs.
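Sorting the Pan-cancer report by a column header amounts to ordering genes by one (dataset, alteration type) frequency. The sketch below assumes a hypothetical nested-dictionary layout; apart from the 13.34% CDK4 value quoted in this step, the frequencies are illustrative placeholders rather than OASIS results.

```python
def sort_genes(freq, dataset, alteration):
    """Order genes by descending frequency (%) in one report column.

    freq: dict gene -> dict (dataset, alteration) -> frequency in percent.
    Genes without a value for the column are treated as 0 and sort last.
    """
    column = (dataset, alteration)
    return sorted(freq, key=lambda g: freq[g].get(column, 0.0), reverse=True)


gbm_gain = ("Glioblastoma (TCGA)", "copy_gain")
freq = {
    "CCND1": {gbm_gain: 1.0},    # placeholder value
    "CDK4": {gbm_gain: 13.34},   # value quoted in this protocol
    "CDK6": {gbm_gain: 2.0},     # placeholder value
    "RB1": {},                   # no value for this column
}
print(sort_genes(freq, *gbm_gain))  # ['CDK4', 'CDK6', 'CCND1', 'RB1']
```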

(2) Copy number Bar plot
The Bar plot can display copy number values (log2 ratio) for one gene across multiple samples from different datasets. In this step, we use the Bar plot to visually assess copy number amplifications affecting CDK4 in the TCGA Glioblastoma cohort. To access the Bar plot, click on Plots in the navigation menu and then click on Bar plot (Copy number across samples). In the datasets list, select Glioblastoma (TCGA) (A). Once a dataset is selected, type CDK4 into the gene symbol text box (B) in the Restrict Search section. Click on GO (C) to create the plot.

In the plot, each bar corresponds to a single sample, colored by copy number status. Mouse over an individual bar to bring up a pop-up box containing information about that sample. To select multiple samples, click on Mode at the top left to switch from the default Zoom mode to Select mode.
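The copy number status used to color the bars can be approximated by mapping each sample's log2 ratio onto the five copy number categories defined in the Copy Number Analysis section of the Supplementary Methods. This sketch assumes a diploid baseline (CN = 2 × 2^log2 ratio) and boundaries at which each threshold value belongs to exactly one category (CN ≥ 3.7 amplification, 2.5 ≤ CN < 3.7 gain, 1.5 < CN < 2.5 neutral, 1.2 < CN ≤ 1.5 loss, CN ≤ 1.2 deletion). Whether the Bar plot applies exactly this conversion is not stated in the protocol.

```python
def copy_number_from_log2_ratio(log2_ratio, ploidy=2.0):
    """Absolute copy number under an assumed diploid baseline."""
    return ploidy * 2.0 ** log2_ratio


def copy_number_category(cn):
    """Five-way classification using the OASIS thresholds (boundaries assumed)."""
    if cn >= 3.7:
        return "Amplification"
    if cn >= 2.5:
        return "Copy gain"
    if cn > 1.5:
        return "Neutral"
    if cn > 1.2:
        return "Copy loss"
    return "Deletion"


for ratio in (1.0, 0.5, 0.0, -0.5, -1.0):
    cn = copy_number_from_log2_ratio(ratio)
    print(ratio, round(cn, 2), copy_number_category(cn))
```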

With Mode: Select enabled, highlight a group of samples by clicking on the plot and dragging the mouse pointer to draw a rectangle enclosing the samples of interest. In this case, we want to select all samples that harbor CDK4 amplification (dark red bars). A pop-up box will appear with information about the selected samples, including sample name, cancer and tissue type and copy number log2 ratio values. This box also displays the percentage of samples selected. Here it confirms that ~13% of the Glioblastoma (TCGA) dataset harbors CDK4 amplification. Selected data can be exported as an Excel file by clicking on the Export to Excel button. Alternatively, the sample names can be copied to the clipboard using the Copy sample names button.
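Conceptually, a rectangle selection keeps the points whose coordinates fall inside the drawn box, and the reported prevalence is the selected count divided by the cohort size. The sketch below illustrates this on made-up log2 ratios for ten hypothetical samples and writes the selection as CSV text (which Excel can open) as a stand-in for the portal's Export to Excel button; the function names are hypothetical.

```python
import csv
import io


def select_in_rectangle(points, x_range, y_range):
    """Return points (sample, x, y) lying inside the axis-aligned rectangle."""
    x0, x1 = sorted(x_range)
    y0, y1 = sorted(y_range)
    return [p for p in points if x0 <= p[1] <= x1 and y0 <= p[2] <= y1]


def percent_of_cohort(selected, cohort_size):
    """Share of the cohort covered by the selection, in percent."""
    return 100.0 * len(selected) / cohort_size


# x = bar position, y = copy number log2 ratio (made-up values)
log2_ratios = [1.9, 0.1, -0.2, 1.2, 0.0, 0.05, -1.1, 0.3, 0.2, 0.1]
points = [(f"SAMPLE_{i}", i, y) for i, y in enumerate(log2_ratios)]

selected = select_in_rectangle(points, x_range=(0, 9), y_range=(1.0, 3.0))
print([s for s, _, _ in selected], percent_of_cohort(selected, len(points)))

# Export the selection as CSV text
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["sample", "position", "log2_ratio"])
writer.writerows(selected)
print(buffer.getvalue())
```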

(3) Scatter plot
The Scatter plot can be used to assess whether CDK4 amplification correlates with higher gene expression. To access the Scatter plot, click on Plots in the navigation menu and then click on Scatter plot (Copy number vs. gene expression).

From the Select Datasets section, select Glioblastoma (TCGA) (A) and type CDK4 into the gene symbol text box (B) in the Restrict Search section. From the Expression data type drop-down menu, select RNASeq (RSEM) (C). Only RNA-Seq (RSEM) data should be used when combining gene expression data across multiple datasets. Click on GO (D) to create the plot.

The Scatter plot shows copy number values (log2 ratio) on the X-axis and expression values on the Y-axis. For RNA-Seq (RSEM) data, gene expression is given in TPM (transcripts per million). Each data point represents a single sample, and the straight line represents the linear fit. As shown in the legend, the Pearson correlation coefficient between copy number and expression of CDK4 is 0.78 in the Glioblastoma (TCGA) dataset, suggesting that amplification induces CDK4 over-expression. Mouse over a data point to display a pop-up box with information about that sample. Links to the Sample report and other visualizations can be accessed from the pop-up box.
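The correlation coefficient and fitted line in the Scatter plot legend correspond to the standard Pearson correlation and an ordinary least-squares regression of expression on copy number. A self-contained sketch follows. The toy data are made up, and whether the portal computes these statistics on TPM or on log-transformed values is not stated in the protocol.

```python
import math


def pearson_and_fit(xs, ys):
    """Return (r, slope, intercept) for y regressed on x by least squares."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two equal-length sequences with >= 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    # Undefined (ZeroDivisionError) if either variable is constant
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return r, slope, intercept


# Toy data: copy number log2 ratio (x) vs. expression (y)
cn = [-0.2, 0.0, 0.1, 1.0, 1.5]
expr = [20.0, 25.0, 24.0, 60.0, 80.0]
r, slope, intercept = pearson_and_fit(cn, expr)
print(round(r, 2), round(slope, 1), round(intercept, 1))
```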

(4) Gene expression Box plot
The Box plot in OASIS can be used to visualize gene-level expression from either microarray or RNA-Seq experiments for individual samples grouped by cancer type and status (tumor/normal). This tool allows us to compare CDK4 expression between tumor and normal samples and across cancer types. To access the Box plot, click on Plots and select Box plot (RNA-Seq Expression across cancer types).

Select Glioblastoma (TCGA) from the Select Datasets section (A). Because the Glioblastoma (TCGA) dataset has no normal samples, you can add a normal reference dataset from the GTEx project: Ctrl+click on Normal tissue Expression (GTEx) (B) to include it in the analysis. Type CDK4 into the gene symbol text box (C) and click the GO button (D).

A new Box plot will appear with the combined data for the selected datasets. Click on Sort by name (A) in the options at the top of the plot. This sorts the X-axis categories alphabetically and places TCGA Glioblastoma samples side-by-side with the GTEx normal brain samples. Click on the Switch to Log(2) Scale option (B) to toggle the Y-axis from a linear to a log2 scale.
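Each box summarizes one sample group by its minimum, quartiles and maximum. The sketch below mimics the Sort by name and Switch to Log(2) Scale options on toy TPM values: groups are ordered alphabetically, and the log2 view is computed as log2(TPM + 1). The pseudocount of 1 is an assumption (it avoids taking log2 of zero); the portal may instead simply rescale the axis, and its quartile method and whisker rules are not documented here.

```python
import math
import statistics


def box_summaries(values_by_group, log2_scale=False, pseudocount=1.0):
    """Five-number summary (min, Q1, median, Q3, max) per group, sorted by name."""
    summaries = {}
    for group in sorted(values_by_group):           # "Sort by name"
        values = values_by_group[group]
        if log2_scale:                              # "Switch to Log(2) Scale"
            values = [math.log2(v + pseudocount) for v in values]
        q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
        summaries[group] = (min(values), q1, median, q3, max(values))
    return summaries


tpm = {  # toy TPM values, not OASIS data
    "TCGA Glioblastoma": [1.0, 3.0, 7.0, 15.0],
    "GTEx Brain": [0.0, 1.0, 1.0, 3.0],
}
for group, summary in box_summaries(tpm, log2_scale=True).items():
    print(group, summary)
```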

With the default Mode: Zoom setting shown in the menu options, draw a rectangle around the brain samples.

The plot will zoom in to show only the samples for the selected cancer types. Click on Reset zoom at the top right of the plot to return to the default view. This analysis demonstrates that CDK4 is indeed over-expressed in Glioblastoma tumors compared to normal brain tissues.

(5) Volcano plot
The Volcano plot visualizes the differential expression of all genes within a dataset and identifies genes that are up-regulated (red) or down-regulated (blue) in tumor vs. normal tissues. To open the Volcano plot, click on Plots and select Volcano plot (Differential Expression across genes in a project). In the Select Datasets section, select Lung Squamous Cell Carcinoma (TCGA) (A). The Volcano plot supports the analysis of only one dataset at a time. In the Restrict Search section, the analysis can be restricted to genes that belong to a specific pathway or gene signature, to known cancer genes (as defined by the Cancer Gene Census), or to genes in the Surfaceome, Secretome or Immunome categories. Select known Oncogenes or Tumor suppressors from the Cancer gene status filter (B) and click the Go button (C).
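A volcano plot places each gene at its log2 fold change (x) and -log10 FDR (y) and colors it by the direction of change. The protocol does not state the cutoffs OASIS uses for red and blue points, so the fold-change and FDR thresholds below are hypothetical parameters, as are the gene names and values.

```python
import math


def volcano_points(results, lfc_cutoff=1.0, fdr_cutoff=0.05):
    """Map gene -> (log2FC, -log10 FDR, status) for plotting.

    results: dict gene -> (log2_fold_change, fdr). Cutoffs are hypothetical.
    """
    points = {}
    for gene, (lfc, fdr) in results.items():
        y = -math.log10(fdr) if fdr > 0 else math.inf
        if fdr < fdr_cutoff and lfc >= lfc_cutoff:
            status = "up"        # drawn in red
        elif fdr < fdr_cutoff and lfc <= -lfc_cutoff:
            status = "down"      # drawn in blue
        else:
            status = "not significant"
        points[gene] = (lfc, y, status)
    return points


toy = {"GENE_A": (2.0, 0.001), "GENE_B": (-1.5, 0.01),
       "GENE_C": (0.2, 1e-4), "GENE_D": (3.0, 0.2)}
for gene, (x, y, status) in volcano_points(toy).items():
    print(gene, x, round(y, 2), status)
```

GENE_C is highly significant but falls below the fold-change cutoff, and GENE_D has a large fold change but fails the FDR cutoff; neither is colored.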

In the Volcano plot, one can search for a gene of interest using the Gene Search function from the options menu at the top of the plot.

Click on Gene Search. In the pop-up search box, type CDK4 and click on Search. If the gene is in the selected gene set, it will be highlighted in the plot with a pop-up box showing the gene name, and the corresponding data point will change color.

The Volcano plot analysis indicates that CDK4 is differentially expressed in Lung Squamous Cell Carcinoma compared to adjacent normal tissues.

(6) OASIS-print
OASIS-print enables users to visualize multiple types of alterations affecting a list of genes across all samples in a single dataset. Here, OASIS-print is used to identify cell line models with a specific profile: CDK4 amplification and wild-type RB1. To access OASIS-print, click on Analysis in the navigation menu and select OASIS-print.
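The model-selection criterion in this step (CDK4 amplification with wild-type RB1) is a conjunction of per-gene conditions. The sketch below expresses it over a hypothetical per-cell-line record; treating "wild-type" as "no mutation call and no deletion" is an assumption. Because CCLE mutation calls cover only a targeted gene list (Supplementary Table 1), the absence of a call for a gene outside that list would not establish wild-type status, so the sketch also takes the set of genes that were actually assayed.

```python
from dataclasses import dataclass, field


@dataclass
class CellLineProfile:
    """Hypothetical per-cell-line record; not the OASIS data model."""
    name: str
    cn_status: dict                                  # gene -> copy number category
    mutations: dict = field(default_factory=dict)    # gene -> list of mutation calls


def matches_profile(cell_line, amplified, wild_type, assayed_genes):
    """True if all `amplified` genes are amplified and every `wild_type` gene
    was assayed and carries neither a mutation call nor a deletion."""
    if any(cell_line.cn_status.get(g) != "Amplification" for g in amplified):
        return False
    for g in wild_type:
        if g not in assayed_genes:
            return False          # no mutation data: cannot call wild-type
        if cell_line.mutations.get(g) or cell_line.cn_status.get(g) == "Deletion":
            return False
    return True


lines = [
    CellLineProfile("CELL_LINE_1", {"CDK4": "Amplification", "RB1": "Neutral"}),
    CellLineProfile("CELL_LINE_2", {"CDK4": "Amplification", "RB1": "Neutral"},
                    {"RB1": ["MUTATION_CALL_1"]}),
    CellLineProfile("CELL_LINE_3", {"CDK4": "Neutral", "RB1": "Neutral"}),
]
hits = [cl.name for cl in lines
        if matches_profile(cl, ["CDK4"], ["RB1"], assayed_genes={"CDK4", "RB1"})]
print(hits)  # ['CELL_LINE_1']
```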

In the Select Datasets section, click on Cell Lines (CCLE) (A). In the Entries with following IDs text box, type CDK4, CDK6, CCND1, RB1 (B) and click on the Go button (C). The OASIS-print results page is divided into two sections. The upper section provides a graphical view of the data, where each row represents a gene and each column of dots represents a single sample. Alterations are color-coded, and color gradients represent alteration values, with darker colors corresponding to higher values and lighter colors to lower values. The lower section presents the query results in a table where each row represents a sample and each column represents a gene and alteration type. Table cells are also color-coded to highlight altered samples. Both the graphical view and the table view are fully interactive.
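The profile used here (CDK4 amplified, RB1 wild-type) can be expressed as a simple per-sample predicate. The Python sketch below reuses the five copy number categories from the Copy Number Analysis section of these Supplementary Methods; the comparison symbols were lost in the extracted thresholds, so the boundary directions (>= vs >) are an assumption. It also assumes absolute gene-level CN values, and it treats "wild-type RB1" as "no reported RB1 mutation" (a stricter query could also exclude RB1 deletions). The class and function names are hypothetical.

```python
from dataclasses import dataclass, field


def cn_category(cn):
    """Map gene-level copy number to the five categories in the Methods.

    Boundary handling (>= vs >) is an assumption about the extracted text.
    """
    if cn >= 3.7:
        return "Amplification"
    if cn >= 2.5:
        return "Copy gain"
    if cn > 1.5:
        return "Neutral"
    if cn > 1.2:
        return "Copy loss"
    return "Deletion"


@dataclass
class SampleProfile:
    name: str
    copy_number: dict  # gene -> gene-level copy number (hypothetical layout)
    mutations: dict = field(default_factory=dict)  # gene -> list of protein changes


def cdk4_amp_rb1_wt(samples):
    """Return names of samples with CDK4 amplification and no RB1 mutation."""
    hits = []
    for s in samples:
        # Missing CN defaults to 2.0 (diploid), which is classified as Neutral
        amplified = cn_category(s.copy_number.get("CDK4", 2.0)) == "Amplification"
        rb1_wt = not s.mutations.get("RB1")
        if amplified and rb1_wt:
            hits.append(s.name)
    return hits
```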

Within the table view, samples can be filtered with the search box (A) to find a particular sample or to show only samples that harbor a particular mutation. Samples can also be sorted by clicking on the small arrow to the left of each column name; the sort order is also applied to the graphical view above. The sample names column (B) provides links to the Sample report, which contains more information about each sample. The same report is accessible from the graphical view by mousing over a sample and following the link in the pop-up box. Mousing over a sample name in the table also highlights that sample in the graphical view.

Using OASIS-print, we identified the lung adenocarcinoma cell line RERFLCAD2, which harbors CDK4 amplification, as a model for target validation and pharmacological studies. A distinctive feature of OASIS-print is that sorting the molecular data in the table view rearranges the graphical view. By default, OASIS-print sorts cell lines by the copy number value of the first gene in the gene list. To sort cell lines by CDK4 expression, click on the arrows at the right of the RNAseq expr header for CDK4 (A) in the table view. This identifies HCC827 as a lung adenocarcinoma cell line with high CDK4 expression.
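The coupling between table sorting and the graphical view can be sketched in Python as follows. The row layout (a dict keyed by "sample" and by (gene, column) tuples), the descending default and placing samples with missing values last are assumptions for illustration, not documented OASIS behavior; the helper names are hypothetical.

```python
def sort_rows(rows, gene, column, descending=True):
    """Sort table rows (one per sample) by the (gene, column) cell.

    Rows without a value for that cell are kept at the end in their
    original order, regardless of the sort direction.
    """
    present = [r for r in rows if r.get((gene, column)) is not None]
    missing = [r for r in rows if r.get((gene, column)) is None]
    present.sort(key=lambda r: r[(gene, column)], reverse=descending)
    return present + missing


def graphic_order(rows):
    """The left-to-right order of sample columns mirrors the table row order."""
    return [r["sample"] for r in rows]
```

The default view corresponds to `sort_rows(rows, first_gene, "CN")`; clicking the RNAseq expr header for CDK4 corresponds to `sort_rows(rows, "CDK4", "RNAseq expr")`, and the graphical view is then redrawn from `graphic_order` of the re-sorted rows.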


Use case 2: Antibody drug conjugate (ADC) target analysis
(1) Volcano plot
This step-by-step guide shows how to use the OASIS web portal to identify and evaluate ERBB2 as a target gene for the ADC modality. First, we use the Volcano plot to identify transmembrane genes that are significantly up-regulated in breast cancer tumors compared with normal tissues. Click on Plots and, in the drop-down menu, click on the Volcano plot (Differential Expression across genes in a project) option to open the Volcano plot query interface. Click on Breast Invasive Carcinoma (TCGA) (A). In the Restrict Search section, one can choose to show only genes that belong to a specific pathway or gene signature, known cancer genes (as defined by the Cancer Gene Census), or genes in the Surfaceome, Secretome or Immunome categories. To show only transmembrane genes, select YES from the Is Surfaceome filter (B) and click the Go button (C).

In the Volcano plot, one can search for a gene of interest using the Gene Search function in the options menu at the top of the plot. Click on Gene Search (A). In the pop-up search box, type ERBB2 (B) and click on Search (C). ERBB2 will be highlighted in the plot with a pop-up box showing the gene name and a change of color in the corresponding data point (D).

Mouse over the data point representing ERBB2 to display a pop-up box with more information. From the pop-up box, one can access the Gene report (A), a summary of alterations affecting the gene across all datasets. Links to the copy number Bar plot and to the expression Box plots and Bar plots are also available (B); these analyses provide information at the sample level. A link to the differential expression results in tabular format is also available (C).

(2) Gene expression Box plot
Follow the Expression (RNASeq) across samples (boxplot) link from the Volcano plot, or click on Plots and select Box plot (RNA-Seq Expression across cancer types) in the navigation menu. The Box plot visualizes gene-level expression from either microarray or RNA-Seq experiments for individual samples grouped by cancer type and status (tumor/normal). This tool allows us to compare ERBB2 expression between tumor and normal samples and across cancer types.

Select Breast Invasive Carcinoma (TCGA) from the Select Datasets section (A). Add more datasets by Ctrl+clicking on Gastric Adenocarcinoma (TCGA), Lung Adenocarcinoma (TCGA) and Lung Squamous Cell Carcinoma (TCGA). Type ERBB2 into the gene symbol text box (C) and click the Go button (D). A new Box plot will appear with the data for each selected dataset split into tumor (red) and normal (green) samples. Click on the Sort by Name (A) option in the menu at the top of the plot to sort the datasets in alphabetical order. Click on the Switch to Log(2) Scale (B) option in the same menu to change the Y-axis from linear to log2 scale.
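Each box in the plot summarizes one group of samples (one dataset and one tumor/normal status). A minimal Python sketch of that summary follows. The record layout is hypothetical, and the log2(TPM + 1) transform with a pseudocount of 1 is an assumption, since the supplement does not state how zero TPM values are handled on the log2 axis.

```python
import math
from statistics import quantiles


def box_stats(values, log2_scale=False):
    """Five-number summary (min, Q1, median, Q3, max) for one box.

    If log2_scale is True, values are transformed as log2(TPM + 1); the
    pseudocount of 1 is an assumption that keeps zero TPM finite.
    Older Python versions require at least two values per group.
    """
    if log2_scale:
        values = [math.log2(v + 1) for v in values]
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)


def boxes_by_group(records, gene, log2_scale=False):
    """Build one box per (dataset, status) from (dataset, status, gene, tpm) records."""
    groups = {}
    for dataset, status, g, tpm in records:
        if g == gene:
            groups.setdefault((dataset, status), []).append(tpm)
    # Sorting the keys mirrors the plot's "Sort by Name" option
    return {key: box_stats(vals, log2_scale) for key, vals in sorted(groups.items())}
```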

The Box plot analysis suggests that ERBB2 is significantly up-regulated in tumor vs. normal samples in Breast Invasive Carcinoma, Gastric Adenocarcinoma and Lung Adenocarcinoma. Click the blue Back button (A) at the top right of the plot to return to the Box plot menu and add new datasets.

Ctrl+click Ovarian Serous Cystadenocarcinoma (TCGA) (A). Because this dataset has no normal samples, you can integrate the tumor data with normal reference data from GTEx: Ctrl+click Normal Tissue expression (GTEx) (A). Type ERBB2 into the gene symbol text box (B) and click the Go button (C).
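Pairing TCGA tumor samples with GTEx normal tissue requires a mapping from cancer type to tissue. The supplement does not describe how OASIS stores that mapping, so the dictionary below is a hypothetical example, and the log2 ratio of medians (with a pseudocount of 1) is an illustrative summary rather than the portal's statistic. The comparison also assumes that the TCGA and GTEx values are on a comparable TPM scale.

```python
import math
from statistics import median

# Hypothetical mapping from a TCGA cancer type to its GTEx normal tissue;
# the actual OASIS mapping is not described in this supplement.
TISSUE_FOR_CANCER = {
    "Ovarian Serous Cystadenocarcinoma": "Ovary",
    "Glioblastoma": "Brain",
}


def tumor_vs_gtex(tcga_tumor, gtex_normal, cancer_type):
    """Pair TCGA tumor TPMs with GTEx normal TPMs for the matching tissue.

    tcga_tumor: dict of cancer type -> list of tumor TPM values
    gtex_normal: dict of tissue -> list of normal TPM values
    Returns (tumor values, normal values, log2 ratio of medians).
    """
    tissue = TISSUE_FOR_CANCER[cancer_type]
    tumor = tcga_tumor[cancer_type]
    normal = gtex_normal[tissue]
    # Pseudocount of 1 keeps the ratio finite when a median TPM is zero
    ratio = math.log2((median(tumor) + 1) / (median(normal) + 1))
    return tumor, normal, ratio
```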

Click on the Sort by Name option in the menu at the top of the plot to sort the datasets in alphabetical order. The ovarian tumor samples will be displayed side by side with the Ovary tissue samples from GTEx. With the Mode: Zoom option selected (A), draw a rectangle around the ovary samples to zoom into this part of the plot.

Click on the Switch to Log(2) Scale (A) option in the menu at the top of the plot to change the Y-axis from linear to log2 scale. The integrative analysis of Ovarian Serous Cystadenocarcinoma (TCGA) and normal Ovary (GTEx) data shows that ERBB2 is up-regulated in ovarian tumors relative to normal ovary tissue.
(3) Scatter plot

The Scatter plot visualizes the correlation between ERBB2 copy number and expression at the sample level, and can be used to identify a subset of cases in which ERBB2 over-expression is driven by amplification. To open the Scatter plot, click on Plots in the navigation menu and then click on Scatter plot (Copy number vs. gene expression). From the Select Datasets section, click on Breast Invasive Carcinoma (TCGA) (A). Add more datasets by Ctrl+clicking on Gastric Adenocarcinoma (TCGA), Lung Adenocarcinoma (TCGA), Lung Squamous Cell Carcinoma (TCGA) and Ovarian Serous Cystadenocarcinoma (TCGA). Once the datasets have been selected, type ERBB2 into the gene symbol text box (B) and select RNASeq (RSEM) from the Expression data type drop-down menu (C). Click on Go (D) to produce the plot.

The Scatter plot shows copy number values (log2 ratio) on the X-axis and expression values on the Y-axis. For RNA-Seq (RSEM) data, gene expression is in TPM (transcripts per million). Each data point represents a single sample, and the straight line represents the linear fit. As shown in the legend, the Pearson correlation coefficient between ERBB2 copy number and expression is 0.86 in the Breast Invasive Carcinoma (TCGA) dataset, suggesting that amplification drives ERBB2 over-expression.
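The legend statistic is a Pearson correlation coefficient, and the straight line is a linear fit. The Python sketch below computes both from paired per-sample values (copy number log2 ratio, expression). Whether OASIS fits expression on a linear or log scale is not stated here, so the function simply uses whatever values are passed in; the function name is hypothetical.

```python
import math


def pearson_and_fit(cn_log2, expr):
    """Return (r, slope, intercept) for paired copy number and expression values.

    r is the Pearson correlation coefficient; slope and intercept define the
    least-squares line expr = slope * cn_log2 + intercept.
    """
    n = len(cn_log2)
    if n != len(expr) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 samples")
    mx = sum(cn_log2) / n
    my = sum(expr) / n
    # Centered cross-product and sums of squares
    sxy = sum((x - mx) * (y - my) for x, y in zip(cn_log2, expr))
    sxx = sum((x - mx) ** 2 for x in cn_log2)
    syy = sum((y - my) ** 2 for y in expr)
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = my - slope * mx
    return r, slope, intercept
```

A coefficient close to 1, such as the 0.86 reported for ERBB2 in breast cancer, means expression rises nearly linearly with copy number across samples; correlation alone does not establish causation.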

(4) OASIS-print
OASIS-print enables users to visualize multiple types of alterations affecting a list of genes across all samples in a single dataset. Here, OASIS-print is used to identify breast cancer cell lines that are HER2-positive and ER-negative with wild-type PIK3CA. To access OASIS-print, click on Analysis in the navigation menu and select OASIS-print. In the Select Datasets section, click on Cell Lines (CCLE) (A). In the Entries with following IDs text box, type ERBB2, ESR1, PIK3CA (B). Restrict the cell lines to breast cancer by selecting breast in the Cancer type filter (C), and click on the Go button (D).

The OASIS-print results page is divided into two sections. The upper section provides a graphical summary of the data, where each column of dots represents a single sample and each row represents a gene. Alterations are color-coded, and color gradients represent alteration values, with darker colors corresponding to higher values and lighter colors to lower values. The lower section presents the query results in a table where each row represents a sample and each column represents a gene and alteration type. Table cells are also color-coded to highlight altered samples. Both the graphical display and the results table are fully interactive.

From this analysis, we can identify breast cancer cell lines that are HER2-positive, ER-negative and PIK3CA wild-type, such as AU565, HCC1569 and SKBR3.

Mouse over a single sample to display a pop-up box with the cell line name and its copy number and expression values. Click on the cell line name to open a Sample report with more information on the cell line of interest.

Click on the Mut. column for PIK3CA to sort cell lines in the table and graphical views by amino acid change. Two cell lines (EFM192A and JIMT1) harbor the PIK3CA C420R mutation together with ERBB2 amplification and over-expression.


Use case 3: Selecting cell lines based on multi-gene omics data
This analysis vignette illustrates how to use the OASIS Treemap feature to access annotated mutations in lung adenocarcinoma cell lines from CCLE. The Data summary consists of two treemaps: the upper one shows primary tumor data classified by cancer type, and the lower one shows cell line data classified by project at the first level of the hierarchy. First, click on the Cancer Cell Lines (CCLE) dataset in the lower Treemap (A) to enter the next level of the hierarchy, cancer type. Here, CCLE cell line data are classified and color-coded by cancer type. Click on Lung LUAD (A) to see which data types are available for lung adenocarcinoma cell lines.

There are 65 unique lung adenocarcinoma cell lines available in CCLE, with mutation data reported for 55 cell lines, expression (microarray) data for 62, CNV data for 58 and RNA-Seq expression data for 53. To open the mutation alteration report, click on the box labeled Mutation: 55 (A). The mutation alteration report is a table in which each row represents a single mutation event, with the sample name, gene name, amino acid change, cancer gene status (from the Cancer Gene Census), mutation type and mutation consequence. Click on the Cancer gene status field (A) to re-order the rows.
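The counts at a treemap node come from grouping sample annotations by dataset and cancer type, then counting unique samples per data type. The Python sketch below assumes a hypothetical flattened layout with one (dataset, cancer type, sample, data type) tuple per available data type, and assumes the node total is the number of samples with any data type. Under that assumption the total (65 here) can exceed every per-type count (55, 62, 58, 53), because a cell line is counted once no matter how many data types it has.

```python
from collections import defaultdict


def treemap_level(samples, dataset, cancer_type):
    """Count unique samples and per-data-type availability for one treemap node.

    samples: iterable of (dataset, cancer_type, sample_id, data_type) tuples,
    one tuple per available data type (hypothetical flattened layout).
    Returns (total unique samples, {data_type: unique sample count}).
    """
    ids = set()
    per_type = defaultdict(set)
    for ds, ct, sid, dtype in samples:
        if ds == dataset and ct == cancer_type:
            ids.add(sid)
            per_type[dtype].add(sid)
    return len(ids), {t: len(s) for t, s in sorted(per_type.items())}
```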

Mutations affecting oncogenes are now shown at the top of the list. For performance reasons, OASIS caps the Alteration report table at 1,000 rows. To obtain the full list of mutations, click on the Download data button (A).
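The sort-then-cap behavior of the Alteration report can be sketched in Python as follows. The 1,000-row cap comes from the text; the status ranking (oncogenes first, then tumor suppressors, then everything else), the row layout and the function name are hypothetical.

```python
MAX_ROWS = 1000  # display cap stated in the text

# Hypothetical ranking; the text only says oncogenes move to the top.
STATUS_RANK = {"Oncogene": 0, "Tumor suppressor": 1}


def alteration_report(mutations, max_rows=MAX_ROWS):
    """Sort mutation rows by cancer gene status and apply the display cap.

    Returns (rows shown in the table, full sorted list for Download data).
    Python's sort is stable, so rows with equal status keep their order.
    """
    ordered = sorted(
        mutations,
        key=lambda m: STATUS_RANK.get(m.get("cancer_gene_status"), len(STATUS_RANK)),
    )
    return ordered[:max_rows], ordered
```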

As illustrated by the analysis vignettes, the following analyses and functionalities are only available in OASIS:
- Use case 1, Step (1) Pan-cancer report: In a single interface, obtain a panoramic view of multiple genetic alterations affecting multiple genes, then drill into detailed alteration data for one of the genes and learn about important gene characteristics such as small-molecule druggability and oncogene status.
- Use case 1, Step (2) Copy number Bar plot: After the plot is created, users can flexibly select a subset of data points, such as samples or genes, to obtain detailed data and summary statistics. With another click, users can launch additional analyses on any of the selected data points.
- Use case 1, Step (5) Volcano plot: Identify CDK4 as one of the differentially expressed cancer genes in tumor vs. normal tissues in Lung Squamous Cell Carcinoma (TCGA).
- Use case 1, Step (6) OASIS-print: Identify cell lines with the highest levels of CDK4 amplification by sorting cell lines by copy number while reviewing the alteration status of other genes such as RB1. The graphical view is rearranged according to the sorting of molecular data, such as mutation status or expression value, in the table view.
- Use case 2, Step (2) Gene expression Box plot: Compare gene expression (e.g. CDK4, ERBB2) derived from RNA-Seq data in tumors vs. adjacent normal samples across multiple cancer types using TCGA data. Integrate expression data from TCGA tumors and GTEx normal tissues when TCGA has no normal samples for a cancer type, such as Glioblastoma or Ovarian cancer.
- Use case 2, Step (3) Scatter plot: In the same plot, users can select one or more cancer types and visualize correlation patterns by clicking on the corresponding categories in the legend.
- Use case 3, Treemap: Access and browse any omics data or sample annotation in OASIS in three or fewer clicks from the Home page.

Use case 4: Downloading OASIS-Genomics source code
The source code for the OASIS-Genomics web application is available as a git repository on SourceForge.net. The code is organized into three folders:
- BioMart: Code for the web server and the main components of the front end. This is customized BioMart code; instructions on configuring and running the server can be found on the BioMart website.
- Database: SQL code and schema design needed to implement the OASIS-Genomics database/warehouse back end. The current version is designed to run on Oracle 11.
- Oasiswidgets: Code for some of the custom-built functionalities in the OASIS-Genomics web portal, including OASIS-print, Pan-cancer report and mutation summary.
To access the code repository, click on the following link or copy and paste it into your web browser:
From the main page, click on the Browse Code button:

On the Git repository page, select the download type/protocol you want to use from the options at the top left (e.g. HTTP). Once the protocol has been selected, copy the text in the access text box at the top right. On Windows, you can paste the copied text into your preferred Git client; the example shows TortoiseGit.

On Linux/Unix, paste the text into a terminal to download the code into the selected folder:

Use case 5: Programmatic Access API
OASIS provides direct programmatic access to all data stored in the database through the built-in BioMart API. The BioMart API consists of four parts: REST API, SOAP API, SPARQL API and Java API. All four expose the same methods, so you can choose the one you are most comfortable with. More information on the BioMart API and how to use it can be found in the BioMart documentation.
(1) Using the web service to retrieve the data used to build the following visualization:
ene_ensembl_oasis18brcatcga%2chsapiens_gene_ensembl_oasis20coadtcga%2chsapiens_gene_ensembl_oasis40stadtcga%2chsapiens_gene_ensembl_oasis30lihctcga%2chsapiens_gene_ensembl_oasis31luadtcga%2chsapiens_gene_ensembl_oasis32lusctcga%2chsapiens_gene_ensembl_oasis33ovtcga&hsapiens_gene_ensembl int_cnv_exp dm gene_symbol=met&hsapiens_gene_ensembl int_cnv_exp dm expression_type=rsem&preview=true

(2) REST access example: copy and paste the following code into a text file:
<!DOCTYPE Query>
<Query client="webbrowser" processor="tsv" limit="-1" header="1">
  <dataset name="hsapiens_gene_ensembl_oasis18brcatcga,hsapiens_gene_ensembl_oasis20coadtcga,hsapiens_gene_ensembl_oasis40stadtcga,hsapiens_gene_ensembl_oasis30lihctcga,hsapiens_gene_ensembl_oasis31luadtcga,hsapiens_gene_ensembl_oasis32lusctcga,hsapiens_gene_ensembl_oasis33ovtcga" config="gene_ensembl_config_3_1_1">
    <filter name="hsapiens_gene_ensembl int_cnv_exp dm gene_symbol" value="met"/>
    <filter name="hsapiens_gene_ensembl int_cnv_exp dm expression_type" value="rsem"/>
    <attribute name="hsapiens_gene_ensembl int_cnv_exp dm gene_symbol"/>
    <attribute name="hsapiens_gene_ensembl int_cnv_exp dm log_ratio"/>
    <attribute name="hsapiens_gene_ensembl int_cnv_exp dm normalized_expression_level"/>
    <attribute name="hsapiens_gene_ensembl int_cnv_exp dm specimen_id"/>
    <attribute name="hsapiens_gene_ensembl int_cnv_exp dm cancer_type"/>
    <attribute name="hsapiens_gene_ensembl int_cnv_exp dm specimen_origin"/>
    <attribute name="hsapiens_gene_ensembl int_cnv_exp dm exp_outlier_status"/>
    <attribute name="cancertype"/>
    <attribute name="hsapiens_gene_ensembl int_cnv_exp dm expression_type"/>
    <attribute name="hsapiens_gene_ensembl int_cnv_exp dm copy_number"/>
    <attribute name="sampledataset"/>
  </dataset>
</Query>
Save the file as example_file.xml and run the following command in a terminal:
% curl --data-urlencode query@example_file.xml

Running the command prints the results to the terminal. To save the output to a file, run the command below:
% curl --data-urlencode query@example_file.xml > output_file.txt
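The curl call above can also be reproduced with Python's standard library, which is convenient when the results feed into further analysis. This is a sketch, not part of the OASIS distribution: the helper names are hypothetical, and because the OASIS BioMart endpoint URL is not reproduced in this supplement, the caller must supply it (the address used in the test is a placeholder). Like `curl --data-urlencode query@file`, the request sends the file contents URL-encoded as a form field named `query` in a POST body.

```python
import urllib.parse
import urllib.request


def build_biomart_request(query_xml, endpoint):
    """Build a POST request equivalent to `curl --data-urlencode query@file endpoint`.

    endpoint: the BioMart REST URL of the OASIS installation (not given here).
    """
    body = urllib.parse.urlencode({"query": query_xml}).encode("utf-8")
    return urllib.request.Request(endpoint, data=body, method="POST")


def parse_tsv(text):
    """Split TSV output (the query sets header="1") into header and rows."""
    lines = [ln for ln in text.splitlines() if ln]
    header = lines[0].split("\t")
    rows = [ln.split("\t") for ln in lines[1:]]
    return header, rows
```

To run the query, read the XML with `open("example_file.xml").read()`, pass the built request to `urllib.request.urlopen`, and decode the response body before calling `parse_tsv`.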


More information

MODULE 4: SPLICING. Removal of introns from messenger RNA by splicing

MODULE 4: SPLICING. Removal of introns from messenger RNA by splicing Last update: 05/10/2017 MODULE 4: SPLICING Lesson Plan: Title MEG LAAKSO Removal of introns from messenger RNA by splicing Objectives Identify splice donor and acceptor sites that are best supported by

More information

Data Management, Data Management PLUS User Guide

Data Management, Data Management PLUS User Guide Data Management, Data Management PLUS User Guide Table of Contents Introduction 3 SHOEBOX Data Management and Data Management PLUS (DM+) for Individual Users 4 Portal Login 4 Working With Your Data 5 Manually

More information

Lionbridge Connector for Hybris. User Guide

Lionbridge Connector for Hybris. User Guide Lionbridge Connector for Hybris User Guide Version 2.1.0 November 24, 2017 Copyright Copyright 2017 Lionbridge Technologies, Inc. All rights reserved. Published in the USA. March, 2016. Lionbridge and

More information

NGS in Cancer Pathology After the Microscope: From Nucleic Acid to Interpretation

NGS in Cancer Pathology After the Microscope: From Nucleic Acid to Interpretation NGS in Cancer Pathology After the Microscope: From Nucleic Acid to Interpretation Michael R. Rossi, PhD, FACMG Assistant Professor Division of Cancer Biology, Department of Radiation Oncology Department

More information

Patnaik SK, et al. MicroRNAs to accurately histotype NSCLC biopsies

Patnaik SK, et al. MicroRNAs to accurately histotype NSCLC biopsies Patnaik SK, et al. MicroRNAs to accurately histotype NSCLC biopsies. 2014. Supplemental Digital Content 1. Appendix 1. External data-sets used for associating microrna expression with lung squamous cell

More information

Broad GDAC. Lung Adenocarcinoma AWG Run 2013_02_07. Dan DiCara Hailei Zhang Michael Noble

Broad GDAC. Lung Adenocarcinoma AWG Run 2013_02_07. Dan DiCara Hailei Zhang Michael Noble Broad GDAC Lung Adenocarcinoma AWG Run 2013_02_07 Dan DiCara Hailei Zhang Michael Noble Copyright 2013 Broad Institute. All rights reserved. gdac@broadinstitute.org http://gdac.broadinstitute.org GDAC

More information

Digitizing the Proteomes From Big Tissue Biobanks

Digitizing the Proteomes From Big Tissue Biobanks Digitizing the Proteomes From Big Tissue Biobanks Analyzing 24 Proteomes Per Day by Microflow SWATH Acquisition and Spectronaut Pulsar Analysis Jan Muntel 1, Nick Morrice 2, Roland M. Bruderer 1, Lukas

More information

of TERT, MLL4, CCNE1, SENP5, and ROCK1 on tumor development were discussed.

of TERT, MLL4, CCNE1, SENP5, and ROCK1 on tumor development were discussed. Supplementary Note The potential association and implications of HBV integration at known and putative cancer genes of TERT, MLL4, CCNE1, SENP5, and ROCK1 on tumor development were discussed. Human telomerase

More information

Supplementary Figure 1. Metabolic landscape of cancer discovery pipeline. RNAseq raw counts data of cancer and healthy tissue samples were downloaded

Supplementary Figure 1. Metabolic landscape of cancer discovery pipeline. RNAseq raw counts data of cancer and healthy tissue samples were downloaded Supplementary Figure 1. Metabolic landscape of cancer discovery pipeline. RNAseq raw counts data of cancer and healthy tissue samples were downloaded from TCGA and differentially expressed metabolic genes

More information

Precision Medicine Knowledgebase (PMKB)

Precision Medicine Knowledgebase (PMKB) Precision Medicine Knowledgebase (PMKB) https://pmkb.weill.cornell.edu/ https://academic.oup.com/jamia/article/24/ 3/513/2418181/The-cancer-precisionmedicine-knowledge-base-for Click on this to expand/collapse

More information

SubLasso:a feature selection and classification R package with a. fixed feature subset

SubLasso:a feature selection and classification R package with a. fixed feature subset SubLasso:a feature selection and classification R package with a fixed feature subset Youxi Luo,3,*, Qinghan Meng,2,*, Ruiquan Ge,2, Guoqin Mai, Jikui Liu, Fengfeng Zhou,#. Shenzhen Institutes of Advanced

More information

SUPPLEMENTARY APPENDIX

SUPPLEMENTARY APPENDIX SUPPLEMENTARY APPENDIX 1) Supplemental Figure 1. Histopathologic Characteristics of the Tumors in the Discovery Cohort 2) Supplemental Figure 2. Incorporation of Normal Epidermal Melanocytic Signature

More information

Contents. 1.5 GOPredict is robust to changes in study sets... 5

Contents. 1.5 GOPredict is robust to changes in study sets... 5 Supplementary documentation for Data integration to prioritize drugs using genomics and curated data Riku Louhimo, Marko Laakso, Denis Belitskin, Juha Klefström, Rainer Lehtonen and Sampsa Hautaniemi Faculty

More information

A Versatile Algorithm for Finding Patterns in Large Cancer Cell Line Data Sets

A Versatile Algorithm for Finding Patterns in Large Cancer Cell Line Data Sets A Versatile Algorithm for Finding Patterns in Large Cancer Cell Line Data Sets James Jusuf, Phillips Academy Andover May 21, 2017 MIT PRIMES The Broad Institute of MIT and Harvard Introduction A quest

More information

RASA: Robust Alternative Splicing Analysis for Human Transcriptome Arrays

RASA: Robust Alternative Splicing Analysis for Human Transcriptome Arrays Supplementary Materials RASA: Robust Alternative Splicing Analysis for Human Transcriptome Arrays Junhee Seok 1*, Weihong Xu 2, Ronald W. Davis 2, Wenzhong Xiao 2,3* 1 School of Electrical Engineering,

More information

Introduction to LOH and Allele Specific Copy Number User Forum

Introduction to LOH and Allele Specific Copy Number User Forum Introduction to LOH and Allele Specific Copy Number User Forum Jonathan Gerstenhaber Introduction to LOH and ASCN User Forum Contents 1. Loss of heterozygosity Analysis procedure Types of baselines 2.

More information

Supplementary Figure 1: Classification scheme for non-synonymous and nonsense germline MC1R variants. The common variants with previously established

Supplementary Figure 1: Classification scheme for non-synonymous and nonsense germline MC1R variants. The common variants with previously established Supplementary Figure 1: Classification scheme for nonsynonymous and nonsense germline MC1R variants. The common variants with previously established classifications 1 3 are shown. The effect of novel missense

More information

Nature Genetics: doi: /ng Supplementary Figure 1. Mutational signatures in BCC compared to melanoma.

Nature Genetics: doi: /ng Supplementary Figure 1. Mutational signatures in BCC compared to melanoma. Supplementary Figure 1 Mutational signatures in BCC compared to melanoma. (a) The effect of transcription-coupled repair as a function of gene expression in BCC. Tumor type specific gene expression levels

More information

AVENIO family of NGS oncology assays ctdna and Tumor Tissue Analysis Kits

AVENIO family of NGS oncology assays ctdna and Tumor Tissue Analysis Kits AVENIO family of NGS oncology assays ctdna and Tumor Tissue Analysis Kits Accelerating clinical research Next-generation sequencing (NGS) has the ability to interrogate many different genes and detect

More information

On the Reproducibility of TCGA Ovarian Cancer MicroRNA Profiles

On the Reproducibility of TCGA Ovarian Cancer MicroRNA Profiles On the Reproducibility of TCGA Ovarian Cancer MicroRNA Profiles Ying-Wooi Wan 1,2,4, Claire M. Mach 2,3, Genevera I. Allen 1,7,8, Matthew L. Anderson 2,4,5 *, Zhandong Liu 1,5,6,7 * 1 Departments of Pediatrics

More information

BlueBayCT - Warfarin User Guide

BlueBayCT - Warfarin User Guide BlueBayCT - Warfarin User Guide December 2012 Help Desk 0845 5211241 Contents Getting Started... 1 Before you start... 1 About this guide... 1 Conventions... 1 Notes... 1 Warfarin Management... 2 New INR/Warfarin

More information

Section B. Comparative Genomics Analysis of Influenza H5N2 Viruses. Objective

Section B. Comparative Genomics Analysis of Influenza H5N2 Viruses. Objective Section B. Comparative Genomics Analysis of Influenza H5N2 Viruses Objective Upon completion of this exercise, you will be able to use the Influenza Research Database (IRD; http://www.fludb.org/) to: Search

More information

Supplementary Information

Supplementary Information Supplementary Information Supplementary Figure 1. Western blotting with ERβ antibodies Full blots corresponding to Fig. 2, along with replicated experiments at different time points, different batches,

More information

Identifying Novel Targets for Non-Small Cell Lung Cancer Just How Novel Are They?

Identifying Novel Targets for Non-Small Cell Lung Cancer Just How Novel Are They? Identifying Novel Targets for Non-Small Cell Lung Cancer Just How Novel Are They? Dubovenko Alexey Discovery Product Manager Sonia Novikova Solution Scientist September 2018 2 Non-Small Cell Lung Cancer

More information

Clay Tablet Connector for hybris. User Guide. Version 1.5.0

Clay Tablet Connector for hybris. User Guide. Version 1.5.0 Clay Tablet Connector for hybris User Guide Version 1.5.0 August 4, 2016 Copyright Copyright 2005-2016 Clay Tablet Technologies Inc. All rights reserved. All rights reserved. This document and its content

More information

Supplementary Figure 1: Comparison of acgh-based and expression-based CNA analysis of tumors from breast cancer GEMMs.

Supplementary Figure 1: Comparison of acgh-based and expression-based CNA analysis of tumors from breast cancer GEMMs. Supplementary Figure 1: Comparison of acgh-based and expression-based CNA analysis of tumors from breast cancer GEMMs. (a) CNA analysis of expression microarray data obtained from 15 tumors in the SV40Tag

More information

Supplementary Figure 1. Screenshot of the MAGI home page and query interface.

Supplementary Figure 1. Screenshot of the MAGI home page and query interface. Supplementary Figure 1 Screenshot of the MAGI home page and query interface. (i) Users select a combination of (public or private) datasets to query. (ii) Users can enter up to 25 genes to query at once.

More information

Multi-omics data integration colon cancer using proteogenomics approach

Multi-omics data integration colon cancer using proteogenomics approach Dept. of Medical Oncology Multi-omics data integration colon cancer using proteogenomics approach DTL Focus meeting, 29 August 2016 Thang Pham OncoProteomics Laboratory, Dept. of Medical Oncology VU University

More information

Mutation Detection and CNV Analysis for Illumina Sequencing data from HaloPlex Target Enrichment Panels using NextGENe Software for Clinical Research

Mutation Detection and CNV Analysis for Illumina Sequencing data from HaloPlex Target Enrichment Panels using NextGENe Software for Clinical Research Mutation Detection and CNV Analysis for Illumina Sequencing data from HaloPlex Target Enrichment Panels using NextGENe Software for Clinical Research Application Note Authors John McGuigan, Megan Manion,

More information

Nicholas Borcherding, Nicholas L. Bormann, Andrew Voigt, Weizhou Zhang 1-4

Nicholas Borcherding, Nicholas L. Bormann, Andrew Voigt, Weizhou Zhang 1-4 SOFTWARE TOOL ARTICLE TRGAted: A web tool for survival analysis using protein data in the Cancer Genome Atlas. [version 1; referees: 1 approved] Nicholas Borcherding, Nicholas L. Bormann, Andrew Voigt,

More information

To begin using the Nutrients feature, visibility of the Modules must be turned on by a MICROS Account Manager.

To begin using the Nutrients feature, visibility of the Modules must be turned on by a MICROS Account Manager. Nutrients A feature has been introduced that will manage Nutrient information for Items and Recipes in myinventory. This feature will benefit Organizations that are required to disclose Nutritional information

More information

RNA SEQUENCING AND DATA ANALYSIS

RNA SEQUENCING AND DATA ANALYSIS RNA SEQUENCING AND DATA ANALYSIS Download slides and package http://odin.mdacc.tmc.edu/~rverhaak/package.zip http://odin.mdacc.tmc.edu/~rverhaak/rna-seqlecture.zip Overview Introduction into the topic

More information

CS 6824: Tissue-Based Map of the Human Proteome

CS 6824: Tissue-Based Map of the Human Proteome CS 6824: Tissue-Based Map of the Human Proteome T. M. Murali November 17, 2016 Human Protein Atlas Measure protein and gene expression using tissue microarrays and deep sequencing, respectively. Alternative

More information

The North Carolina Health Data Explorer

The North Carolina Health Data Explorer The North Carolina Health Data Explorer The Health Data Explorer provides access to health data for North Carolina counties in an interactive, user-friendly atlas of maps, tables, and charts. It allows

More information

WFS. User Guide. thinkwhere Glendevon House Castle Business Park Stirling FK9 4TZ Tel +44 (0) Fax +44 (0)

WFS. User Guide. thinkwhere Glendevon House Castle Business Park Stirling FK9 4TZ   Tel +44 (0) Fax +44 (0) WFS User Guide thinkwhere Glendevon House Castle Business Park Stirling FK9 4TZ www.thinkwhere.com Tel +44 (0)1786 476060 Fax +44 (0)1786 47609 Table of Contents WHAT IS A WEB FEATURE SERVICE?... 3 Key

More information

Comparison of open chromatin regions between dentate granule cells and other tissues and neural cell types.

Comparison of open chromatin regions between dentate granule cells and other tissues and neural cell types. Supplementary Figure 1 Comparison of open chromatin regions between dentate granule cells and other tissues and neural cell types. (a) Pearson correlation heatmap among open chromatin profiles of different

More information

PedCath IMPACT User s Guide

PedCath IMPACT User s Guide PedCath IMPACT User s Guide Contents Overview... 3 IMPACT Overview... 3 PedCath IMPACT Registry Module... 3 More on Work Flow... 4 Case Complete Checkoff... 4 PedCath Cath Report/IMPACT Shared Data...

More information

Package xseq. R topics documented: September 11, 2015

Package xseq. R topics documented: September 11, 2015 Package xseq September 11, 2015 Title Assessing Functional Impact on Gene Expression of Mutations in Cancer Version 0.2.1 Date 2015-08-25 Author Jiarui Ding, Sohrab Shah Maintainer Jiarui Ding

More information

LAB ASSIGNMENT 4 INFERENCES FOR NUMERICAL DATA. Comparison of Cancer Survival*

LAB ASSIGNMENT 4 INFERENCES FOR NUMERICAL DATA. Comparison of Cancer Survival* LAB ASSIGNMENT 4 1 INFERENCES FOR NUMERICAL DATA In this lab assignment, you will analyze the data from a study to compare survival times of patients of both genders with different primary cancers. First,

More information

Enterprise Interest Thermo Fisher Scientific / Employee

Enterprise Interest Thermo Fisher Scientific / Employee Enterprise Interest Thermo Fisher Scientific / Employee A next-generation sequencing assay to estimate tumor mutation load from FFPE research samples Fiona Hyland. Director of R&D, Bioinformatics Clinical

More information

Cancer troublemakers: a tale of usual suspects and novel villains

Cancer troublemakers: a tale of usual suspects and novel villains Cancer troublemakers: a tale of usual suspects and novel villains Abel González-Pérez and Núria López-Bigas Biomedical Genomics Group Lab web: http://bg.upf.edu Driver genes/mutations: the troublemakers

More information

Frequency(%) KRAS G12 KRAS G13 KRAS A146 KRAS Q61 KRAS K117N PIK3CA H1047 PIK3CA E545 PIK3CA E542K PIK3CA Q546. EGFR exon19 NFS-indel EGFR L858R

Frequency(%) KRAS G12 KRAS G13 KRAS A146 KRAS Q61 KRAS K117N PIK3CA H1047 PIK3CA E545 PIK3CA E542K PIK3CA Q546. EGFR exon19 NFS-indel EGFR L858R Frequency(%) 1 a b ALK FS-indel ALK R1Q HRAS Q61R HRAS G13R IDH R17K IDH R14Q MET exon14 SS-indel KIT D8Y KIT L76P KIT exon11 NFS-indel SMAD4 R361 IDH1 R13 CTNNB1 S37 CTNNB1 S4 AKT1 E17K ERBB D769H ERBB

More information

CIViC Clinical Interpretations of Variants in Cancer. rnal/v49/n2/full/ng.3774.html

CIViC Clinical Interpretations of Variants in Cancer.     rnal/v49/n2/full/ng.3774.html CIViC Clinical Interpretations of Variants in Cancer www.civicdb.org http://www.nature.com/ng/jou rnal/v49/n2/full/ng.3774.html CIViC Homepage Aim Statement Curation Stats Recent Activity Links at Bottom

More information

Supplementary Information

Supplementary Information Supplementary Information Guided Visual Exploration of Genomic Stratifications in Cancer Marc Streit 1,6, Alexander Lex 2,6, Samuel Gratzl¹, Christian Partl³, Dieter Schmalstieg³, Hanspeter Pfister², Peter

More information

User Guide. Protein Clpper. Statistical scoring of protease cleavage sites. 1. Introduction Protein Clpper Analysis Procedure...

User Guide. Protein Clpper. Statistical scoring of protease cleavage sites. 1. Introduction Protein Clpper Analysis Procedure... User Guide Protein Clpper Statistical scoring of protease cleavage sites Content 1. Introduction... 2 2. Protein Clpper Analysis Procedure... 3 3. Input and Output Files... 9 4. Contact Information...

More information

University of Pittsburgh Cancer Institute UPMC CancerCenter. Uma Chandran, MSIS, PhD /21/13

University of Pittsburgh Cancer Institute UPMC CancerCenter. Uma Chandran, MSIS, PhD /21/13 University of Pittsburgh Cancer Institute UPMC CancerCenter Uma Chandran, MSIS, PhD chandran@pitt.edu 412-648-9326 2/21/13 University of Pittsburgh Cancer Institute Founded in 1985 Director Nancy Davidson,

More information

The 16th KJC Bioinformatics Symposium Integrative analysis identifies potential DNA methylation biomarkers for pan-cancer diagnosis and prognosis

The 16th KJC Bioinformatics Symposium Integrative analysis identifies potential DNA methylation biomarkers for pan-cancer diagnosis and prognosis The 16th KJC Bioinformatics Symposium Integrative analysis identifies potential DNA methylation biomarkers for pan-cancer diagnosis and prognosis Tieliu Shi tlshi@bio.ecnu.edu.cn The Center for bioinformatics

More information

GLOOKO REPORT REFERENCE GUIDE

GLOOKO REPORT REFERENCE GUIDE GLOOKO REPORT REFERENCE GUIDE November 2018 Version IFU-0010 02 Contents Intended Use... 2 Warnings... 2 Introduction... 3 Reports... 4 Report Criteria...4 Date Range... 4 Glucose Data Source... 4 Exercise

More information

APPLICATION NOTE. Highly reproducible and Comprehensive Proteome Profiling of Formalin-Fixed Paraffin-Embedded (FFPE) Tissues Slices

APPLICATION NOTE. Highly reproducible and Comprehensive Proteome Profiling of Formalin-Fixed Paraffin-Embedded (FFPE) Tissues Slices APPLICATION NOTE Highly reproducible and Comprehensive Proteome Profiling of Formalin-Fixed Paraffin-Embedded (FFPE) Tissues Slices INTRODUCTION Preservation of tissue biopsies is a critical step to This

More information

A Quick-Start Guide for rseqdiff

A Quick-Start Guide for rseqdiff A Quick-Start Guide for rseqdiff Yang Shi (email: shyboy@umich.edu) and Hui Jiang (email: jianghui@umich.edu) 09/05/2013 Introduction rseqdiff is an R package that can detect differential gene and isoform

More information

Development of a NGS Cancer Research Database CancerBase

Development of a NGS Cancer Research Database CancerBase Development of a NGS Cancer Research Database CancerBase Quashiya M. Soudagar 1, Akshatha Prasanna 2, V. G. Shanmuga Priya 3 1 M.Tech, Bioinformatics, KLE Dr. M.S Sheshgiri College of Engineering and Technology,

More information

DNA-seq Bioinformatics Analysis: Copy Number Variation

DNA-seq Bioinformatics Analysis: Copy Number Variation DNA-seq Bioinformatics Analysis: Copy Number Variation Elodie Girard elodie.girard@curie.fr U900 institut Curie, INSERM, Mines ParisTech, PSL Research University Paris, France NGS Applications 5C HiC DNA-seq

More information