OncoPPi Portal A Cancer Protein Interaction Network to Inform Therapeutic Strategies

Similar documents
Nature Genetics: doi: /ng Supplementary Figure 1. SEER data for male and female cancer incidence from

Cancer Informatics Lecture

Journal: Nature Methods

Table of content. -Supplementary methods. -Figure S1. -Figure S2. -Figure S3. -Table legend

Nature Genetics: doi: /ng Supplementary Figure 1. Mutational signatures in BCC compared to melanoma.

38 Int'l Conf. Bioinformatics and Computational Biology BIOCOMP'16

Nature Neuroscience: doi: /nn Supplementary Figure 1

Session 4 Rebecca Poulos

Module 3: Pathway and Drug Development

Protein Domain-Centric Approach to Study Cancer Somatic Mutations from High-throughput Sequencing Studies

Nature Methods: doi: /nmeth.3115

GST: Step by step Build Diary page

Data mining with Ensembl Biomart. Stéphanie Le Gras

Hands-On Ten The BRCA1 Gene and Protein

DeSigN: connecting gene expression with therapeutics for drug repurposing and development. Bernard lee GIW 2016, Shanghai 8 October 2016

User Guide. Association analysis. Input

Plasma-Seq conducted with blood from male individuals without cancer.

Session 4 Rebecca Poulos

Supplementary Figures

SUPPLEMENTARY METHODS

Supplementary Figure 1. Copy Number Alterations TP53 Mutation Type. C-class TP53 WT. TP53 mut. Nature Genetics: doi: /ng.

Genetic alterations of histone lysine methyltransferases and their significance in breast cancer

Broad GDAC. Lung Adenocarcinoma AWG Run 2013_02_07. Dan DiCara Hailei Zhang Michael Noble

COSMIC - Catalogue of Somatic Mutations in Cancer

Whole Genome and Transcriptome Analysis of Anaplastic Meningioma. Patrick Tarpey Cancer Genome Project Wellcome Trust Sanger Institute

Integrated Analysis of Copy Number and Gene Expression

Original Article International Cancer Genome Consortium Data Portal a one-stop shop for cancer genomics data

Identification of Tissue Independent Cancer Driver Genes

Statistical analysis of RIM data (retroviral insertional mutagenesis) Bioinformatics and Statistics The Netherlands Cancer Institute Amsterdam

Transcriptional Profiles from Paired Normal Samples Offer Complementary Information on Cancer Patient Survival -- Evidence from TCGA Pan-Cancer Data

User s Manual Version 1.0

A homologous mapping method for threedimensional reconstruction of protein networks reveals disease-associated mutations

Next generation diagnostics Bringing high-throughput sequencing into clinical application

Identifying Novel Targets for Non-Small Cell Lung Cancer Just How Novel Are They?

User Guide. Protein Clpper. Statistical scoring of protease cleavage sites. 1. Introduction Protein Clpper Analysis Procedure...

Comparing Multifunctionality and Association Information when Classifying Oncogenes and Tumor Suppressor Genes

Expanded View Figures

VL Network Analysis ( ) SS2016 Week 3

Expanded View Figures

Clinical Grade Genomic Profiling: The Time Has Come

Characterisation of structural variation in breast. cancer genomes using paired-end sequencing on. the Illumina Genome Analyser

Supplementary Figure 1

AVENIO ctdna Analysis Kits The complete NGS liquid biopsy solution EMPOWER YOUR LAB

IMPaLA tutorial.

Supplementary Figure 1

Frequency(%) KRAS G12 KRAS G13 KRAS A146 KRAS Q61 KRAS K117N PIK3CA H1047 PIK3CA E545 PIK3CA E542K PIK3CA Q546. EGFR exon19 NFS-indel EGFR L858R

Presenter: Lee Sael Collaborative work with POSTECH DM Lab. (Hwanjo Yu & Sungchul Kim) ORTHOGONAL NMF-BASED TOP-K PATIENT MUTATION PROFILE SEARCHING

Supplementary Figure 1. Screenshot of the MAGI home page and query interface.

Nature Immunology: doi: /ni Supplementary Figure 1. RNA-Seq analysis of CD8 + TILs and N-TILs.

R2 Training Courses. Release The R2 support team

SUPPLEMENTARY FIGURES

CIViC Clinical Interpretations of Variants in Cancer. rnal/v49/n2/full/ng.3774.html

Clustered mutations of oncogenes and tumor suppressors.

Bioinformatics Laboratory Exercise

Abstract. Patricia G. Melloy*

Patnaik SK, et al. MicroRNAs to accurately histotype NSCLC biopsies

Package bionetdata. R topics documented: February 19, Type Package Title Biological and chemical data networks Version 1.0.

Supplementary Information

I. Setup. - Note that: autohgpec_v1.0 can work on Windows, Ubuntu and Mac OS.

PRECISION INSIGHTS. GPS Cancer. Molecular Insights You Can Rely On. Tumor-normal sequencing of DNA + RNA expression.

Personalized Medicine: Lung Biopsy and Tumor

APPLICATION NOTE. Highly reproducible and Comprehensive Proteome Profiling of Formalin-Fixed Paraffin-Embedded (FFPE) Tissues Slices

Corporate Medical Policy

Supplemental Figure legends

TCGA. The Cancer Genome Atlas

Digitizing the Proteomes From Big Tissue Biobanks

Guide to Use of SimulConsult s Phenome Software

Fully Automated IFA Processor LIS User Manual

LUNG CANCER. pathology & molecular biology. Izidor Kern University Clinic Golnik, Slovenia

Translational Bioinformatics: Connecting Genes with Drugs

Title From User Usage to Subject Analysis A Case Study on the Oncogene

SAPLING: A Tool for Gene Network Analysis focusing on Psychiatric Genetics

Walkthrough

Nature Getetics: doi: /ng.3471

2015 EUROPEAN CANCER CONGRESS

The Cancer Genome Atlas & International Cancer Genome Consortium

EXAMPLE. - Potentially responsive to PI3K/mTOR and MEK combination therapy or mtor/mek and PKC combination therapy. ratio (%)

Cleaning Up and Visualizing My Workout Data With JMP Shannon Conners, PhD JMP, SAS Abstract

Lung Cancer Genetics: Common Mutations and How to Treat Them David J. Kwiatkowski, MD, PhD. Mount Carrigain 2/4/17

Supplementary information for: Human micrornas co-silence in well-separated groups and have different essentialities

1. Go to and click on the GAPMINDER WORLD tab. What is the first thing you notice? (There are no wrong answers.)

IPA Advanced Training Course

NCCN Non-Small Cell Lung Cancer V Meeting June 15, 2018

The 16th KJC Bioinformatics Symposium Integrative analysis identifies potential DNA methylation biomarkers for pan-cancer diagnosis and prognosis

Workshop Cum Training on Biological Data Analysis Through Computational. Inside.. About Us

Disclosures Genomic testing in lung cancer

HALLA KABAT * Outreach Program, mircore, 2929 Plymouth Rd. Ann Arbor, MI 48105, USA LEO TUNKLE *

Predicting disease associations via biological network analysis

CURRICULUM VITA OF Xiaowen Chen

Molecular Testing in Lung Cancer

Module 2: Target Discovery and Validation

Variant Classification. Author: Mike Thiesen, Golden Helix, Inc.

Nature Genetics: doi: /ng Supplementary Figure 1. HOX fusions enhance self-renewal capacity.

Supplementary Figure 1: High-throughput profiling of survival after exposure to - radiation. (a) Cells were plated in at least 7 wells in a 384-well

AVENIO family of NGS oncology assays ctdna and Tumor Tissue Analysis Kits

Supplementary Figure 1

Workshop on Analysis and prediction of contacts in proteins

SEQUENCE FEATURE VARIANT TYPES

Automatic Lung Cancer Detection Using Volumetric CT Imaging Features

Analysis with SureCall 2.1

Transcription:

OncoPPi Portal A Cancer Protein Interaction Network to Inform Therapeutic Strategies 2017

Contents Datasets... 2 Protein-protein interaction dataset... 2 Set of known PPIs... 3 Domain-domain interactions... 3 Mutual exclusivity of genomic alterations... 3 PPI Essentiality... 3 Protein-drug connectivity... 3 User Interface... 4 OncoPPi Network... 4 Interactive Hub protein bar graph... 5 The PPI hub panel... 5 Gene query panel... 6 Gene information... 6 Data export... 7 PPI Essentiality... 7 Therapeutic connectivity... 8 References... 9 1

Recently, we have developed a PPI high-throughput screening platform to detect PPIs between cancerassociated proteins in the context of cancer cells. Here, we present the OncoPPi Portal, an interactive web resource that allows investigators to access, manipulate and interpret a high-quality cancer-focused network of PPIs experimentally detected in cancer cell lines. To facilitate prioritization of PPIs for further biological studies, this resource combines network connectivity analysis, mutual exclusivity analysis of genomic alterations, cellular co-localization of interacting proteins, and domain-domain interactions. Estimates of PPI essentiality allow users to evaluate the functional impact of PPI disruption on cancer cell proliferation. Furthermore, connecting the OncoPPi network with FDA approved drugs and compounds in clinical trials enables discovery of new tumor dependencies to inform strategies to interrogate undruggable targets like tumor suppressors. The OncoPPi Portal serves as a resource for the cancer research community to facilitate discovery of cancer targets and therapeutic strategies. Datasets Protein-protein interaction dataset A set of 83 lung cancer-associated genes was curated based on the analysis of genomic alterations in lung cancer patients and literature searches (1). It includes the major oncogenes (e.g. EGFR, KRAS, BRAF, MYC, PIK3CA, ERBB2) and tumor suppressors (e.g. TP53, STK11, CDKN2A, SMARCA4, or RB1). In addition to major tumor drivers, the protein library is populated with key regulators of oncogenic pathways, such as 14-3-3, MAPK14 (p38), or Beclin1. Structurally, the various classes of proteins (such as transmembrane receptors, kinases, GTPases, transcription factors, or adaptor proteins) with different cellular localization are included in the library. The network of lung cancer-associated PPIs was generated based on the Time-Resolved Fluorescence Energy Transfer (TR-FRET) screening performed in lung cancer H1299 cells using the cell lysates derived from systematic, pairwise transfection of GST- and Venus-fusion expression vectors containing the genes included in the OncoPPi gene library. Each PPI was tested in triplicate, with both fusion tags along with the corresponding empty vector negative controls. The whole PPI screening was repeated in three independent experiments (1). The positive and negative experimental data for a total of 3,486 protein-protein pairs are available through the OncoPPi Portal. For each experiment the average TR-FRET signals for the PPI (SPPI G1V2, SPPI G2V1), GST empty vector control (SVec G1,G2), and Venus empty vector control (SVec V1,V2) were calculated over the triplicates for each of two tested fusions (GST, Venus). Then, for each fusion the Fold Over Control (FOC) values were calculated with the following equation: (SPPI G1V2 ) FOC = Max( Max(SVec G1, SVec V2 ), (SPPI G2V1 ) Max(SVec G2, SVec V1 ) ) 2

The maximum of the FOC values obtained for the two fusions (GST, Venus) was considered as the final FOC value for a given PPI in a given experiment. Then, the average FOC values were calculated by averaging the FOC values for individual experiments. These values were considered as the final FOC value for a given PPI. A statistical significance of positive PPIs is estimated in terms of p-values calculated with the permutation test performed according to the following procedure. First, for a given PPI pair, all PPI signals and control signals obtained in all experiments were ranked. The sum of the ranks of the PPI signals was calculated and was used as the test statistic for the permutation test. The permutation tests were repeated 10,000 times. The p-value was calculated with the following equation: p=(ns+1)/10,001, where Ns is the number of cases where total ranks of shuffled labels exceed or are equal to that of true label. The p-values adjusted for the multiple comparisons (q-values) were calculated with the Benjamini-Hochberg procedure. Set of known PPIs PPI databases included: String (2), BioGrid(3), Intact (4) and GeneMania (5) datasets were used to identify PPIs with reported physical association. Domain-domain interactions Structural domains for the proteins were extracted from Pfam (6). Data for domains involved in physical associations in co-crystallized proteins was extracted from the 3DID database (7). Mutual exclusivity of genomic alterations Mutual exclusivity analysis was performed in MatLab with the Cancer Genomics Data Server tool box (8, 9). The complete tumor samples from lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC) TCGA Provisional datasets (downloaded 1/2016) were used to analyze the mutual exclusivity of genomic alterations in lung cancer patient samples. Mutations, DNA amplifications and deletions were taken into account. Mutual exclusivity was evaluated in terms of log odds ratio (OD) values calculated as described previously (9). The alterations of two genes were considered as mutually exclusive if Log(OD) < 0. PPI Essentiality The PPI essentiality values for over 7,900 PPIs in 206 single cell lines were calculated using the MEDICI algorithm as described by Harati et al. (2). 1,548 proteins are available for the analysis. If multiple cell lines selected, the averaged essentiality values are calculated for the given set of cell lines. Protein-drug connectivity The OncoPPi network was integrated with the FDA-approved drugs. The drug names and corresponding target proteins were extracted from the Drug Bank (3) 3

User Interface OncoPPi Network The PPI network for a set of selected genes is shown in the PPI network window. The network is visualized using a Cytoscape plug-in (4) that allows a user to zoom it in or out as well as to rearrange (move) the nodes. A number of preset network layouts is available through the Settings panel. The interacting proteins are connected by grey lines. In the case of mutual exclusivity of corresponding genes in lung cancer the lines are highlighted with blue color. The Statistical properties panel allows to adjust the values of thresholds for statistical significance, including: Fold Over Control (FOC): The FOC 1.2 is considered as a minimum cut-off for positive PPI in original TR-FRET-based PPI screening; FOC 1.5 is used to define high-confident PPIs. The permutation test p-value: p-values 0.05 are considered as statistically significant The q-value (p-value, adjusted for multiple comparisons): q-values 0.01 are used as a threshold for high-confident PPIs. The panel also provides a set of PPI properties to filter the data, including: PPIs reported by other groups: if checked (default) show PPIs identified in String, BioGrid, IntAct, or GeneMania databases. If unchecked, the PPIs detected with the Emory PPI TR-FRET screening are shown. PPIs with mutual exclusivity (ME): if unchecked (default) all PPIs included in the dataset are shown. If checked, only the PPIs between the genes with genomic alterations mutually exclusive in lung adenocarcinoma or lung squamous cell carcinoma are shown. PPIs for the genes with ME are indicated with blue lines. Domain-domain interactions (DDI): if unchecked (default) all PPIs included in the dataset are shown. If checked, only the PPIs that share structural domains known to interact in crystallized protein-protein complexes (based on 3DID database) are shown. 4

Co-localized proteins: if unchecked (default) all PPIs included in the dataset are shown. If checked, only the PPIs for proteins that can co-localize to the same cellular compartments are shown. Interactive Hub protein bar graph is shown under the PPI network. The X-axis of the graph shows the proteins included in the given network, and the Y-axis shows the number of protein binding partners (node degree) identified with the selected statistical parameters. The protein order and data shown on the graph changes dynamically when the network and statistical thresholds are changed (e.g. FOC or p- values). Furthermore, all the binding partners of a hub protein and their interconnections are shown by clicking on the corresponding bar on the graph. The graphical representation of protein connectivity allows immediate visualization and ranking of the major hub proteins, and identification of their binding partners for further analysis. The bar graph shows a distribution of protein degree in the current network. By clicking a bar a user can visualize a hub network for the corresponding protein. The PPI hub panel on the right side contains a list of all proteins available for the current version of the network, including the OncoPPi v1 set defined in ref. 1. On a left-click on a hub protein a sub-network constructed for the corresponding protein and its binding partners will be shown. By left-clicking a protein node in the network the interactions for that particular protein will be highlighted. 5

Gene query panel at the bottom allows a search for PPIs within specified group of genes. The HUGO gene names separated by a space, comma, or new line can be used to query the dataset. Search button will start the search. Clear button will clear the query. Gene information window can be accessed with a right-click on the corresponding node in the network. The gene info includes the gene name, the number of PPIs, and a list of the binding partners. Each gene is also annotated with the cellular localization data extracted from Gene Ontology (GO) annotations, structural domains, and with the percent of cases with gene amplification, deletion, or mutation in lung adenocarcinoma (based on TCGA LUAD provisional dataset). The percent of genomic alterations are also visualized with red (amplification), blue (deletion), and green (mutation) sectors inside the gene node circle. To simplify the data mining of structural and biological information associated with the proteins, each protein in the dataset is linked to external databases such as the CTD2 Dashboard, cbioportal, TumorPortal, HGCN, Gene, Ensembl, UniProt, PubMed, and PubChem. The PPIs reported in external databases can be easily extracted from the IntAct, String, and GeneMania databases through the direct links to the corresponding protein pages. 6

Data export. The data associated with selected (shown) PPIs can be exported as a comma separated file (csv) by clicking on the Export PPI data icon. The data includes statistical characteristics of PPI, namely the FOC, p-value, and q-value, indication of mutual exclusivity of alterations of corresponding genes in lung adenocarcinoma or lung squamous carcinoma, predicted interacting domains, co-localized proteins, as well as the indication whether a PPI was reported in String, BioGrid, IntAct, or GeneMania databases. PPI Essentiality The PPI Essentiality page provides a graphical user interface for the MEDICI algorithm designed to predict the role of individual PPIs in cancer cell death and survival (2). The essentiality values range from 0 (less essential) to 1 (most essential PPIs). Accordingly, the subsets of PPIs with a desired level of essentiality can be visualized by adjusting the essentiality threshold in the Statistical Properties panel. The essentiality values can be calculated either for all 206 available cell lines or for a specific set of cell lines as well as for the individual cell lines. The cell lines can be filtered based on the corresponding tumor type or through the Search option. Similar to the OncoPPi Network page, all proteins included in the dataset are listed in the PPI Hubs panel. In addition, the proteins can be filtered based on their involvement in defined 196 biological pathways identified by the Pathway Interaction Database (5). 7

Therapeutic connectivity Therapeutic connectivity page of the OncoPPi Portal provides the interface to explore the connectivity between the FDA approved drugs and PPIs included in the OncoPPi network. The tumor suppressor genes are colored in green, while the oncogenes are highlighted in red. The drugs are shown as yellow triangles. The proteins are connected by blue lines, and the protein-drug connections are highlighted with red lines. All proteins included in the network are listed in the PPI Hub panel. The Gene query and Drug query panels allow a user to search for a subset of proteins and drugs included in the network. 8

References 1. Li Z, et al. (2017) The OncoPPi network of cancer-focused protein-protein interactions to inform biological insights and therapeutic strategies. Nature Communications 8:14356, 14310.11038/ncomms14356. 2. Harati S, et al. (2017) MEDICI: Mining Essentiality Data to Identify Critical Interactions for Cancer Drug Target Discovery and Development. PloS one 12(1):e0170339. 3. Wishart DS, et al. (2006) DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic acids research 34(Database issue):d668-672. 4. Franz M, et al. (2016) Cytoscape.js: a graph theory library for visualisation and analysis. Bioinformatics (Oxford, England) 32(2):309-311. 5. Schaefer CF, et al. (2009) PID: the Pathway Interaction Database. Nucleic acids research 37(Database issue):d674-679. 9