Table of content. -Supplementary methods. -Figure S1. -Figure S2. -Figure S3. -Table legend

Similar documents
OncoPPi Portal A Cancer Protein Interaction Network to Inform Therapeutic Strategies

Review (V1) - The Hallmarks of Cancer

Supplementary Information

Review (V1) - The Hallmarks of Cancer

Journal: Nature Methods

A homologous mapping method for threedimensional reconstruction of protein networks reveals disease-associated mutations

A Network Partition Algorithm for Mining Gene Functional Modules of Colon Cancer from DNA Microarray Data

VL Network Analysis ( ) SS2016 Week 3

Table S1: Kinetic parameters of drug and substrate binding to wild type and HIV-1 protease variants. Data adapted from Ref. 6 in main text.

CSE 255 Assignment 9

Contents. 2 Statistics Static reference method Sampling reference set Statistics Sampling Types...

Gene Ontology and Functional Enrichment. Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein

Bioinformatics Laboratory Exercise

Transient β-hairpin Formation in α-synuclein Monomer Revealed by Coarse-grained Molecular Dynamics Simulation

A quick review. The clustering problem: Hierarchical clustering algorithm: Many possible distance metrics K-mean clustering algorithm:

Hands-On Ten The BRCA1 Gene and Protein

Nature Methods: doi: /nmeth.3115

CRY2 binding to CIB1N w/ MTHF

From Mosquitos to Humans: Genetic evolution of Zika Virus

Nature Neuroscience: doi: /nn Supplementary Figure 1

Genomic analysis of essentiality within protein networks

Supplementary Materials for

Nature Neuroscience: doi: /nn Supplementary Figure 1. Behavioral training.

Figure S1. (A) SDS-PAGE separation of GST-fusion proteins purified from E.coli BL21 strain is shown. An equal amount of GST-tag control, LRRK2 LRR

Relationship between genomic features and distributions of RS1 and RS3 rearrangements in breast cancer genomes.

Supplementary Information A Hydrophobic Barrier Deep Within the Inner Pore of the TWIK-1 K2P Potassium Channel Aryal et al.

(B D) Three views of the final refined 2Fo-Fc electron density map of the Vpr (red)-ung2 (green) interacting region, contoured at 1.4σ.

Assessing Functional Neural Connectivity as an Indicator of Cognitive Performance *

Use of MALDI-TOF mass spectrometry and machine learning to detect the adulteration of extra virgin olive oils

Bayesian (Belief) Network Models,

Supplementary Figure S1. Gene expression analysis of epidermal marker genes and TP63.

Identification of Lung Cancer Related Genes Using Enhanced Floyd Warshall Algorithm in a Protein to Protein Interaction Network

Chapter 1. Introduction

Predicting disease associations via biological network analysis

Gene finding. kuobin/

PBZ FT01_PBZ FT01_TZ FT01_NZ. interface zone (I) tumor zone (TZ) necrotic zone (NZ)

Supplementary Figure 1 (previous page). EM analysis of full-length GCGR. (a) Exemplary tilt pair images of the GCGR mab23 complex acquired for Random

SEQUENCE FEATURE VARIANT TYPES

Identification of Tissue Independent Cancer Driver Genes

Learning Convolutional Neural Networks for Graphs

Gene Ontology 2 Function/Pathway Enrichment. Biol4559 Thurs, April 12, 2018 Bill Pearson Pinn 6-057

Supplement to SCnorm: robust normalization of single-cell RNA-seq data

SUPPLEMENTARY APPENDIX

Lentiviral Delivery of Combinatorial mirna Expression Constructs Provides Efficient Target Gene Repression.

Numerous hypothesis tests were performed in this study. To reduce the false positive due to

Research Article RRHGE: A Novel Approach to Classify the Estrogen Receptor Based Breast Cancer Subtypes

Bioinformation by Biomedical Informatics Publishing Group

Introduction. We can make a prediction about Y i based on X i by setting a threshold value T, and predicting Y i = 1 when X i > T.

A Comparison of Collaborative Filtering Methods for Medication Reconciliation

Hierarchical Bayesian Modeling of Individual Differences in Texture Discrimination

Supplementary Figure 1. Using DNA barcode-labeled MHC multimers to generate TCR fingerprints

Figure S1. Standard curves for amino acid bioassays. (A) The standard curve for leucine concentration versus OD600 values for L. casei.

MIR retrotransposon sequences provide insulators to the human genome

Numerous hypothesis tests were performed in this study. To reduce the false positive due to

User Guide. Association analysis. Input

Cancer Informatics Lecture

Broad H3K4me3 is associated with increased transcription elongation and enhancer activity at tumor suppressor genes

Inference of patient-specific pathway activities from multi-dimensional cancer genomics data using PARADIGM. Bioinformatics, 2010

Data mining with Ensembl Biomart. Stéphanie Le Gras

Mapping Evolutionary Pathways of HIV-1 Drug Resistance. Christopher Lee, UCLA Dept. of Chemistry & Biochemistry

Genetic alterations of histone lysine methyltransferases and their significance in breast cancer

Proteins: Proteomics & Protein-Protein Interactions Part I

Supporting Information

Nature Immunology: doi: /ni Supplementary Figure 1. RNA-Seq analysis of CD8 + TILs and N-TILs.

Theta sequences are essential for internally generated hippocampal firing fields.

Gene-microRNA network module analysis for ovarian cancer

Supplementary Figures

EXPression ANalyzer and DisplayER

Refining Literature Curated Protein Interactions using Expert Opinions

Network-assisted data analysis

Abstract. Introduction

Nature Biotechnology: doi: /nbt.1904

SUPPLEMENTARY INFORMATION In format provided by Javier DeFelipe et al. (MARCH 2013)

J. A. Mayfield et al. FIGURE S1. Methionine Salvage. Methylthioadenosine. Methionine. AdoMet. Folate Biosynthesis. Methylation SAH.

Self reported ethnicity

Chapter 8. Interaction between the phosphatidylinositol 3- kinase SH3 domain and a photocleavable cyclic peptide

Early Learning vs Early Variability 1.5 r = p = Early Learning r = p = e 005. Early Learning 0.

SUPPLEMENTARY FIGURES: Supplementary Figure 1

STATISTICAL METHODS FOR DIAGNOSTIC TESTING: AN ILLUSTRATION USING A NEW METHOD FOR CANCER DETECTION XIN SUN. PhD, Kansas State University, 2012

Supplementary materials

Gene Finding in Eukaryotes

Supporting Information Identification of Amino Acids with Sensitive Nanoporous MoS 2 : Towards Machine Learning-Based Prediction

Variant Classification. Author: Mike Thiesen, Golden Helix, Inc.

Package bionetdata. R topics documented: February 19, Type Package Title Biological and chemical data networks Version 1.0.

To test the possible source of the HBV infection outside the study family, we searched the Genbank

Comparison of discrimination methods for the classification of tumors using gene expression data

Nature Genetics: doi: /ng Supplementary Figure 1

Supplementary Figures. Supplementary Figure 1. Treatment schematic of SIV infection and ARV and PP therapies.

RASA: Robust Alternative Splicing Analysis for Human Transcriptome Arrays

SUPPLEMENTARY INFORMATION

MODULE 3: TRANSCRIPTION PART II

DAY 98 UNDERSTANDING REASONABLE DOMAIN

Data and text mining applied to the computational study of protein interaction networks

Unsupervised Identification of Isotope-Labeled Peptides

Lecture Readings. Vesicular Trafficking, Secretory Pathway, HIV Assembly and Exit from Cell

Prediction of Alternative Splice Sites in Human Genes

Topological properties of the structural brain network constructed using the ε-neighbor method

A Vision-based Affective Computing System. Jieyu Zhao Ningbo University, China

SUPPLEMENTAL MATERIAL

Nature Genetics: doi: /ng Supplementary Figure 1. Workflow of CDR3 sequence assembly from RNA-seq data.

Transcription:

Table of content -Supplementary methods -Figure S1 -Figure S2 -Figure S3 -Table legend

Supplementary methods Yeast two-hybrid bait basal transactivation test Because bait constructs sometimes self-transactivate reporter genes, SD-L-H culture medium were supplemented with 3-aminotriazole (3-AT). Appropriate concentrations of this drug were determined by growing bait strains on SD-L-H medium supplemented with increasing concentrations of 3-AT. Self-transactivation by NS5A without its membrane anchor was too high to be titrated with 3-AT and was not further tested. IST (Interactor Sequence Tag) Analysis We have developed a bioinformatic pipeline that assigns each IST to its native human genome transcript. First, ISTs were filtered by using PHRED (Ewing and Green, 1998; Ewing et al., 1998) at a quality score superior to the conventional 20 threshold value (less than 1% sequence errors). Gal4 motif was searched (last 87 bases of GAL4-AD), sequences downstream of this motif were translated into peptides and aligned using BLASTP against the REFSEQ (http://www.ncbi.nlm.nih.gov/refseq/) human protein sequence database (release 04/2007). Low-confidence alignments (E value >10-10, identity < 80%) and premature STOP codon containing sequences were eliminated. Only in-frame proteins and high quality sequences were further considered Network visualization Large Graph Layout (http://bioinformatics.icmb.utexas.edu/lgl/) was applied to visualize the H-H network in figure 2A. Guess tool (http://graphexploration.cond.org/) was used to graphically represent H-H HCV infection network in figure 2B and the IJT network in figure 4A. Figure 2b and 4a are available in a GUESS interactive format (GUESS Data Format) in 1

SF1.tar.gz (infection_network.properties and ijt_network.properties) available at http://vinavratil.free.fr/lotteau_hcv/sf1.tar.gz. Definition of graph theoretic concepts Connected component: In an undirected network, a connected component is a maximal connected sub-network. Two nodes are in the same connected component if and only if there exist a path between them. We also included in the set of connected components those proteins that are not connected to any other protein (disconnected proteins), according to igraph R package. Assume G = (V,E), the graph representing the human interactome where V is the set of nodes (proteins) and E is the set of edges (interactions) between this nodes. We use n and m to denote the number of nodes and edges of G, respectively. Degree: The degree of a node (k) is the number of edges incident to the node. The degree distribution of human proteins was compared to the degree distribution of H HCV. Shortest Path: The shortest path problem is the finding of a path between two nodes such that the sum of the weights of its constituent edges is minimized. The shortest paths (also called geodesics) are calculated here by using breath-first search in the graph. Edge weights are not used here, i.e every edge weight is one. The shortest path length (l) distribution of H HCV was compared to the shortest path length distribution of the human interactome. Betweenness: The node betweenness b(v) can be defined by the number of shortest paths going through a node v and was normalised by twice the total number of protein pairs in the network (n*(n- 1)). The equation used to compute betweenness centrality, b(v), for a node v is: 2

b(v) = 1 n x (n-1) x Σ i,j,v V g ij (v) g ij i j v Where g ij is the number of shortest paths going from node i to j, I and j V and g ij (v) the number of shortest paths from i to j that pass through the node v. The betweenness distribution of H HCV was compared to the betweenness distribution of the human interactome. Generalised linear model The generalized linear model was constructed as follow: Y = a0 + a1*k + a2*b to test the effect of both predictor measures (k and b, the independent variable) on the property of cellular protein to interact with viral proteins described by Y, the variable to predict. In this logistic regression model: Y is assumed to follows a binomial law. Indeed, as the response data Y are binary, i.e. cellular proteins interact (equal to 1) or not (equal to 0) with viral proteins, it is generally assumed that Y follows a binomial distribution of parameters n and p, where the numbers of Bernoulli trials n and the probabilities of success, i.e. that the cellular protein interact with viral proteins, p; The logistic regression is based on the fundamental hypothesis that Y = a0 + a1*k + a2*b = log(p/1-p), also called the logistic function; k is the degree of the cellular proteins; b is the betweenness of the cellular proteins. 3

IJT network construction First, all proteins annotated in insulin (blue), Jak/STAT (red) and TGFβ (green) pathways were extracted from KEGG database. Second, a network was constructed by connecting all these proteins according to the H-H interactome dataset to create the backbone of the IJT network (colored network). Then, direct neighbors of each member of the three pathways were extracted from the H-H interactome. From this pool of proteins, only those interacting with viral proteins were retained and connected to the IJT network (black and grey nodes). The rationale for this selection is that targeted neighbors at the interface could be coregulators or perturbators of these pathways. 4

Supplementary references Ewing, B., and Green, P. (1998). Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res, 8, 186-194. Ewing, B., Hillier, L., Wendl, M. C., and Green, P. (1998). Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res, 8, 175-185. 5

Figure S1 1 342 9374 9605 HCV 1b genome (strain Con1, AJ238799) HCV 1b polyprotein 3010 aa CORE 1 2 3 4 Full length protein NS5A membrane anchor Domain E1 E2 p7 NS2 NS3 NS4A NS4B NS5A NS5B 5 7 9 11 13 16 17 18 24 6 8 10 12 19 25 14 21 15 20 22 26 23 27 Functional fusion ORF Protein/domain Genome coordinates polyprotein coordinates 1 Core full-length 342 914 1 191 2 Core mature 342 854 1 171 3 Core domain 1 342 695 1 118 4 Core domain 2 696 854 119 171 5 E1 full-length 915 1490 192 383 6 E1 ectodomain 915 1397 192 352 7 E2 full-length 1491 2579 384 746 8 E2 ectodomain 1491 2465 384 708 9 p7 full-length 2580 2768 747 809 10 p7 N-term helix 2580 2633 747 764 11 NS2 full-length 2769 3419 810 1026 12 NS2 ectodomain 3048 3419 903 1026 13 NS3 full-length 3420 5312 1027 1657 14 NS3 protease 3420 3959 1027 1206 15 NS3 helicase 3960 5312 1207 1657 16 NS4A full-length 5313 5474 1658 1711 17 NS4B full-length 5475 6257 1712 1972 18 NS5A full-length 6258 7598 1973 2419 19 NS5A cytosolic domain 6344 7598 2003 2419 20 NS5A membrane anchor 6258 6347 1973 2002 21 NS5A domain 1 6344 6896 2003 2185 22 NS5A domain 2 6897 7283 2186 2314 23 NS5A domain 3 7284 7598 2315 2419 24 NS5B full-length 7599 9374 2420 3010 25 NS5B cytosolic domain 7599 9308 2420 2989 26 NS4A-NS3 full-length 3420 5312 1027 1657 +NS4A cofactor 5367 5411 1676 1690 27 NS4A-NS3 protease 3420 3959 1027 1657 +NS4A cofactor 5367 5411 1676 1690

Figure S1: The HCV ORFeome. Schematic representation of the HCV genome and definition of ORFs designed for Y2H screens. The HCV positive strand genome (purple) encodes a polyprotein (orange) which is co-translationally processed in 10 proteins. Red: full length protein; blue: domain; yellow + green: NS4A + NS3 chimeric fusion; pink: NS5A membrane anchor. Genome and polyprotein coordinates of each construction are given in the table.

Figure S2

Figure S2: Degree and betweenness distributions of H, H HCV and H EBV proteins in H-H network. log degree (left) and log betweenness (right) distributions of H proteins (blue), H HCV (red) and H EBV (green). Solid lines represent linear regression fits. Vertical dashed lines give mean degree and betweenness values.

Figure S3 % adhesion 120 100 80 60 40 20 0 0,01 0,1 1 10 100 µg/ml fibronectin % adhesion 140 NS2 NS3 NS5A NS3/4A MOCK 120 100 80 60 40 20 0 0,01 0,1 1 10 100 µg/ml Poly-L-Lysine

Figure S3: Functional validation of focal adhesion perturbation by NS3 and NS5A 96-well plates were coated with fi bronectin (left) or poly-l-lysine (right) at various concentrations. 293T cells expressing NS2, NS3, NS3/4A or NS5A were plated on the matrix for 30min (concentration range: 0,04 to 20 g/ml matrix). Adherent cells we re stained with crystal violet. Values represent means of triplicates with standard deviation.

Supplementary Table legends Table SI: H HCV listing. HCV proteins are referenced according to their NCBI mature peptide product name (column 1). Human proteins are referenced with their cognate NCBI gene name and gene ID (columns 2 and 3). The number of IST for IMAP Y2H (IMAP1 and IMAP2, according to the method of screening) is given in columns 4 and 5. IMAP LCI (Literature Curated Interactions) from textmining and BIND database associated PubMed IDs are given in columns 6 and 7. Co-affinity purification (CoAP) or Y2H pairwise matrices validations are indicated in columns 8 and 9 (+: IMAP validation, -: not validated, NA: non assayable due to default of protein expression or to cellular protein directly interacting with GST). Links for PubMed IDs and Gene IDs are given at http://www.ncbi.nlm.nih.gov/sites/entrez (choose PubMed or Gene respectively). Gene expression in the liver, according to EST data (http://www.ncbi.nlm.nih.gov/sites/entrez?db=unigene) is given in column 10. Table SII: characteristics of IMAP1 and IMAP2 screens Numbers of transformants and yeast colonies are given for IMAP1 screens for each bait. Number of diploids and yeast colonies are given for IMAP2 screens for each bait. Table SIII: Listing of human proteins interacting with more than one viral protein. Human proteins are referenced with their cognate NCBI gene name (column 1). HCV proteins are referenced according to their NCBI mature peptide product name (column 2). Origin of the dataset (IMAP Y2H, IMAP LCI, column 3). Table SIV: List of human protein- protein interactions

Interacting human proteins are referenced with their cognate NCBI gene name (columns 1 and 2). These physical and direct binary protein-protein interactions were retrieved from BIND, BioGRID, DIP, GeneRIF, HPRD, IntAct, MINT, and Reactome databases. Table SV: Topological analyis of the HCV-human network Connected components of the HCV-Human network. The size of the largest component and the number of connected components of V-H HCV subnetwork were computed (IMAP dataset, column 2). In order to test the significance of observed values, we computed the mean of the largest component size and the mean number of connected components obtained after 1000 simulations of random sub-networks (IMAP Sim column 3). The differences between the observed and the simulated mean values were highly significant (two sided z-test ***: p-value <10-10 ). Topological properties of the H HCV and H EBV proteins in the human interactome. Full interactome (A), high-confidence interactome (B, containing only PPIs with at least two PMIDs or validated by two different methods). The number of proteins and PPIs that can be integrated into the human interactome are given for H HCV and H EBV. Percentage of H HCV and H EBV that are present in the human interactome are given according to the origin of the dataset. Average degree (k), betweenness (b) and shortest path (l) were computed for H HCV and H EBV in both full and high-confidence interactomes (Calderwood et al., 2007). Table SVI: HCV Protein distribution and enrichment in IJT network. A. H HCV enrichment in IJT network for each viral protein. Number of H HCV is given in V- H HCV and IJT networks. Enrichment of H HCV in IJT network was tested with exact Fisher test for each viral protein. Associated odd ratios and p-values are given.

B. H HCV enrichment in Jak/STAT, TGFβ and Insulin pathways for each viral protein. Number of H HCV is given in V-H HCV network and Jak/STAT, TGFβ and Insulin pathways (as defined in KEGG database). Enrichment of H HCV in Jak/STAT, TGFβ and Insulin pathways was tested with exact Fisher test for each viral protein. Associated odd ratios and p-values are given. C. H HCV enrichment in Jak/STAT, TGFβ and Insulin inter-pathways for each viral protein. Inter-pathways are defined as the H HCV connecting two or three KEGG pathways. Number of H HCV is given in V-H HCV network and Jak/STAT, TGFβ and Insulin inter-pathways. Enrichment of H HCV in Jak/STAT, TGFβ and Insulin inter-pathways was tested with exact Fisher test for each viral protein. Associated odd ratios and p-values are given.