autohgpec: Automated prediction of novel disease-gene and disease-disease associations and evidence collection based on a random walk on heterogeneous network

Duc-Hau Le 1,*, Trang T.H. Tran 1

1 School of Computer Science and Engineering, Thuyloi University, 175 Tay Son, Dong Da, Hanoi, Vietnam.
* To whom correspondence should be addressed.

User Manual

Table of Contents

I. Setup
II. Overview of autohgpec
III. Case study: Prediction of novel breast cancer-associated genes and diseases
  1. Run autohgpec in Cytoscape
    Step 1: Construct a heterogeneous network
    Step 2: Select a disease of interest
    Step 3: Select candidate sets
    Step 4: Prioritize
    Step 5: Examine ranked genes and diseases
      Visualization
      Search Evidences
  2. Automate autohgpec using CyREST Command API
    Step 1: Construct a heterogeneous network
    Step 2: Select a disease of interest
    Step 3: Select candidate sets
    Step 4: Prioritize
    Step 5: Examine ranked genes and diseases
      Visualize
      Search Evidences
  3. Automate autohgpec from R
    Step 1: Construct a heterogeneous network
    Step 2: Select a disease of interest
    Step 3: Select candidate sets
    Step 4: Prioritize
    Step 5: Examine ranked genes and diseases
      Visualize
      Search Evidences
IV. Reference

I. Setup

- autohgpec 1.0 runs only on Cytoscape 3.6 or later, which provides the Automation features it relies on. Download this version from http://cytoscape.org/
- Cytoscape needs a JRE to run. Download JRE version 7.x or later from http://www.oracle.com/technetwork/java/index.html and install it.
- Install Cytoscape to the root folder (e.g., /Applications/Cytoscape_v3.6.0).
- Download the autohgpec_v1.0.jar file from http://hgpec.sourceforge.net/ or https://sites.google.com/site/duchaule2011/bioinformatics-tools/autohgpec. Then install it by going to Apps → App Manager, choosing Install from file, and selecting the downloaded autohgpec_v1.0.jar file.
- Create a folder named Data in the root folder of Cytoscape (e.g., /Applications/Cytoscape_v3.6.0).
- Download the GO annotation data from ftp.ncbi.nlm.nih.gov/gene/data/gene2go.gz, then extract it and store it in the Data folder (e.g., /Applications/Cytoscape_v3.6.0/Data).
- Note that autohgpec_v1.0 works on Windows, Ubuntu and Mac OS.
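The last two setup steps (creating the Data folder and extracting gene2go.gz into it) can also be scripted. The following is a minimal sketch in base R; the helper name prepare_go_data and the demonstration paths are hypothetical, and the demonstration uses a tiny stand-in archive with dummy content rather than the real gene2go.gz.

```r
# Sketch: create the Data folder and unpack a .gz file into it (base R only).
# `cytoscape_root` is a placeholder for your own Cytoscape installation folder.
prepare_go_data <- function(gz_file, cytoscape_root) {
  data_dir <- file.path(cytoscape_root, "Data")
  dir.create(data_dir, recursive = TRUE, showWarnings = FALSE)
  # Strip the .gz suffix to get the name of the extracted file
  out_file <- file.path(data_dir, sub("\\.gz$", "", basename(gz_file)))
  con <- gzfile(gz_file, open = "rt")
  on.exit(close(con))
  writeLines(readLines(con), out_file)
  out_file
}

# Demonstration on a stand-in archive in a temporary folder (dummy content)
demo_root <- file.path(tempdir(), "Cytoscape_demo")
demo_gz <- file.path(tempdir(), "gene2go.gz")
gz <- gzfile(demo_gz, open = "wt")
writeLines(c("dummy line 1", "dummy line 2"), gz)
close(gz)

extracted <- prepare_go_data(demo_gz, demo_root)
print(extracted)
print(readLines(extracted))
```

For the real file, download gene2go.gz first (for example with download.file in binary mode) and pass its path together with the actual Cytoscape root folder. The sketch reads the whole file into memory, which is simple but not economical for a large annotation file.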

II. Overview of autohgpec

After installation, autohgpec is loaded automatically in the Apps menu of Cytoscape. The main tasks of autohgpec (Prediction of Genes and Diseases, and Evidence Collection) are completed in five steps:
- Step 1: Construct a Heterogeneous network
- Step 2: Select a disease of interest (2 sub-steps)
  o 1. Select a disease
  o 2. Create training list
- Step 3: Provide Candidate Gene Set (4 options)
  o All remaining genes in the Gene network
  o Neighbors of training genes in Chromosome
  o Neighbors of training genes in Gene network
  o Susceptible Chromosome Regions/Bands
- Step 4: Prioritize (candidate genes and diseases)
- Step 5: Examine Ranked Genes and Diseases
  o Search Evidences
  o Visualize

These five steps can be performed:
- In Cytoscape, as with HGPEC (Le and Pham, 2017)
- Using the CyREST Command API
- From R (https://www.r-project.org)

III. Case study: Prediction of novel breast cancer-associated genes and diseases

In this section, we show how autohgpec identifies novel breast cancer-associated genes and diseases.

1. Run autohgpec in Cytoscape

Step 1: Construct a heterogeneous network

We select a phenotypic disease similarity network containing 5,080 diseases and 19,729 interactions (i.e., Disease_Similarity_Network_5) and a human protein interaction network containing 10,486 genes and 50,791 interactions (i.e., Default_Human_PPI_Network). We then connect them by known disease-gene associations from either OMIM (Amberger, et al., 2009) or DisGeNET (Piñero, et al., 2017) to construct a heterogeneous network of diseases and genes, by clicking Apps → autohgpec → Step 1: Construct a Heterogeneous Network.

To construct a heterogeneous network:
1. Select a disease similarity network.
2. Select known disease-gene associations.
3. Select a network of genes/proteins (e.g., the preinstalled one or one imported from Cytoscape).
4. Click OK to connect these two networks by the known disease-gene associations.

Note that:
- For the disease similarity network: we pre-installed 3 networks corresponding to 5, 10 or 15 nearest neighbors, which were extracted from phenotypic disease similarity matrix data collected from (van Driel, et al., 2006).
- For the gene/protein interaction network:
  o We pre-installed a human physical protein interaction network collected from ftp://ftp.ncbi.nlm.nih.gov/gene/generif/interactions.gz.
  o However, the user can use other protein/gene interaction networks by importing them into Cytoscape (File → Import → Network from table (Text/MS Excel)). Genes/proteins in the network must be identified by Entrez Gene ID.
- For known disease-gene associations: the user can select either OMIM or DisGeNET.

Step 2: Select a disease of interest

We select breast cancer (OMIM ID: 114480), then create the training list by clicking the menu Apps → autohgpec → Step 2: Select a disease of interest.
- Step 2.1. Select a disease: Apps → autohgpec → Step 2: Select a disease of interest → 1. Select a disease
  Enter a disease keyword to retrieve a list of disease phenotypes from OMIM. The search returns 4 phenotypes from OMIM related to breast cancer.
- Step 2.2. Create Training List: Apps → autohgpec → Step 2: Select a disease of interest → 2. Create Training List

The training lists include the disease of interest (OMIM ID: 114480) and its 21 known associated genes.

Step 3: Select candidate sets

For candidate diseases, all remaining diseases are specified as candidate diseases by default. Therefore, there are 5,079 diseases in this set.

For candidate genes, select menu Apps → autohgpec → Step 3: Provide Candidate Gene Set, then select the option All remaining genes in Gene Network. As a result, a total of 10,465 remaining genes were selected as candidate genes.

There are four ways to construct a candidate gene set:
- Neighbors of Training Genes in Gene Network
  o The user must define the distance of neighbors to the training genes.
- Neighbors of Training Genes in Chromosome (also known as Artificial Linkage Interval)
  o The user must define the number of neighbors of each training gene on the same chromosome.
- All remaining genes in Gene Network
- Susceptible Chromosome Regions/Bands
  o The user selects candidate genes from susceptible chromosome regions/bands.

Step 4: Prioritize

We set the three parameters of the RWRH algorithm, i.e., back-probability (γ), jumping probability (λ) and subnetwork (Disease/Gene) importance (η), to 0.5, 0.6 and 0.7, respectively. Please refer to (Li and Patra, 2010) for the best parameter settings.

Select menu Apps → autohgpec → Step 4: Prioritize

Then click OK to rank all candidate genes and diseases in the heterogeneous network. All genes and diseases are ranked and listed in two data tables (Ranked Genes and Ranked Diseases). Note that not only the candidate genes and diseases but all genes and diseases in the heterogeneous network are ranked. Therefore, the user can visualize them in one view to explore their topological relationships.

Step 5: Examine ranked genes and diseases

To visualize the highly ranked genes and diseases shown in the two data tables above, or to search for evidence about them, select menu Apps → autohgpec → Step 5: Examine Ranked Genes and Diseases.

Visualization

- Visualize the topological relationships between highly ranked candidate genes and the disease of interest. For example, to focus on the topological relationships between highly ranked candidate genes, the disease of interest and its associated genes, we selected the top 20 ranked candidate genes, the 21 training genes and the training disease (i.e., OMIM ID: 114480) for visualization:
  o Select the top 20 ranked candidate genes and the 21 training genes.
  o Select the training disease (i.e., the disease of interest, OMIM ID: 114480).
  o Select sub-menu Apps → autohgpec → Step 5: Examine Ranked Genes and Diseases → 2. Visualize
  o Select Layout → Group Attributes Layout → Role
  The node in rhombus shape is the disease of interest. Nodes with high rankings are in red, relatively high in pink, medium in white and light green, and low in green. We found that the sub-network is mostly connected. In other words, highly ranked genes are directly connected to known/training genes.
- Visualize the topological relationships between highly ranked candidate diseases and the disease of interest. In this case, we selected the top 20 ranked candidate diseases, the 21 training genes and the disease of interest (i.e., OMIM ID: 114480) for visualization:
  o Select the 21 training genes.
  o Select the top 20 ranked candidate diseases and the disease of interest.
  o Select sub-menu Apps → autohgpec → Step 5: Examine Ranked Genes and Diseases → 2. Visualize
  o Select Layout → Group Attributes Layout → Role
  The node in rhombus shape is the disease of interest. Nodes in rectangle shape are candidate diseases. Nodes with high rankings are in red, relatively high in pink, medium in white and light green, and low in green. Similarly, we found that the sub-network is connected. In other words, highly ranked candidate diseases are directly connected to either known/training genes or the disease of interest. This means that candidate diseases that are connected to the disease of interest or associated with training genes are highly ranked.

Search Evidences

This function collects evidence and annotations for associations between highly ranked candidate genes/diseases and the disease of interest.
- Ranked genes: select a set of top 20 ranked candidate genes, then select menu Apps → autohgpec → Step 5: Examine Ranked Genes and Diseases → 1. Search Evidences. The genes are then listed with their annotations and evidence.
- Ranked diseases: select a set of top 20 ranked candidate diseases, then select menu Apps → autohgpec → Step 5: Examine Ranked Genes and Diseases → 1. Search Evidences. The diseases are then listed with their annotations and evidence.

2. Automate autohgpec using CyREST Command API

Select menu Help → Automation → CyREST Command API to open the list of commands for running autohgpec. To predict novel breast cancer-associated genes and diseases, we perform the following 5 steps.

Step 1: Construct a heterogeneous network

Use the Example Value, then press Try it out! to create a heterogeneous network of diseases and genes. It combines a phenotypic disease similarity network containing 5,080 diseases and 19,729 interactions (i.e., Disease_Similarity_Network_5), a human protein interaction network containing 10,486 genes and 50,791 interactions (i.e., Default_Human_PPI_Network) and known disease-gene associations from OMIM (Amberger, et al., 2009).
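The same command can also be prepared from a script instead of the interactive API page. The following is a minimal sketch in base R, assuming that CyREST listens on its default address http://localhost:1234 and accepts app commands as a POST to /v1/commands/<namespace>/<command> with the arguments as a JSON object. The argument names and values are those listed for step1_construct_network in the R section of this manual; the helper names build_command_url and build_command_body are hypothetical. The sketch only builds the request, since sending it requires a running Cytoscape session.

```r
# Sketch: build the URL and JSON body for a CyREST command call (base R only).
# Helper names are illustrative; they are not part of autohgpec or CyREST.
build_command_url <- function(namespace, command,
                              base_url = "http://localhost:1234/v1") {
  paste(base_url, "commands", namespace, command, sep = "/")
}

build_command_body <- function(args) {
  keys <- names(args)
  values <- unlist(args)
  # Assumption: plain strings only, so no JSON escaping is performed
  bad <- grepl('"', c(keys, values), fixed = TRUE) |
    grepl("\\", c(keys, values), fixed = TRUE)
  stopifnot(!any(bad))
  pairs <- sprintf('"%s": "%s"', keys, values)
  paste0("{", paste(pairs, collapse = ", "), "}")
}

url <- build_command_url("autohgpec", "step1_construct_network")
body <- build_command_body(list(
  DiseaseGene    = "Disease-gene from OMIM",
  diseasenetwork = "disease_similarity_network_5",
  genenetwork    = "default_human_ppi_network"
))
cat(url, "\n", body, "\n", sep = "")

# With Cytoscape running, the request could then be sent, for example:
# httr::POST(url, body = body, httr::content_type_json())
```

Keeping request construction separate from sending makes it possible to inspect the exact URL and body before anything is executed in Cytoscape.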

Step 2: Select a disease of interest

Input breast cancer using the Example Value, then press Try it out! to retrieve a list of disease phenotypes from OMIM. The response lists 4 phenotypes from OMIM related to breast cancer.

Select OMIM ID: 114480 using the Example Value, then press Try it out! to create the training lists and retrieve the list of 21 training genes. In Cytoscape, the training lists include the disease of interest (OMIM ID: 114480) and its 21 known associated genes.

Step 3: Select candidate sets

There are four ways to construct a candidate gene set:

Opt  Candidate Set
1    Neighbors of Training Genes in Gene Network. The user must define the distance of neighbors to the training genes.
2    Neighbors of Training Genes in Chromosome (also known as Artificial Linkage Interval). The user must define the number of neighbors of each training gene on the same chromosome.
3    All remaining genes in Gene Network.
4    Susceptible Chromosome Regions/Bands. The user selects candidate genes from susceptible chromosome regions/bands.

For this case study, we selected option 3. As a result, a total of 10,465 remaining genes were selected as candidate genes in Cytoscape.

Step 4: Prioritize

Use the Example Value, then press Try it out! to rank all genes and disease phenotypes in the heterogeneous network. The results appear in the Ranked Genes and Ranked Diseases tables in Cytoscape.

Step 5: Examine ranked genes and diseases

Visualize
- Select ranked genes and diseases in the two data tables above, then press Try it out! to visualize the selected genes and diseases in the heterogeneous network.
See the results in the section Visualization in Step 5 of Run autohgpec in Cytoscape.

Search Evidences
- Select highly ranked candidate genes and diseases in the two data tables above, then press Try it out! to search for evidence.
See the results in the section Search Evidences in Step 5 of Run autohgpec in Cytoscape.

3. Automate autohgpec from R

Make sure the appropriate libraries are installed and functional.
- Run check-library-installation.r to install and test the libraries: https://github.com/cytoscape/cytoscapeautomation/blob/master/for-scripters/r/check-library-installation.r
- Run check-cytoscape-connection-autohgpec.r for tests and an initial demo: https://sites.google.com/site/duchaule2011/bioinformatics-tools/autohgpec
- List the available commands of autohgpec in R:
> commandhelp('autohgpec')
[1] "Available commands for 'autohgpec':"
[1] "step1_construct_network" "step2_1_select_disease" "step2_2_create_training_list" "step3_pcg_allremaining"
[5] "step3_pcg_nbchromosome" "step3_pcg_nbnetwork" "step3_pcg_suscepchromo" "step4_prioritize"
[9] "step5_1_search_evidences" "step5_2_visualize"

To predict novel breast cancer-associated genes and diseases, we perform the following 5 steps in R.

Step 1: Construct a heterogeneous network

Use command step1_construct_network to create a heterogeneous network of diseases and genes. It combines a phenotypic disease similarity network containing 5,080 diseases and 19,729 interactions (i.e., Disease_Similarity_Network_5), a human protein interaction network containing 10,486 genes and 50,791 interactions (i.e., Default_Human_PPI_Network) and known disease-gene associations from OMIM (Amberger, et al., 2009).
- List the available arguments of command step1_construct_network:
> commandhelp('autohgpec step1_construct_network')
[1] "Available arguments for 'autohgpec step1_construct_network':"
[1] "DiseaseGene" "diseasenetwork" "genenetwork"
- Run the command to build a heterogeneous network:
> commandrun('autohgpec step1_construct_network DiseaseGene="Disease-gene from OMIM" diseasenetwork="disease_similarity_network_5" genenetwork="default_human_ppi_network"')
[1] "Build Heterogeneous Network successfully"

Step 2: Select a disease of interest

Step 2.1: Select a disease
Use command step2_1_select_disease.
- List the available arguments of command step2_1_select_disease:
> commandhelp('autohgpec step2_1_select_disease')
[1] "Available arguments for 'autohgpec step2_1_select_disease':"
[1] "diseasename"
- Run the command to retrieve a list of disease phenotypes from OMIM. It returns 4 phenotypes from OMIM related to breast cancer:
> commandrun('autohgpec step2_1_select_disease diseasename="breast cancer"')

Step 2.2: Create training lists
Use command step2_2_create_training_list.
- List the available arguments of command step2_2_create_training_list:
> commandhelp('autohgpec step2_2_create_training_list')
[1] "Available arguments for 'autohgpec step2_2_create_training_list':"
[1] "diseasetraining"
- Run the command to retrieve the lists of training genes and disease phenotypes (OMIM ID: 114480):
> commandrun('autohgpec step2_2_create_training_list diseasetraining="mim114480"')
The training lists include the disease of interest (OMIM ID: 114480) and its 21 known associated genes.

Step 3: Select candidate sets

There are four ways to construct a candidate gene set. Each option is listed with its R commands:

Opt 1: Neighbors of Training Genes in Gene Network. The user must define the distance of neighbors to the training genes.
> commandhelp('autohgpec step3_pcg_nbnetwork')
[1] "Available arguments for 'autohgpec step3_pcg_nbnetwork':"
[1] "distance"
> commandrun('autohgpec step3_pcg_nbnetwork distance=1')

Opt 2: Neighbors of Training Genes in Chromosome (also known as Artificial Linkage Interval). The user must define the number of neighbors of each training gene on the same chromosome.
> commandhelp('autohgpec step3_pcg_nbchromosome')
[1] "Available arguments for 'autohgpec step3_pcg_nbchromosome':"
[1] "distance" "seedgene"
> commandrun('autohgpec step3_pcg_nbchromosome distance=99 seedgene="all"')

Opt 3: All remaining genes in Gene Network.
> commandhelp('autohgpec step3_pcg_allremaining')
> commandrun('autohgpec step3_pcg_allremaining')

Opt 4: Susceptible Chromosome Regions/Bands. The user selects candidate genes from susceptible chromosome regions/bands.
> commandhelp('autohgpec step3_pcg_suscepchromo')
> commandrun('autohgpec step3_pcg_suscepchromo')

For this case study, we selected option 3:
> commandrun('autohgpec step3_pcg_allremaining')
As a result, a total of 10,465 remaining genes were selected as candidate genes.

Step 4: Prioritize

Use command step4_prioritize.
- List the available arguments of command step4_prioritize:
> commandhelp('autohgpec step4_prioritize')
[1] "Available arguments for 'autohgpec step4_prioritize':"
[1] "backprob" "jumpprob" "subnetweight"
- Run the command with parameters to rank all genes and disease phenotypes in the heterogeneous network:
> commandrun('autohgpec step4_prioritize backprob=0.5 jumpprob=0.6 subnetweight=0.7')
The results appear in the Ranked Genes and Ranked Diseases tables.
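To give a feeling for what the prioritization step computes, the sketch below runs a random walk with restart on a tiny made-up heterogeneous network in base R. It follows the general RWRH formulation of Li and Patra (2010) as we understand it and is not the code used by autohgpec. The mapping of backprob to the restart probability gamma, jumpprob to the jumping probability lambda, and subnetweight to eta (taken here as the weight of the disease seeds) is an assumption, and rwrh and row_normalize are hypothetical helper names.

```r
# Toy RWRH sketch (base R only). Networks and seeds below are made up.
row_normalize <- function(m) {
  s <- rowSums(m)
  s[s == 0] <- 1   # rows without edges stay all-zero
  m / s            # divides row i by s[i]
}

rwrh <- function(A_gene, A_dis, B, gene_seeds, dis_seeds,
                 gamma = 0.5, lambda = 0.6, eta = 0.7,
                 tol = 1e-12, max_iter = 10000) {
  n_g <- nrow(A_gene)
  n_d <- nrow(A_dis)
  # A node can jump to the other network only if it has a cross-link in B
  jump_g <- ifelse(rowSums(B) > 0, lambda, 0)
  jump_d <- ifelse(colSums(B) > 0, lambda, 0)
  M <- rbind(
    cbind(row_normalize(A_gene) * (1 - jump_g), row_normalize(B) * jump_g),
    cbind(row_normalize(t(B)) * jump_d, row_normalize(A_dis) * (1 - jump_d))
  )
  # Restart vector: uniform over the seeds, weighted by eta (assumed meaning)
  u0 <- setNames(numeric(n_g), rownames(A_gene))
  v0 <- setNames(numeric(n_d), rownames(A_dis))
  u0[gene_seeds] <- 1 / length(gene_seeds)
  v0[dis_seeds] <- 1 / length(dis_seeds)
  p0 <- unname(c((1 - eta) * u0, eta * v0))
  p <- p0
  for (i in seq_len(max_iter)) {
    p_next <- as.vector((1 - gamma) * crossprod(M, p) + gamma * p0)
    converged <- sum(abs(p_next - p)) < tol
    p <- p_next
    if (converged) break
  }
  list(genes = setNames(p[seq_len(n_g)], rownames(A_gene)),
       diseases = setNames(p[n_g + seq_len(n_d)], rownames(A_dis)))
}

genes <- paste0("g", 1:4)
diseases <- paste0("d", 1:3)
A_gene <- matrix(0, 4, 4, dimnames = list(genes, genes))
A_gene[cbind(1:3, 2:4)] <- 1            # path g1 - g2 - g3 - g4
A_gene <- A_gene + t(A_gene)
A_dis <- matrix(0, 3, 3, dimnames = list(diseases, diseases))
A_dis[cbind(1:2, 2:3)] <- 1             # path d1 - d2 - d3
A_dis <- A_dis + t(A_dis)
B <- matrix(0, 4, 3, dimnames = list(genes, diseases))
B["g1", "d1"] <- 1                      # association of the seed disease
B["g3", "d2"] <- 1

res <- rwrh(A_gene, A_dis, B, gene_seeds = "g1", dis_seeds = "d1")
print(sort(res$genes, decreasing = TRUE))
print(sort(res$diseases, decreasing = TRUE))
```

In this toy network the gene g2, a direct neighbor of the seed gene g1, scores higher than the more distant g4. This is the behavior the ranking relies on: nodes close to the training genes and the disease of interest receive more probability mass.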

Step 5: Examine ranked genes and diseases

Visualize
- Select ranked candidate genes and diseases in the two data tables above, then use command step5_2_visualize:
> commandrun('autohgpec step5_2_visualize')
See the results in the section Visualization in Step 5 of Run autohgpec in Cytoscape.

Search Evidences
- Select highly ranked candidate genes and diseases in the two data tables above, then use command step5_1_search_evidences:
> commandrun('autohgpec step5_1_search_evidences')
See the results in the section Search Evidences in Step 5 of Run autohgpec in Cytoscape.
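Steps 1 to 4 of the case study can be collected into a single script. The following is a minimal sketch that only assembles the command strings used in this manual and, by default, prints them instead of running them. The helper names autohgpec_commands and run_workflow are hypothetical. Step 5 is left out because it depends on rows being selected in the result tables first.

```r
# Sketch: assemble Steps 1-4 of the case study as command strings (base R only).
autohgpec_commands <- function(disease_name, omim_id,
                               backprob = 0.5, jumpprob = 0.6,
                               subnetweight = 0.7) {
  c(
    paste0('autohgpec step1_construct_network ',
           'DiseaseGene="Disease-gene from OMIM" ',
           'diseasenetwork="disease_similarity_network_5" ',
           'genenetwork="default_human_ppi_network"'),
    sprintf('autohgpec step2_1_select_disease diseasename="%s"',
            disease_name),
    sprintf('autohgpec step2_2_create_training_list diseasetraining="%s"',
            omim_id),
    'autohgpec step3_pcg_allremaining',
    sprintf('autohgpec step4_prioritize backprob=%s jumpprob=%s subnetweight=%s',
            format(backprob), format(jumpprob), format(subnetweight))
  )
}

# `run` executes one command string; the default only prints it (dry run)
run_workflow <- function(cmds, run = print) {
  invisible(lapply(cmds, run))
}

cmds <- autohgpec_commands("breast cancer", "mim114480")
run_workflow(cmds)

# With Cytoscape running and the helper functions from the check scripts
# loaded, the real runner could be passed instead:
# run_workflow(cmds, run = commandrun)
```

Passing the runner as an argument keeps the sketch usable without Cytoscape and makes it easy to review the exact commands before they are executed.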

IV. Reference

Amberger, J., et al. McKusick's Online Mendelian Inheritance in Man (OMIM). Nucleic Acids Research 2009;37(suppl 1):D793-D796.

Le, D.-H. and Pham, V.-H. HGPEC: a Cytoscape app for prediction of novel disease-gene and disease-disease associations and evidence collection based on a random walk on heterogeneous network. BMC Systems Biology 2017;11(1):61.

Li, Y. and Patra, J.C. Genome-wide inferring gene-phenotype relationship by walking on the heterogeneous network. Bioinformatics 2010;26(9):1219-1224.

Piñero, J., et al. DisGeNET: a comprehensive platform integrating information on human disease-associated genes and variants. Nucleic Acids Research 2017;45(D1):D833-D839.

van Driel, M.A., et al. A text-mining analysis of the human phenome. Eur J Hum Genet 2006;14(5):535-542.