Gene-microRNA network module analysis for ovarian cancer

Similar documents
Identification of Tissue Independent Cancer Driver Genes

Inferring Biological Meaning from Cap Analysis Gene Expression Data

A Statistical Framework for Classification of Tumor Type from microrna Data

STAT1 regulates microrna transcription in interferon γ stimulated HeLa cells

SUPPLEMENTARY APPENDIX

Systematic exploration of autonomous modules in noisy microrna-target networks for testing the generality of the cerna hypothesis

Outlier Analysis. Lijun Zhang

The 16th KJC Bioinformatics Symposium Integrative analysis identifies potential DNA methylation biomarkers for pan-cancer diagnosis and prognosis

Session 4 Rebecca Poulos

Integrated network analysis reveals distinct regulatory roles of transcription factors and micrornas

Class discovery in Gene Expression Data: Characterizing Splits by Support Vector Machines

Supplementary information for: Human micrornas co-silence in well-separated groups and have different essentialities

Comparison of discrimination methods for the classification of tumors using gene expression data

From reference genes to global mean normalization

Part [2.1]: Evaluation of Markers for Treatment Selection Linking Clinical and Statistical Goals

Session 4 Rebecca Poulos

Patnaik SK, et al. MicroRNAs to accurately histotype NSCLC biopsies

Inference of patient-specific pathway activities from multi-dimensional cancer genomics data using PARADIGM. Bioinformatics, 2010

Nature Getetics: doi: /ng.3471

On the Reproducibility of TCGA Ovarian Cancer MicroRNA Profiles

microrna PCR System (Exiqon), following the manufacturer s instructions. In brief, 10ng of

SUPPLEMENTARY FIGURES: Supplementary Figure 1

A Versatile Algorithm for Finding Patterns in Large Cancer Cell Line Data Sets

Micro-RNA web tools. Introduction. UBio Training Courses. mirnas, target prediction, biology. Gonzalo

Nature Genetics: doi: /ng Supplementary Figure 1

Heritability enrichment of differentially expressed genes. Hilary Finucane PGC Statistical Analysis Call January 26, 2016

Supplementary Tables. Supplementary Figures

Searching for unusual clusters of palindromes and close inversions in the SARS virus genome. June 4, 2003

Chapter 1. Introduction

High AU content: a signature of upregulated mirna in cardiac diseases

From mirna regulation to mirna - TF co-regulation: computational

Feature Vector Denoising with Prior Network Structures. (with Y. Fan, L. Raphael) NESS 2015, University of Connecticut

SUPPLEMENTAL DATA AGING, July 2014, Vol. 6 No. 7

a) List of KMTs targeted in the shrna screen. The official symbol, KMT designation,

HALLA KABAT * Outreach Program, mircore, 2929 Plymouth Rd. Ann Arbor, MI 48105, USA LEO TUNKLE *

What can we contribute to cancer research and treatment from Computer Science or Mathematics? How do we adapt our expertise for them

Comparative Study of K-means, Gaussian Mixture Model, Fuzzy C-means algorithms for Brain Tumor Segmentation

MicroRNA expression profiling and functional analysis in prostate cancer. Marco Folini s.c. Ricerca Traslazionale DOSL

Nature Methods: doi: /nmeth.3115

Supplemental Information. Integrated Genomic Analysis of the Ubiquitin. Pathway across Cancer Types

Cancer Informatics Lecture

MethylMix An R package for identifying DNA methylation driven genes

Supplementary Figure S1. Gene expression analysis of epidermal marker genes and TP63.

Prioritizing cancer-related key mirna target interactions by integrative genomics

Inferring condition-specific mirna activity from matched mirna and mrna expression data

Applications. DSC 410/510 Multivariate Statistical Methods. Discriminating Two Groups. What is Discriminant Analysis

Bootstrapped Integrative Hypothesis Test, COPD-Lung Cancer Differentiation, and Joint mirnas Biomarkers

Supplementary Figure 1: LUMP Leukocytes unmethylabon to infer tumor purity

Supporting Information

Identifying Relevant micrornas in Bladder Cancer using Multi-Task Learning

Mature microrna identification via the use of a Naive Bayes classifier

Package TargetScoreData

38 Int'l Conf. Bioinformatics and Computational Biology BIOCOMP'16

Broad H3K4me3 is associated with increased transcription elongation and enhancer activity at tumor suppressor genes

Lentiviral Delivery of Combinatorial mirna Expression Constructs Provides Efficient Target Gene Repression.

Analysis of paired mirna-mrna microarray expression data using a stepwise multiple linear regression model

RNA-Seq Preparation Comparision Summary: Lexogen, Standard, NEB

T. R. Golub, D. K. Slonim & Others 1999

A Network Partition Algorithm for Mining Gene Functional Modules of Colon Cancer from DNA Microarray Data

IPA Advanced Training Course

COMPUTATIONAL OPTIMISATION OF TARGETED DNA SEQUENCING FOR CANCER DETECTION

EXPression ANalyzer and DisplayER

Supplementary Figures

Variant Classification. Author: Mike Thiesen, Golden Helix, Inc.

The Cancer Genome Atlas & International Cancer Genome Consortium

Obstacles and challenges in the analysis of microrna sequencing data

Principal Component Analysis Based Feature Extraction Approach to Identify Circulating microrna Biomarkers

SUPPLEMENTARY FIGURE LEGENDS

MIR retrotransposon sequences provide insulators to the human genome

IDENTIFICATION OF IN SILICO MIRNAS IN FOUR PLANT SPECIES FROM FABACEAE FAMILY

Supplemental Figure 1. Small RNA size distribution from different soybean tissues.

The Cancer Genome Atlas

Review: Logistic regression, Gaussian naïve Bayes, linear regression, and their connections

A novel and universal method for microrna RT-qPCR data normalization

Assessing Functional Neural Connectivity as an Indicator of Cognitive Performance *

microrna-guided diagnostics for neuroendocrine tumors

MOST: detecting cancer differential gene expression

Exploring TCGA Pan-Cancer Data at the UCSC Cancer Genomics Browser

What Math Can Tell You About Cancer

Prediction of potential disease-associated micrornas based on random walk

User Guide. Association analysis. Input

The Cancer Genome Atlas Pan-cancer analysis Katherine A. Hoadley

Supplementary Figures

Identification of distinct mirna target regulation between breast cancer molecular subtypes using AGO2-PAR-CLIP and patient datasets

Integration of high-throughput biological data

Integrated analysis of mirna/mrna expression and gene methylation using sparse canonical correlation analysis.

SUPPLEMENTAL FILE. mir-22 and mir-29a are members of the androgen receptor cistrome modulating. LAMC1 and Mcl-1 in prostate cancer

Nature Structural & Molecular Biology: doi: /nsmb Supplementary Figure 1

About diffcoexp. Wenbin Wei, Sandeep Amberkar, Winston Hide

Expert-guided Visual Exploration (EVE) for patient stratification. Hamid Bolouri, Lue-Ping Zhao, Eric C. Holland

Discovery of Novel Human Gene Regulatory Modules from Gene Co-expression and

Profiles of gene expression & diagnosis/prognosis of cancer. MCs in Advanced Genetics Ainoa Planas Riverola

MBios 478: Systems Biology and Bayesian Networks, 27 [Dr. Wyrick] Slide #1. Lecture 27: Systems Biology and Bayesian Networks

The Lens Model and Linear Models of Judgment

BWA alignment to reference transcriptome and genome. Convert transcriptome mappings back to genome space

Transcriptome profiling of the developing male germ line identifies the mir-29 family as a global regulator during meiosis

Supplemental Information. A Highly Sensitive and Robust Method. for Genome-wide 5hmC Profiling. of Rare Cell Populations

Single SNP/Gene Analysis. Typical Results of GWAS Analysis (Single SNP Approach) Typical Results of GWAS Analysis (Single SNP Approach)

SubLasso:a feature selection and classification R package with a. fixed feature subset

Introduction. Introduction

Transcription:

Gene-microRNA network module analysis for ovarian cancer Shuqin Zhang School of Mathematical Sciences Fudan University Oct. 4, 2016

Outline Introduction Materials and Methods Results Conclusions

Introduction Networks are widely applied to model different types of complex systems. Many networks have the module structure property. Intuitively, a module is a cohesive group of nodes that connected more densely" to each other than to the nodes in other modules. Modules may correspond to some functional units or play similar roles.

Figure : Human liver cohort(hlc) gene co-expression network X.Yang,B.Zhang, et al. Genome Research, 20,1020-1036,2010

Introduction MicroRNAs (mirnas) are small ( 22 nucleotides) non-coding RNAs that have emerged as key gene regulators. Each mirna is potentially able to regulate around 100 or more mrna targets and over 30% of all human genes are supposed to be regulated by mirnas. Identification and validation of mirna targets may lead to new therapeutic methods. Most of the current methods mainly considered the down-regulatory effects from mirnas.

Introduction One mirna may regulate many mrnas, and be regulated by several mirnas, thus the intertwined relationship between mirnas and mrnas becomes very complex. The systems tools such as networks should be more appropriate for studying the relationships between mirnas and mrnas. A few papers have been published to study the complex interactions between genes and mirnas from the system point of view by using networks, especially the network modules. Published methods: SNMNMF, PIMiM36, Mirsynergy, etc..

Our aim: to find the gene-mirna modules and study the functions and properties. We called it the co-module of genes and mirnas: (1) both genes and mirnas are in the module, (2) all of gene-gene, mirna-mirna, and gene-mirna connections appear in the module.

Data sets Materials and Methods We downloaded the gene expression for ovarian cancer from The Cancer Genome Atlas (TCGA). The gene expression data is generated with UNC AgilentG4502A_07_03. There are a total of 22747 genes with annotations for the 562 samples. We downloaded the mirna expression data for ovarian cancer from TCGA, which is generated with UNC mirna_8x15kv2. There are a total of 590 mirnas for the 595 samples. We selected the first 3200 genes with the largest expression variance greater than 1. We chose both data sets for the common 556 samples. We downloaded the gene-mirna interaction data from mirtarbase (http://mirtarbase.mbc.nctu.edu.tw). There are a total of 39110 gene-mirna interactions.

Network Construction Gene (mirna, gene-mirna) coexpression network: Compute the Pearson correlation coefficient. Construct the adjacency matrix by hard thresholding. If the absolute value of the Pearson correlation coefficient between genes (mirnas, gene-mirnas) is greater than some given value, we assign an edge between them. Choose the threshold: Gene coexpression network and mirna coexpression network: try different thresholds and compute the linear regression coefficient between the log10 transformed degree frequency of degree d (log10 f(d)) and d (log10 d) to make the network has approximately scale free property. Gene-miRNA coexpression network: it depends on the AUCs for the known gene-mirnas being clustered in the same module. Given a fixed threhold, we use our proposed method in the following to get the score of one gene and one mirna in the same module. We choose the threshold that achieves the highest AUC.

Method: Co-module Indentification A gm the gene-mirna correlation adjacency matrix. C is an adjacency matrix with both A gm and the known gene-mirna relations.

Method: Co-module Indentification We let S g be the assignment of the N g genes into K modules for the network G g, S g (i, k) = { 1, if vertex i Vk, 0, otherwise, where i = 1, 2,, N g ; k = 1, 2,, K, V k denotes the k-th module. Similarly, we define the assignment of the N m mirnas into K modules for the network G m as S m.

Method: Within network module identification Taking network G g as an example. We define K S g (., k) T (2A g D g )S g (., k) Ψ g (S g ) = S g (., k) T. (1) S g (., k) k=1 The optimization problem is formulated as: max Ψ g (S g ), s.t. S g (i, k) {0, 1}, K S g (., k) = 1. (2) k=1 By letting S g (., k) = Sg(.,k) S g(.,k) 2, the problem is relaxed to: max Ψ g ( S g ) = Tr( S g T (2Ag D g ) S g ) s.t. SgT Sg = I K.

Method: Co-module Indentification For the co-module, we expect that the genes and mirnas with dense connections are clustered into one module. We maximize to make it. S T g (.,k)cs m(.,k) S g(.,k) 2 S m(.,k) 2 Putting all terms together, our objective becomes: Ψ(S g, S m ) = Ψ g (S g )+Ψ m (S m )+λ K k=1 where λ controls the within and between network connections. The optimization problem is formulated as: max Ψ(S g, S m), s.t. S g(i, k) {0, 1}, S m(j, k) {0, 1}, S T g (., k)cs m (., k) S g T (., k) 2 S m (., k) 2, K S g(, k) = 1 k=1 K S m(, k) = 1 k=1

Method: Co-module Indentification We define L g = 2A g D g, L m = 2A m D m, ( ) ( ) 0 C L w = diag(l g, L m ), L b = C T, 0 S Sg =, and S m L = L w + λl b. The above optimization problem can be relaxed to: max Ψ( S) = Tr( S T L S), s.t. S T S = 2IK. We take S as a data set composed of N g + N m nodes and do k means clustering to get the assignment label for each node.

Algorithm Input: Adjacency matrix A g, A m, C, and K, which is the number of modules. 1. Compute the matrices L g, L m ; 2. Construct the matrix L; 3. Compute the K eigenvectors v 1, v 2,, v K corresponding to the K largest eigenvalues of matrix L; 4. Construct a new matrix T R (Ng+Nm) K, with columns v 1, v 2,, v K ; 5. Cluster the points constructed from each row of matrix T with k-means clustering into K clusters; Output: Index of nodes in each module.

Results: Gene-microRNA Network Module Analysis for Ovarian Cancer Cutoff for building gene/mirna coexpression network: 0.6. Cutoff for building gene-mirna coexpression network: 0.3. λ = 1. Figure : AUCs for diffferent cutoffs. With our method, we finally got 46 modules.

MiRNA module enrichment analysis The mirna cluster data are downloaded from the mirbase website (http://www.mirba se.org/), with the inter-mirna distance cutoff 10kb. This criterion resulted in 153 clusters containing from 2 to 46 mirnas. There are a total of 14 modules enriched by clusters, and 8 clusters enriched by modules with the overlap size between clusters and modules being at least 3. Example, 5 of 12 mirnas in module 35 belong to a cluster with size 6 in Chr13. The total distance of this cluster is about 700bp.

Table : MiRNA module enrichment results. No. p-value MiRNAs Loci 38 2.01E-18 mir-411, mir-299, mir-758, mir-329-1, mir-543, mir-495, Chr14 101022066- mir-654, mir-376b, mir-376a-1, mir-381, mir-487b, mir-539, 101066801 mir-487a, mir-382, mir-154, mir-377, mir-409, mir-369, mir-376c,mir-889,mir-410 45 1.41E-14 mir-379, mir-411, mir-299, mir-758, mir-329-1, mir-543, Chr14 101022066- mir-376c, mir-654, mir-376b, mir-376a-1, mir-381, -101066801 mir-487a, mir-382, mir-154, mir-377, mir-409, mir-369, mir-495, mir-487b, mir-539, mir-410 5 1.29E-15 mir-411, mir-758, mir-329-1, mir-543, mir-495, Chr14 101022066- mir-376b, mir-376a-1, mir-487b mir-539, mir-889, -101066801 mir-382 mir-154, mir-409, mir-369, mir-654,mir-487a, mir-410 2 2.19E-02 mir-379, mir-299, mir-376c, mir-376a-1, mir-381, mir-377 Chr14 101022066- -101062118 10 1.40E-03 mir-379, mir-299, mir-376c, mir-376a-1, mir-381, mir-377 Chr14 101022066- -101062118 38 2.70E-05 mir-493, mir-337, mir-433, mir-127, mir-432, mir-136 Chr14 100869060- -100884783 12 1.40E-03 mir-379, mir-299, mir-376c, mir-376a-1, mir-381, mir-377 Chr14 101022066- -101062118 21 1.26E-02 mir-379, mir-299, mir-376c, mir-376a-1, mir-381, mir-377 Chr14 101022066- -101062118 35 1.17E-06 mir-17, mir-18a, mir-19a, mir-20a, mir-19b-1 Chr13 91350605- -100093643

Gene module enrichment analysis We did enrichment analysis for Gene Ontology biological process (GO-BP) terms and KEGG pathways with DAVID. By taking the cutoff of the Benjamini p-values as 0.05, 15 modules are enriched by GO-BP terms and 7 modules are enriched by KEGG pathways significantly.

Enriched KEGG pathways for the modules Table : Enriched KEGG No. Enriched Pathways p-value 41 Small cell lung cancer 1.30E-03 pathways for the modules. Chronic myeloid leukemia 3.10E-02 Pathways in cancer 2.40E-02 Colorectal cancer 1.90E-02 No. Enriched Pathways p-value Cell cycle 3.40E-02 18 p53 signaling pathway 3.50E-03 Thyroid cancer 1.30E-01 Small cell lung cancer 2.70E-03 Bladder cancer 1.60E-01 Cell cycle 4.00E-03 Endometrial cancer 1.80E-01 Pathways in cancer 2.10E-02 Non-small cell lung cancer 1.60E-01 Non-small cell lung cancer 8.20E-02 Acute myeloid leukemia 1.60E-01 Glioma 8.00E-02 Glioma 1.60E-01 Melanoma 7.70E-02 p53 signaling pathway 1.50E-01 Pancreatic cancer 6.90E-02 Melanoma 1.50E-01 Chronic myeloid leukemia 6.40E-02 Pancreatic cancer 1.40E-01 Prostate cancer 6.80E-02 Prostate cancer 1.60E-01

Gene-miRNA modules are strongly associated with cancers We checked the cancer related mirnas from the website: http://mircancer. ecu.edu. There are 295 different mirnas related to cancer, of which 122 are in our identified modules. 57 of the 295 mirnas are related to ovarian cancer, of which 29 are in our identified modules, which achieves a p-value 0.0386. Table : Number of cancer associated mirnas for the modules enriched by clusters. Module No. 1 2 5 10 11 12 18 No. of mirnas 5 13 21 9 6 9 12 No. of c-mirnas 5 9 8 6 5 6 10 p-value 0 1.80E-04 4.82E-02 3.40E-03 1.77E-03 3.40E-03 4.74E-06 Module No. 21 22 35 37 38 41 45 No. of mirnas 11 11 12 12 29 11 39 No. of c-mirnas 7 2 10 12 13 10 15 p-value 2.17E-03 7.00E-01 4.74E-06 0 2.29E-03 9.58E-07 6.43E-03

In module 1, all the mirnas are associated with ovarian cancer. The genes in this module take part in the process of transcription, gene expression etc.. In module 41, 5 mirnas are associated with ovarian cancer. By checking the GO-BP terms, we found that the most enriched term is sexual reproduction, which has a p-value 7.50E-06. This module also enriches the GO-term: gamete generation, male gamete generation, and spermatogenesis significantly.

(a) Figure : Module 37: 156 genes are regulated by the 12 mirnas. There are a total of 47 known regulations. (b)

Comparison with Mirsynergy Mirsynergy operates in two steps: It detects the mirna modules based on gene-mirna relationship. It expands each mirna module by greedily including (excluding) mrnas into (from) the mirna module to maximize the synergy score, which is a function of mirna-mrna and gene-gene interactions. Comparision with Mirsynergy: Table : Module enrichment performance of Mirsynergy and our method. Method N module Ng Nm N en module N en cluster N GO N KEGG Mirsynergy 18 88.4 21.9 1 1 4 1 Our method 46 42.6 10.3 14 8 15 7

Conclusions We proposed an optimization model to study the gene and mirna co-modules in networks. We applied this model to an ovarian cancer data set. 14 modules are enriched by the mirna clusters with overlap size being at least 3, 15 modules are enriched by GO-BP terms, and 7 modules are enriched by KEGG pathways significantly. In the identified modules, 122 mirnas are cancer associated and 29 mirnas are related to ovarian cancer, which has a p-value 0.039. Compared to the existing method Mirsynergy, our method can find more modules with the number of genes and mirnas having a good balance. The models and algorithms can be extended to analyze more complex networks.

Thank you!