How To Use SubpathwayGMir Li Feng, Chunquan Li and Xia Li May 20, 2015 Contents 1 Overview 1 2 The experimentally verified mirna-target interactions 2 3 Reconstruct KEGG metabolic pathways 2 3.1 Embed mirnas to direct KEGG metabolic pathway graphs................ 2 3.2 Embed mirnas to undirect KEGG metabolic pathway graphs.............. 3 4 Locate KEGG metabolic subpathways 3 4.1 Based on reconstructed direct KEGG metabolic pathways................. 3 4.2 Based on reconstructed undirect KEGG metabolic pathways................ 7 5 Identify the significantly enriched subpathways 7 5.1 Based on located direct KEGG metabolic subpathways................... 7 5.2 Based on located undirect KEGG metabolic subpathways................. 7 6 Get the current environment variables 9 7 Update the organism-specific environment variables 9 8 Session Info 10 1 Overview This vignette demonstrates how to easily use the SubpathwayGMir package. This package can implement the identification of Kyoto Encyclopedia of Genes and Genomes (KEGG) metabolic subpathways mediated by micrornas (mirnas), by topologically locating mirnas and genes within reconstructed KEGG metabolic pathway graphs, which emmbedded by mirnas through integrating mirna-target interactions verified by low-throughput experiments. (1) This package provides the GetK2riData to return verified mirna-target interactions, which collected from four databases, such as TarBase(v5.0), mirecords(v4.0), mir2disease and mirtarbase. (see the section 2). (2) This package provides the getintegraphlist function to reconstruct KEGG metabolic pathways by embedding mirnas into direct and/or undirect KEGG metabolic pathway graphs, these embedded mirnas have verified targets within pathways.(see the section 3). (3) This package provides the getlocsubgraph function to locate mirna-mediated metabolic subpathways by topologically analyzing the lenient distance of mirnas and genes, based on reconstructed pathways.(see the section 4). (4) This package provides the identifygraph function to identify the significantly enriched metabolic subpathways, based on located subpathways.(see the section 5). (5) This package provides the GetK2riData function to get variable data 1
in current environment.(see the section 6). (6) This package provides the updateorgenvir function to updata the organism-specific environment variables.(see the section 7). 2 The experimentally verified mirna-target interactions We can use function GetK2riData to return verified mirna-gene interactions, which are collected from four databases, namely TarBase(v5.0), mirecords(v4.0), mir2disease and mirtarbase. We process these datasets into a uniform format. The final dataset contains seven columns, such as SourceDB, Species, mirna, target, LowHTExps, Experiments and PMID. The value of column SourceDB is one of TarBase(v5.0), mirecords(v4.0), mir2disease and mirtarbase, which represents where this relation pair was derived from. Besides, this dataset can supports six organisms, such as cel(caenorhabditis elegans), dre(danio rerio), dme(drosophila melanogaster), hsa(homo sapiens), mmu(mus musculus) and rno(rattus norvegicus). The column Experiments describes what kind of experiments valided this relations and the column LowTHExps represents whether this relation have been valided by low-throughput experiments or not. The column PMID provides PubMed identifers for the references of relations. > # get verified mirna-target interactions > expmir2tar <- GetK2riData("expMir2Tar") > # view first six rows of data > expmir2tar[1:6,] SourceDB mirna Gene Species LowTHExps 279 mirecords cel-mir-273 die-1 cel YES 286 mirtarbase cel-mir-35-3p lin-23 cel YES 287 mirtarbase cel-mir-35-3p gld-1 cel YES 303 mirtarbase cel-mir-84-5p let-60 cel YES 304 mirtarbase cel-lin-4-5p lin-14 cel YES 305 mirtarbase cel-lin-4-5p lin-14 cel YES Experiments PMID 279 GFP Activity Assay 15306811 286 Luciferase reporter assay//qrt-pcr 21691303 287 Luciferase reporter assay//qrt-pcr 21691303 303 GFP reporter assay 15766527 304 Immunofluorescence//LacZ reporter assay 8252622 305 qrt-pcr//western blot//northern blot 19155321 3 Reconstruct KEGG metabolic pathways We can use function getintegraphlist to return the integrated KEGG metabolic pathway graph list. We first convert KEGG metabolic pathways to direct/undirect graphs with genes as nodes, then reconstructed pathways by linking mirnas to targets within it. 3.1 Embed mirnas to direct KEGG metabolic pathway graphs The function getintegraphlist can integrate mirnas to direct KEGG metabolic pathway graphs. With integrated graph list, we can offer the additional interested mirnas and/or genes sets to identify the condition-specific metabolic pathways mediated by mirnas. > # get hsa-specificd mirna-target interactions > expmir2tar <- GetK2riData("expMir2Tar") 2
> row1 <- which(expmir2tar[["lowthexps"]]=="yes") > row2 <- which(expmir2tar[["species"]]=="hsa") > relations <- unique(expmir2tar[intersect(row1,row2),c(2:3)]) > # get direct metabolic pathway graphs > DirectGraphList <- GetK2riData("MetabolicGEGEEMGraph") > # get reconstructed direct pathway graph list > DirectInteGraphList <- getintegraphlist(directgraphlist, relations) The following commands can show the embedded pathways with genes and mirnas as nodes. > # visualize the reconstructed direct pathway > plotgraph(directintegraphlist[[1]],layout=layout.random) Figure 1 shows the reconstructed direct Glycolysis / Gluconeogenesis metabolic pathway. 3.2 Embed mirnas to undirect KEGG metabolic pathway graphs The function getintegraphlist can integrate mirnas into undirect KEGG metabolic pathway graphs with genes as nodes. With integrated graph list, we can offer the additional interested mirnas and/or genes sets to identify the condition-specific pathways mediated by mirnas. > # get undirect metabolic pathway graphs > UndirectGraphList <- GetK2riData("MetabolicGEGEUEMGraph") > # get reconstructed undirect pathway graph list > UndirectInteGraphList <- getintegraphlist(undirectgraphlist, relations) The following commands can show the reconstructed pathway graph with genes and mirnas as nodes. > # visualize the reconstructed undirect pathway > plotgraph(undirectintegraphlist[[1]],layout=layout.random) Figure 2 shows the reconstructed undirect Glycolysis / Gluconeogenesis metabolic pathway. 4 Locate KEGG metabolic subpathways We can use function getlocsubgraph to locate metabolic subpathways by topologically analyzing the lenient distance of mirnas and/or genes based on reconstructed pathways. 4.1 Based on reconstructed direct KEGG metabolic pathways The function getlocsubgraph can locate metabolic subpathways based on reconstructed direct KEGG metabolic pathways. > # get user-interested mirnas and genes > moleculelist <- c(getbackground(type="gene")[1:1000], + getbackground(type="mirna")[1:2000]) > # get located direct subpathways > DirectSubGraphList <- getlocsubgraph(moleculelist,directintegraphlist, + type="gene_mirna",n=1,s=10) The following commands can show the located subpathway graph with genes and mirnas as nodes. > # visualize the located direct pathway > plotgraph(directsubgraphlist[[1]],layout=layout.random) Figure 3 shows the located direct purine metabolic subpathway. 3
LDHAL6B LDHA AKR1A1 hsa mir 133a 3p hsa mir 1 ALDH1B1 ALDOA FBP1 hsa mir 520e GPI ALDOC PKM ALDH9A1 hsa mir 145 5p ALDOB BPGM ENO3 DLD PGAM4 ADH1B PGAM2 ADH1A PGK1 hsa mir 520a 3p hsa mir 143 3p hsa mir 133b PGM2 hsa mir 125a 5p ADPGK G6PC3 DLAT hsa mir 124 3p PDHB ALDH1A3 hsa mir 326 ADH7 PKLR PCK1 PFKL PCK2 hsa mir 21 5p ENO2 hsa mir 30a 5p FBP2 ALDH3A2 HK1 ALDH7A1 hsa mir 155 5p ALDH2 ACSS2 hsa let 7b 5p ADH6 LDHB hsa mir 23a 3pPGK2 hsa mir 29c 3p PDHA2 ADH1C LDHAL6A GALM HKDC1 hsa mir 122 5p ENO1 TPI1 ALDH3A1 PFKP ACSS1 hsa mir 15a 5p GAPDH GAPDHS hsa mir 520b ADH4 PFKM G6PC PGM1 G6PC2 hsa mir 34a 5p ALDH3B1 ALDH3B2 HK3 hsa mir 16 5p hsa mir 125b 5p GCK PDHA1 PGAM1 HK2 hsa mir 644a LDHC ADH5 Figure 1: The visualization of reconstructed direct Glycolysis / Gluconeogenesis metabolic pathway. 4
ALDH3A1 hsa mir 143 3p ENO2 PFKL ADH1A G6PC3 ENO3 ALDOC G6PC hsa mir 15a 5p G6PC2 hsa mir 644a PGK1 hsa mir 16 5p PGAM4 GCK ALDH3B1 PGM1 hsa mir 145 5p AKR1A1 hsa mir 21 5p PGM2 hsa mir 326 hsa mir 30a 5p PDHA1 HKDC1 hsa mir 520e hsa let 7b 5p LDHAL6B PFKM ALDH1B1 DLAT HK1 GAPDH PGAM2 LDHB PKM GPI ADH7 LDHC hsa mir 29c 3p ADH4 hsa mir 122 5p ENO1 PGK2 hsa mir 133a 3p ADH1C hsa mir 124 3p GAPDHS ALDH9A1 ADH5 ADH6 hsa mir 520a 3p hsa mir 34a 5p ACSS2 LDHAL6A hsa mir 1 LDHA PDHA2 PCK2 PCK1 PFKP GALM hsa mir 23a 3p TPI1 HK3 ADH1B ACSS1 ALDH2 BPGM hsa mir 133b hsa mir 125b 5p ALDOA ALDH3B2 DLD ALDH3A2 hsa mir 125a 5p ALDH7A1 hsa mir 520b HK2 PGAM1 FBP1 PKLR PDHB hsa mir 155 5p ADPGK ALDH1A3 FBP2 ALDOB Figure 2: The visualization of reconstructed undirect Glycolysis / Gluconeogenesis metabolic pathway. 5
NT5C2 hsa mir 124 3p hsa mir 16 5p ADA hsa mir 1 NT5E hsa mir 30a 5p hsa mir 133a 3p NT5C3A hsa mir 34a 5p hsa mir 122 5p PNP hsa mir 155 5p Figure 3: The visualization of located direct purine metabolic subpathway. 6
4.2 Based on reconstructed undirect KEGG metabolic pathways The function getlocsubgraph can locate subpathways based on reconstructed undirect pathways. > # get located undirect subpathways > UnDirectSubGraphList <- getlocsubgraph(moleculelist,undirectintegraphlist, + type="gene_mirna",n=1,s=10) The following commands can show the located subpathway graph with genes and mirnas as nodes. > # visualize the located undirect pathway > plotgraph(undirectsubgraphlist[[6]],layout=layout.random) Figure 4 shows the located undirect purine metabolic subpathway. 5 Identify the significantly enriched subpathways We can use function identifygraph to identify the significantly enriched subpathways based on located direct/undirect metabolic subpathways. 5.1 Based on located direct KEGG metabolic subpathways The function identifygraph can identify the significantly enriched subpathways based on located direct metabolic subpathways. > # identify significant direct subpathways > ann <- identifygraph(moleculelist,directsubgraphlist,type="gene_mirna") > result <- printgraph(ann,detail=true) > # view the result > head(result[,c(1:2,5:6)]) pathwayid pathwayname pvalue 1 path:00230_1 Purine metabolism 1.992183e-07 2 path:00520_1 Amino sugar and nucleotide sugar metabolism 1.992183e-07 fdr 1 1.992183e-07 2 1.992183e-07 5.2 Based on located undirect KEGG metabolic subpathways The function getlocsubgraph can identify the significantly enriched subpathways based on located undirect metabolic subpathways. > # identify significant undirect subpathways > ann <- identifygraph(moleculelist,undirectsubgraphlist,type="gene_mirna") > result <- printgraph(ann,detail=true) > # view the result > head(result[,c(1:2,5:6)]) pathwayid pathwayname pvalue fdr 1 path:00562_2 Inositol phosphate metabolism 0.000000e+00 0.000000e+00 2 path:00230_1 Purine metabolism 9.683032e-12 1.113549e-10 3 path:00270_2 Cysteine and methionine metabolism 9.957679e-11 7.634221e-10 7
hsa let 7c 5p PNP RRM2 hsa mir 24 3p AK2 NT5C3A POLD1 PKM NT5C2 hsa mir 122 5p hsa mir 124 3p PRPS1 hsa mir 29a 3p hsa mir 326 POLR2L hsa mir 30a 5p hsa mir 34a 5p POLR2B hsa mir 376a 5p IMPDH2 hsa let 7c 3p NT5E hsa mir 1 hsa mir 133a 3p hsa let 7b 5p IMPDH1 hsa mir 16 5p hsa mir 133b hsa mir 29b 3p hsa mir 155 5p ADA Figure 4: The visualization of located undirect purine metabolic subpathway. 8
4 path:00010_3 Glycolysis / Gluconeogenesis 2.587495e-09 1.487810e-08 5 path:00480_1 Glutathione metabolism 7.775946e-09 3.576935e-08 6 path:00240_1 Pyrimidine metabolism 1.223046e-08 4.688342e-08 > # save the result > write.table(head(result),"result.txt",sep="\t",col.names=true,row.names=false) 6 Get the current environment variables We can use function GetK2riData to obtain variable datas in current environment. > # get verified mirna-target interactions > expmir2tar <- GetK2riData(K2riData="expMir2Tar") > # get the background of mirnas > BGMiRNA <- GetK2riData(K2riData="BGMiRNA") > # get the background of genes > BGGene <- GetK2riData(K2riData="BGGene") > 7 Update the organism-specific environment variables We can use function updateorgenvir to update the organism-specific environment variables. > # update the cel-specific environment variables > updateorgenvir("cel") [1] "Update the current organism : cel" [1] "Note that the programming may be time consumming!" [1] "Download relations between gene and symbol." [1] "Download relations between KEGG gene and pathway" [1] "Download background of mirnas" [1] "Download background of direct KEGG metabolic pathways" [1] "Download background of undirect KEGG metabolic pathways" > # show the current environment variables > ls(k2ri) [1] "BGGene" "BGMiRNA" [3] "CEL_MetabolicGEGEEMGraph" "CEL_MetabolicGEGEUEMGraph" [5] "DME_MetabolicGEGEEMGraph" "DME_MetabolicGEGEUEMGraph" [7] "DRE_MetabolicGEGEEMGraph" "DRE_MetabolicGEGEUEMGraph" [9] "HSA_MetabolicGEGEEMGraph" "HSA_MetabolicGEGEUEMGraph" [11] "MMU_MetabolicGEGEEMGraph" "MMU_MetabolicGEGEUEMGraph" [13] "MetabolicGEGEEMGraph" "MetabolicGEGEUEMGraph" [15] "RNO_MetabolicGEGEEMGraph" "RNO_MetabolicGEGEUEMGraph" [17] "expmir2tar" "gene2path" [19] "gene2symbol" "mirna2org" > # show the background of mirnas > k2ri$bgmirna[1:3] [1] "cel-let-7-5p" "cel-let-7-3p" "cel-lin-4-5p" > 9
8 Session Info The script runs within the following session: R version 3.0.2 (2013-09-25) Platform: x86_64-pc-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=zh_CN.UTF-8 LC_COLLATE=C [5] LC_MONETARY=zh_CN.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=zh_CN.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=zh_CN.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grdevices utils datasets methods base other attached packages: [1] SubpathwayGMir_1.0 igraph_0.7.1 XML_3.98-1.1 loaded via a namespace (and not attached): [1] tools_3.0.2 References [Antonov et al., 2008] Antonov, A.V., et al. (2008) Kegg Spider: Interpretation of Genomics Data in the Context of the Global Gene Metabolic Network. Genome Biol, 9, R179. [Csardi and Nepusz, 2006] Csardi, G. and Nepusz, T. (2006) The igraph software package for complex network research. InterJournal, Complex Systems, 1695. [Draghici et al., 2007] Draghici, S., et al. (2007) A Systems Biology Approach for Pathway Level Analysis. Genome Res, 17, 1537-1545. [Guimera and Nunes Amaral, 2005] Guimera, R. and Nunes Amaral, L.A. (2005) Functional Cartography of Complex Metabolic Networks. Nature, 433, 895-900. [Huber et al., 2007] Huber, W., et al. (2007) Graphs in Molecular Biology. BMC Bioinformatics, 8 Suppl 6, S8. [Hung et al., 2010] Hung, J.H., et al. (2010) Identification of Functional Modules That Correlate with Phenotypic Difference: The Influence of Network Topology. Genome Biol, 11, R23. [Kanehisa et al., 2006] Kanehisa, M., et al. (2006) From Genomics to Chemical Genomics: New Developments in Kegg. Nucleic Acids Res, 34, D354-357. [Koyuturk et al., 2004] Koyuturk, M., et al. (2004) An Efficient Algorithm for Detecting Frequent Subgraphs in Biological Networks. Bioinformatics, 20 Suppl 1, i200-207. [Li et al., 2009] Li, C., et al. (2009) Subpathwayminer: A Software Package for Flexible Identification of Pathways. Nucleic Acids Res, 37, e131. 10
[Li et al., 2013] Li, C., et al. (2013) Subpathway-GM: identification of metabolic subpathways via joint power of interesting genes and metabolites and their topologies within pathways. Nucleic acids research, 41, e101. [Smart et al., 2008] Smart, A.G., et al. (2008) Cascading Failure and Robustness in Metabolic Networks. Proc Natl Acad Sci U S A, 105, 13223-13228. [Strimmer, 2008] Strimmer, K. (2008) fdrtool: a versatile R package for estimating local and tail areabased false discovery rates. Bioinformatics, 24, 1461-1462. [Vergoulis et al., 2012] Vergoulis, T., et al. (2012) TarBase 6.0: capturing the exponential growth of mirna targets with experimental support. Nucleic acids research, 40, D222-229. [Xiao et al., 2009] Xiao, F., et al. (2009) mirecords: an integrated resource for microrna-target interactions. Nucleic acids research, 37, D105-110. [Hsu et al., 2011] Hsu, S.D., et al. (2011) mirtarbase: a database curates experimentally validated microrna-target interactions. Nucleic acids research, 39, D163-169. [Jiang et al., 2009] Jiang, Q., et al. (2009) mir2disease: a manually curated database for microrna deregulation in human disease. Nucleic acids research, 37, D98-104. 11