Supplementary information for: Community detection for networks with unipartite and bipartite structure

Chang Chang 1,2, Chao Tang 2

1 School of Life Sciences, Peking University, Beijing 100871, China
2 Center for Quantitative Biology and Peking-Tsinghua Center for Life Sciences, Peking University, Beijing 100871, China

Email: chang.connected@pku.edu.cn and tangc@pku.edu.cn

CONTENTS

I. RESULTS OF TWO CASE STUDIES OF THE GENE ONTOLOGY ENRICHMENT ANALYSIS MENTIONED IN THE MAIN TEXT
II. MODEL SPECIFIC FOR A MIXTURE, UNDIRECTED NETWORK

I. RESULTS OF TWO CASE STUDIES OF THE GENE ONTOLOGY ENRICHMENT ANALYSIS MENTIONED IN THE MAIN TEXT

Supplementary figure 1. The cell cycle and the metabolic-related module.

Supplementary table 1. The enriched gene ontology terms for downstream target genes in the cell cycle and the metabolic-related module.

GOID | TERM | FALSE DISCOVERY RATE | NUM_ANNOTATIONS
GO:0006323 | DNA packaging | 0 | 21 (of 59)
GO:0006334 | nucleosome assembly | 0 | 19 (of 50)
GO:0065004 | protein-DNA complex assembly | 0 | 19 (of 51)
GO:0071103 | DNA conformation change | 0 | 24 (of 75)
GO:0031497 | chromatin assembly | 0 | 19 (of 52)
GO:0019219 | regulation of nucleobase-containing compound metabolic process | 0 | 128 (of 767)
GO:0006333 | chromatin assembly or disassembly | 0 | 20 (of 58)
GO:0034728 | nucleosome organization | 0 | 20 (of 58)
GO:0071824 | protein-DNA complex subunit organization | 0 | 20 (of 58)
GO:0051171 | regulation of nitrogen compound metabolic process | 0 | 130 (of 790)
GO:0006259 | DNA metabolic process | 0 | 53 (of 249)
GO:1901360 | organic cyclic compound metabolic process | 0.005 | 180 (of 1198)
GO:0046483 | heterocycle metabolic process | 0.006154 | 176 (of 1169)
GO:0034641 | cellular nitrogen compound metabolic process | 0.005714 | 179 (of 1194)
GO:0006139 | nucleobase-containing compound metabolic process | 0.005333 | 172 (of 1139)
GO:0006325 | chromatin organization | 0.005 | 36 (of 154)
GO:0006355 | regulation of transcription, DNA-dependent | 0.004706 | 109 (of 659)
GO:0051252 | regulation of RNA metabolic process | 0.004444 | 112 (of 683)
GO:2001141 | regulation of RNA biosynthetic process | 0.004211 | 109 (of 663)
GO:0090304 | nucleic acid metabolic process | 0.004 | 156 (of 1028)
GO:0006725 | cellular aromatic compound metabolic process | 0.00381 | 173 (of 1166)
GO:0060255 | regulation of macromolecule metabolic process | 0.005455 | 142 (of 934)
GO:0051276 | chromosome organization | 0.006087 | 41 (of 200)
GO:0006807 | nitrogen compound metabolic process | 0.014167 | 181 (of 1250)
GO:2000112 | regulation of cellular macromolecule biosynthetic process | 0.0176 | 113 (of 723)
GO:0010468 | regulation of gene expression | 0.017692 | 118 (of 765)
GO:0031326 | regulation of cellular biosynthetic process | 0.017037 | 115 (of 742)
GO:0010556 | regulation of macromolecule biosynthetic process | 0.019286 | 113 (of 729)
GO:0051325 | interphase | 0.02 | 24 (of 101)
GO:0009889 | regulation of biosynthetic process | 0.019333 | 115 (of 746)
GO:0006996 | organelle organization | 0.01871 | 88 (of 542)
GO:0007049 | cell cycle | 0.0225 | 58 (of 326)
GO:0080090 | regulation of primary metabolic process | 0.022424 | 142 (of 959)
GO:0031323 | regulation of cellular metabolic process | 0.021765 | 144 (of 975)
GO:0051329 | interphase of mitotic cell cycle | 0.032571 | 23 (of 99)
GO:0006351 | transcription, DNA-dependent | 0.040556 | 90 (of 568)
GO:0000086 | G2/M transition of mitotic cell cycle | 0.044324 | 12 (of 39)
GO:0019222 | regulation of metabolic process | 0.047895 | 150 (of 1041)
GO:0050794 | regulation of cellular process | 0.048205 | 207 (of 1507)

Supplementary figure 2. The immune system-related module.

Supplementary table 2. The enriched gene ontology terms for downstream target genes in the immune system-related module.

GOID | TERM | FALSE DISCOVERY RATE | NUM_ANNOTATIONS
GO:0045087 | innate immune response | 0 | 37 (of 86)
GO:0006952 | defense response | 0 | 43 (of 131)
GO:0051607 | defense response to virus | 0 | 29 (of 47)
GO:0006955 | immune response | 0 | 40 (of 123)
GO:0002252 | immune effector process | 0 | 31 (of 68)
GO:0009615 | response to virus | 0 | 30 (of 63)
GO:0009607 | response to biotic stimulus | 0 | 35 (of 107)
GO:0034340 | response to type I interferon | 0 | 21 (of 27)
GO:0060337 | type I interferon-mediated signaling pathway | 0 | 21 (of 27)
GO:0071357 | cellular response to type I interferon | 0 | 21 (of 27)
GO:0051707 | response to other organism | 0 | 34 (of 100)
GO:0034097 | response to cytokine stimulus | 0 | 32 (of 99)
GO:0002376 | immune system process | 0 | 49 (of 287)
GO:0071345 | cellular response to cytokine stimulus | 0 | 29 (of 81)
GO:0019221 | cytokine-mediated signaling pathway | 0 | 26 (of 70)
GO:0034341 | response to interferon-gamma | 0 | 15 (of 20)
GO:0051704 | multi-organism process | 0 | 43 (of 329)
GO:0045071 | negative regulation of viral genome replication | 0 | 11 (of 13)
GO:0048525 | negative regulation of viral reproduction | 0 | 11 (of 13)
GO:0006950 | response to stress | 0 | 54 (of 550)
GO:0035456 | response to interferon-beta | 0 | 9 (of 9)
GO:0071310 | cellular response to organic substance | 0 | 34 (of 237)
GO:0045069 | regulation of viral genome replication | 0 | 12 (of 19)
GO:0010033 | response to organic substance | 0 | 41 (of 351)
GO:0043901 | negative regulation of multi-organism process | 0 | 11 (of 16)
GO:0060333 | interferon-gamma-mediated signaling pathway | 0 | 10 (of 13)
GO:2000242 | negative regulation of reproductive process | 0 | 12 (of 22)
GO:0071346 | cellular response to interferon-gamma | 0 | 10 (of 14)
GO:0043900 | regulation of multi-organism process | 0 | 17 (of 57)
GO:0002682 | regulation of immune system process | 0 | 24 (of 131)
GO:0050776 | regulation of immune response | 0 | 18 (of 68)
GO:0070887 | cellular response to chemical stimulus | 0 | 34 (of 287)
GO:0031347 | regulation of defense response | 0 | 18 (of 75)
GO:0042221 | response to chemical stimulus | 0 | 43 (of 461)
GO:0050896 | response to stimulus | 0 | 70 (of 1071)
GO:0050792 | regulation of viral reproduction | 0 | 13 (of 39)
GO:0032479 | regulation of type I interferon production | 0 | 10 (of 21)
GO:0035455 | response to interferon-alpha | 0 | 7 (of 8)
GO:0002697 | regulation of immune effector process | 0 | 12 (of 35)
GO:0019048 | virus-host interaction | 0 | 18 (of 100)
GO:0080134 | regulation of response to stress | 0 | 20 (of 133)
GO:0045088 | regulation of innate immune response | 0 | 12 (of 44)
GO:2000241 | regulation of reproductive process | 0 | 13 (of 54)
GO:0051701 | interaction with host | 0 | 18 (of 111)
GO:0044403 | symbiosis, encompassing mutualism through parasitism | 0 | 18 (of 114)
GO:0044419 | interspecies interaction between organisms | 0 | 18 (of 114)
GO:0007166 | cell surface receptor signaling pathway | 0 | 29 (of 296)
GO:0022415 | viral reproductive process | 0 | 20 (of 151)
GO:0032481 | positive regulation of type I interferon production | 0 | 6 (of 9)
GO:0044703 | multi-organism reproductive process | 0 | 21 (of 171)
GO:0001817 | regulation of cytokine production | 0 | 12 (of 54)
GO:0048519 | negative regulation of biological process | 0 | 42 (of 590)
GO:0051716 | cellular response to stimulus | 0 | 51 (of 817)
GO:0032728 | positive regulation of interferon-beta production | 0 | 5 (of 7)
GO:0007165 | signal transduction | 0 | 41 (of 588)
GO:0035458 | cellular response to interferon-beta | 0 | 4 (of 4)
GO:0048583 | regulation of response to stimulus | 0 | 30 (of 368)
GO:0032480 | negative regulation of type I interferon production | 0 | 6 (of 13)
GO:0043331 | response to dsRNA | 0 | 6 (of 14)
GO:0016032 | viral reproduction | 0 | 21 (of 211)
GO:0032648 | regulation of interferon-beta production | 0 | 5 (of 9)
GO:0023052 | signaling | 0 | 42 (of 656)
GO:0044700 | single organism signaling | 0 | 42 (of 656)
GO:0043903 | regulation of symbiosis, encompassing mutualism through parasitism | 0 | 4 (of 5)
GO:0001819 | positive regulation of cytokine production | 0 | 7 (of 23)
GO:0051239 | regulation of multicellular organismal process | 0 | 22 (of 244)
GO:0001818 | negative regulation of cytokine production | 0 | 7 (of 26)
GO:0007154 | cell communication | 0 | 42 (of 683)
GO:0050688 | regulation of defense response to virus | 0 | 6 (of 18)
GO:0002831 | regulation of response to biotic stimulus | 0 | 6 (of 20)
GO:0050789 | regulation of biological process | 0 | 74 (of 1562)
GO:0006858 | extracellular transport | 0.000278 | 3 (of 3)
GO:0019060 | intracellular transport of viral proteins in host cell | 0.000274 | 3 (of 3)
GO:0030581 | symbiont intracellular protein transport in host | 0.00027 | 3 (of 3)
GO:0032647 | regulation of interferon-alpha production | 0.000267 | 3 (of 3)
GO:0032727 | positive regulation of interferon-alpha production | 0.000263 | 3 (of 3)
GO:0039531 | regulation of viral-induced cytoplasmic pattern recognition receptor signaling pathway | 0.00026 | 3 (of 3)
GO:0039535 | regulation of RIG-I signaling pathway | 0.000256 | 3 (of 3)
GO:0046596 | regulation of viral entry into host cell | 0.000253 | 3 (of 3)
GO:0046597 | negative regulation of viral entry into host cell | 0.00025 | 3 (of 3)
GO:0046719 | regulation of viral protein levels in host cell | 0.000247 | 3 (of 3)
GO:0051708 | intracellular protein transport in other organism involved in symbiotic interaction | 0.000244 | 3 (of 3)
GO:1900246 | positive regulation of RIG-I signaling pathway | 0.000241 | 3 (of 3)
GO:0043330 | response to exogenous dsRNA | 0.000238 | 4 (of 7)
GO:0002819 | regulation of adaptive immune response | 0.000235 | 5 (of 13)
GO:0065007 | biological regulation | 0.000465 | 76 (of 1642)
GO:0002684 | positive regulation of immune system process | 0.00046 | 10 (of 67)
GO:0060759 | regulation of response to cytokine stimulus | 0.000455 | 6 (of 22)
GO:0051240 | positive regulation of multicellular organismal process | 0.000449 | 9 (of 55)
GO:0050691 | regulation of defense response to virus by host | 0.000444 | 4 (of 8)
GO:0070206 | protein trimerization | 0.00044 | 4 (of 8)
GO:0030522 | intracellular receptor mediated signaling pathway | 0.000435 | 7 (of 33)
GO:0051241 | negative regulation of multicellular organismal process | 0.000645 | 8 (of 47)
GO:0002683 | negative regulation of immune system process | 0.000638 | 6 (of 25)
GO:0060330 | regulation of response to interferon-gamma | 0.001053 | 4 (of 9)
GO:0071359 | cellular response to dsRNA | 0.001042 | 4 (of 9)
GO:0048518 | positive regulation of biological process | 0.001237 | 37 (of 624)
GO:0001910 | regulation of leukocyte mediated cytotoxicity | 0.001224 | 3 (of 4)
GO:0002711 | positive regulation of T cell mediated immunity | 0.001212 | 3 (of 4)
GO:0002824 | positive regulation of adaptive immune response based on somatic recombination of immune receptors built from immunoglobulin superfamily domains | 0.0012 | 3 (of 4)
GO:0007259 | JAK-STAT cascade | 0.001386 | 4 (of 10)
GO:0060338 | regulation of type I interferon-mediated signaling pathway | 0.001373 | 4 (of 10)
GO:0048584 | positive regulation of response to stimulus | 0.001553 | 14 (of 145)
GO:0050794 | regulation of cellular process | 0.001731 | 69 (of 1507)
GO:0045321 | leukocyte activation | 0.001714 | 8 (of 53)
GO:0022414 | reproductive process | 0.002264 | 22 (of 305)
GO:0002703 | regulation of leukocyte mediated immunity | 0.002617 | 4 (of 11)
GO:0000003 | reproduction | 0.002593 | 22 (of 306)
GO:0002705 | positive regulation of leukocyte mediated immunity | 0.003486 | 3 (of 5)
GO:0002708 | positive regulation of lymphocyte mediated immunity | 0.003455 | 3 (of 5)
GO:0002709 | regulation of T cell mediated immunity | 0.003423 | 3 (of 5)
GO:0031341 | regulation of cell killing | 0.003393 | 3 (of 5)
GO:0048384 | retinoic acid receptor signaling pathway | 0.003363 | 3 (of 5)
GO:0001959 | regulation of cytokine-mediated signaling pathway | 0.004035 | 5 (of 20)
GO:0002698 | negative regulation of immune effector process | 0.004696 | 4 (of 12)
GO:0034612 | response to tumor necrosis factor | 0.004655 | 4 (of 12)
GO:0040012 | regulation of locomotion | 0.005128 | 8 (of 58)
GO:0040013 | negative regulation of locomotion | 0.005424 | 5 (of 22)
GO:0014070 | response to organic cyclic compound | 0.006218 | 10 (of 90)
GO:0002821 | positive regulation of adaptive immune response | 0.007 | 3 (of 6)
GO:0050778 | positive regulation of immune response | 0.007273 | 6 (of 34)
GO:0009966 | regulation of signal transduction | 0.007705 | 21 (of 304)
GO:0033993 | response to lipid | 0.009268 | 9 (of 79)
GO:0044699 | single-organism process | 0.011613 | 55 (of 1169)
GO:0001914 | regulation of T cell mediated cytotoxicity | 0.01504 | 2 (of 2)
GO:0001916 | positive regulation of T cell mediated cytotoxicity | 0.014921 | 2 (of 2)
GO:0010743 | regulation of macrophage derived foam cell differentiation | 0.014803 | 2 (of 2)
GO:0032020 | ISG15-protein conjugation | 0.014688 | 2 (of 2)
GO:0034124 | regulation of MyD88-dependent toll-like receptor signaling pathway | 0.014574 | 2 (of 2)
GO:0035457 | cellular response to interferon-alpha | 0.014462 | 2 (of 2)
GO:0039528 | cytoplasmic pattern recognition receptor signaling pathway in response to virus | 0.014351 | 2 (of 2)
GO:0039533 | regulation of MDA-5 signaling pathway | 0.014242 | 2 (of 2)
GO:0045343 | regulation of MHC class I biosynthetic process | 0.014135 | 2 (of 2)
GO:0046967 | cytosol to ER transport | 0.01403 | 2 (of 2)
GO:0071360 | cellular response to exogenous dsRNA | 0.013926 | 2 (of 2)
GO:1900245 | positive regulation of MDA-5 signaling pathway | 0.013824 | 2 (of 2)
GO:0002706 | regulation of lymphocyte mediated immunity | 0.014161 | 3 (of 7)
GO:0034121 | regulation of toll-like receptor signaling pathway | 0.014058 | 3 (of 7)
GO:0001775 | cell activation | 0.014245 | 9 (of 81)
GO:0023051 | regulation of signaling | 0.014571 | 22 (of 339)
GO:0071396 | cellular response to lipid | 0.015035 | 5 (of 26)
GO:0071407 | cellular response to organic cyclic compound | 0.01493 | 5 (of 26)
GO:0010646 | regulation of cell communication | 0.015245 | 22 (of 342)
GO:0002822 | regulation of adaptive immune response based on somatic recombination of immune receptors built from immunoglobulin superfamily domains | 0.017778 | 3 (of 8)
GO:0060334 | regulation of interferon-gamma-mediated signaling pathway | 0.017655 | 3 (of 8)
GO:0031348 | negative regulation of defense response | 0.022466 | 4 (of 18)
GO:0060416 | response to growth hormone stimulus | 0.02449 | 3 (of 9)
GO:0006606 | protein import into nucleus | 0.024595 | 5 (of 30)
GO:0044744 | protein targeting to nucleus | 0.02443 | 5 (of 30)
GO:0051270 | regulation of cellular component movement | 0.024933 | 7 (of 59)
GO:0051170 | nuclear import | 0.025828 | 5 (of 31)
GO:0001912 | positive regulation of leukocyte mediated cytotoxicity | 0.035263 | 2 (of 3)
GO:0002483 | antigen processing and presentation of endogenous peptide antigen | 0.035033 | 2 (of 3)
GO:0016556 | mRNA modification | 0.034805 | 2 (of 3)
GO:0019883 | antigen processing and presentation of endogenous antigen | 0.034581 | 2 (of 3)
GO:0019885 | antigen processing and presentation of endogenous peptide antigen via MHC class I | 0.034359 | 2 (of 3)
GO:0060700 | regulation of ribonuclease activity | 0.03414 | 2 (of 3)
GO:0071356 | cellular response to tumor necrosis factor | 0.037722 | 3 (of 10)
GO:0051051 | negative regulation of transport | 0.038868 | 6 (of 47)
GO:0051271 | negative regulation of cellular component movement | 0.040125 | 4 (of 21)
GO:0071216 | cellular response to biotic stimulus | 0.039876 | 4 (of 21)
GO:0010629 | negative regulation of gene expression | 0.040864 | 14 (of 197)
GO:0032879 | regulation of localization | 0.040613 | 14 (of 197)
GO:0002699 | positive regulation of immune effector process | 0.041341 | 3 (of 11)
GO:0030334 | regulation of cell migration | 0.041455 | 6 (of 49)
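
II. MODEL SPECIFIC FOR A MIXTURE, UNDIRECTED NETWORK

A. Model

Here we propose a model specific to a mixture, undirected network, to see whether its performance differs from that of the model in the main text. In the latter model, an undirected edge is viewed as two directed edges in opposite directions, so the probability for any pair of common vertices is counted twice even for undirected networks. In the specific model, we count this probability only once. Thus, the likelihood can be written as

$$
\begin{aligned}
P\left(G \mid \theta^{(U)}, \theta^{(V)}\right) ={}& \prod_{\substack{i \in O_U,\ j \in O_V \\ l_i^{(U)} < l_j^{(V)}}} \frac{\left(\sum_z \theta_{iz}^{(U)} \theta_{jz}^{(V)}\right)^{A_{ij}}}{A_{ij}!} \exp\left(-\sum_z \theta_{iz}^{(U)} \theta_{jz}^{(V)}\right) \\
&\times \prod_{\substack{i \in O_U,\ j \in O_V \\ l_i^{(U)} = l_j^{(V)}}} \frac{\left(\frac{1}{2}\sum_z \theta_{iz}^{(U)} \theta_{jz}^{(V)}\right)^{A_{ij}/2}}{\left(A_{ij}/2\right)!} \exp\left(-\frac{1}{2}\sum_z \theta_{iz}^{(U)} \theta_{jz}^{(V)}\right) \\
&\times \prod_{i \in O_U,\ j \in S_V} \frac{\left(\sum_z \theta_{iz}^{(U)} \theta_{jz}^{(V)}\right)^{A_{ij}}}{A_{ij}!} \exp\left(-\sum_z \theta_{iz}^{(U)} \theta_{jz}^{(V)}\right) \\
&\times \prod_{i \in S_U,\ j \in V} \frac{\left(\sum_z \theta_{iz}^{(U)} \theta_{jz}^{(V)}\right)^{A_{ij}}}{A_{ij}!} \exp\left(-\sum_z \theta_{iz}^{(U)} \theta_{jz}^{(V)}\right),
\end{aligned}
\tag{S1}
$$

subject to the constraints that

$$
\theta_{iz}^{(U)} - \theta_{jz}^{(V)} = 0, \quad \forall\, i, j, z \ \text{with}\ l_i^{(U)} = l_j^{(V)},
\tag{S2}
$$

just as in the model in the main text. Up to terms that do not depend on the parameters, the log likelihood is

$$
\begin{aligned}
\ln P\left(G \mid \theta^{(U)}, \theta^{(V)}\right) ={}& \sum_{\substack{i \in O_U,\ j \in O_V \\ l_i^{(U)} < l_j^{(V)}}} \left[A_{ij} \ln \sum_z \theta_{iz}^{(U)} \theta_{jz}^{(V)} - \sum_z \theta_{iz}^{(U)} \theta_{jz}^{(V)}\right] \\
&+ \sum_{\substack{i \in O_U,\ j \in O_V \\ l_i^{(U)} = l_j^{(V)}}} \left[\frac{1}{2} A_{ij} \ln \sum_z \theta_{iz}^{(U)} \theta_{jz}^{(V)} - \frac{1}{2} \sum_z \theta_{iz}^{(U)} \theta_{jz}^{(V)}\right] \\
&+ \sum_{i \in O_U,\ j \in S_V} \left[A_{ij} \ln \sum_z \theta_{iz}^{(U)} \theta_{jz}^{(V)} - \sum_z \theta_{iz}^{(U)} \theta_{jz}^{(V)}\right] \\
&+ \sum_{i \in S_U,\ j \in V} \left[A_{ij} \ln \sum_z \theta_{iz}^{(U)} \theta_{jz}^{(V)} - \sum_z \theta_{iz}^{(U)} \theta_{jz}^{(V)}\right] \\
\geq{}& \sum_{ijz} \left(1 - \frac{1}{2} \sum_{i'j'} \delta_{l_{i'}^{(U)},\, l_j^{(V)}}\, \delta_{l_i^{(U)},\, l_{j'}^{(V)}}\right) \left[A_{ij}\, q_{ij}(z) \ln \frac{\theta_{iz}^{(U)} \theta_{jz}^{(V)}}{q_{ij}(z)} - \theta_{iz}^{(U)} \theta_{jz}^{(V)}\right],
\end{aligned}
\tag{S3}
$$

where the last step uses Jensen's inequality for any $q_{ij}(z)$ with $\sum_z q_{ij}(z) = 1$, and the symmetry of the common-vertex terms under the constraints (S2). Now considering the constraints in Eq. (12), the target function is

$$
\begin{aligned}
L ={}& \sum_{ijz} \left(1 - \frac{1}{2} \sum_{i'j'} \delta_{l_{i'}^{(U)},\, l_j^{(V)}}\, \delta_{l_i^{(U)},\, l_{j'}^{(V)}}\right) \left[A_{ij}\, q_{ij}(z) \ln \frac{\theta_{iz}^{(U)} \theta_{jz}^{(V)}}{q_{ij}(z)} - \theta_{iz}^{(U)} \theta_{jz}^{(V)}\right] \\
&+ \sum_{ij'z} c_{iz}\, \delta_{l_i^{(U)},\, l_{j'}^{(V)}} \left(\theta_{iz}^{(U)} - \theta_{j'z}^{(V)}\right),
\end{aligned}
\tag{S4}
$$

where $c_{iz}$ is the Lagrange multiplier. Differentiating Eq. (S4) with respect to $\theta_{iz}^{(U)}$ leads to

$$
\frac{\partial L}{\partial \theta_{iz}^{(U)}} = \sum_j \left(1 - \frac{1}{2} \sum_{i'j'} \delta_{l_{i'}^{(U)},\, l_j^{(V)}}\, \delta_{l_i^{(U)},\, l_{j'}^{(V)}}\right) \left[\frac{A_{ij}\, q_{ij}(z)}{\theta_{iz}^{(U)}} - \theta_{jz}^{(V)}\right] + c_{iz} \sum_{j'} \delta_{l_i^{(U)},\, l_{j'}^{(V)}}.
\tag{S5}
$$

When $i$ is a specific vertex, $\sum_{j'} \delta_{l_i^{(U)},\, l_{j'}^{(V)}} = 0$, and setting Eq. (S5) to zero leads to Eq. (10). When $i''$ and $j''$ are common vertices such that $\delta_{l_{i''}^{(U)},\, l_{j''}^{(V)}} = 1$, setting Eq. (S5) to zero gives

$$
\theta_{i''z}^{(U)} = \frac{\sum_j \left(1 - \frac{1}{2} \sum_{i'} \delta_{l_{i'}^{(U)},\, l_j^{(V)}}\right) A_{i''j}\, q_{i''j}(z)}{\sum_j \left(1 - \frac{1}{2} \sum_{i'} \delta_{l_{i'}^{(U)},\, l_j^{(V)}}\right) \theta_{jz}^{(V)} - c_{i''z}}.
\tag{S6}
$$

Similarly, we have

$$
\theta_{j''z}^{(V)} = \frac{\sum_i \left(1 - \frac{1}{2} \sum_{j'} \delta_{l_i^{(U)},\, l_{j'}^{(V)}}\right) A_{ij''}\, q_{ij''}(z)}{\sum_i \left(1 - \frac{1}{2} \sum_{j'} \delta_{l_i^{(U)},\, l_{j'}^{(V)}}\right) \theta_{iz}^{(U)} + c_{i''z}}.
\tag{S7}
$$

Inserting Eqs. (S6) and (S7) back into the constraints (Eq. (10)), $c_{i''z}$ is given by

$$
c_{i''z} = \frac{\displaystyle \sum_i \left(1 - \tfrac{1}{2} \sum_{j'} \delta_{l_i^{(U)},\, l_{j'}^{(V)}}\right) A_{ij''}\, q_{ij''}(z) \sum_j \left(1 - \tfrac{1}{2} \sum_{i'} \delta_{l_{i'}^{(U)},\, l_j^{(V)}}\right) \theta_{jz}^{(V)} - \sum_j \left(1 - \tfrac{1}{2} \sum_{i'} \delta_{l_{i'}^{(U)},\, l_j^{(V)}}\right) A_{i''j}\, q_{i''j}(z) \sum_i \left(1 - \tfrac{1}{2} \sum_{j'} \delta_{l_i^{(U)},\, l_{j'}^{(V)}}\right) \theta_{iz}^{(U)}}{\displaystyle \sum_j \left(1 - \tfrac{1}{2} \sum_{i'} \delta_{l_{i'}^{(U)},\, l_j^{(V)}}\right) A_{i''j}\, q_{i''j}(z) + \sum_i \left(1 - \tfrac{1}{2} \sum_{j'} \delta_{l_i^{(U)},\, l_{j'}^{(V)}}\right) A_{ij''}\, q_{ij''}(z)}.
\tag{S8}
$$

Inserting Eq. (S8) back into Eqs. (S6) and (S7), we get

$$
\begin{aligned}
\theta_{i''z}^{(U)} &= \frac{\sum_j \left(1 - \frac{1}{2} \sum_{i'} \delta_{l_{i'}^{(U)},\, l_j^{(V)}}\right) A_{i''j}\, q_{i''j}(z) + \sum_i \left(1 - \frac{1}{2} \sum_{j'} \delta_{l_i^{(U)},\, l_{j'}^{(V)}}\right) A_{ij''}\, q_{ij''}(z)}{\sum_i \left(1 - \frac{1}{2} \sum_{j'} \delta_{l_i^{(U)},\, l_{j'}^{(V)}}\right) \theta_{iz}^{(U)} + \sum_j \left(1 - \frac{1}{2} \sum_{i'} \delta_{l_{i'}^{(U)},\, l_j^{(V)}}\right) \theta_{jz}^{(V)}} \\
&= \frac{\sum_{i \in S_U} A_{ij''}\, q_{ij''}(z) + \sum_{j \in V} A_{i''j}\, q_{i''j}(z)}{\sum_{i \in S_U} \theta_{iz}^{(U)} + \sum_{j \in V} \theta_{jz}^{(V)}},
\end{aligned}
\tag{S9}
$$

and

$$
\theta_{j''z}^{(V)} = \frac{\sum_{i \in S_U} A_{ij''}\, q_{ij''}(z) + \sum_{j \in V} A_{i''j}\, q_{i''j}(z)}{\sum_{i \in S_U} \theta_{iz}^{(U)} + \sum_{j \in V} \theta_{jz}^{(V)}}.
\tag{S10}
$$

B. Results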
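
The benchmark results in this section presumably come from iterating the updates of Section A from random initializations. As a minimal sketch, not the authors' implementation, one such iteration could look as follows in Python with NumPy. It assumes that Eqs. (S9) and (S10) assign a common vertex the same value in both roles, with numerator and denominator summed over the specific U vertices and over all V vertices, and it uses the unconstrained stationary point of Eq. (S5) for specific vertices in place of the main-text Eq. (10). The names `specific_model_step`, `theta_u`, `theta_v`, `in_u` and `in_v` are hypothetical, and row i and column i of `A` are assumed to refer to the same vertex.

```python
import numpy as np

def specific_model_step(A, theta_u, theta_v, in_u, in_v, eps=1e-12):
    """One fixed-point update of the role parameters (illustrative sketch).

    A[i, j]   : number of edges between U-role vertex i and V-role vertex j.
    theta_u   : (n, K) U-role parameters, zero for vertices outside U.
    theta_v   : (n, K) V-role parameters, zero for vertices outside V.
    in_u/in_v : boolean role masks; a vertex with both roles is common.
    """
    common, spec_u, spec_v = in_u & in_v, in_u & ~in_v, in_v & ~in_u

    # q_ij(z) is proportional to theta_u[i, z] * theta_v[j, z].
    prod = theta_u[:, None, :] * theta_v[None, :, :]
    q = prod / np.maximum(prod.sum(axis=2, keepdims=True), eps)
    aq = A[:, :, None] * q                    # A_ij * q_ij(z), shape (n, n, K)

    sum_v = theta_v[in_v].sum(axis=0)         # sum over j in V
    sum_u = theta_u[in_u].sum(axis=0)         # sum over i in U
    sum_su = theta_u[spec_u].sum(axis=0)      # sum over i in S_U

    new_u, new_v = np.zeros_like(theta_u), np.zeros_like(theta_v)
    # Specific vertices: unconstrained stationary point (assumed form).
    new_u[spec_u] = aq[spec_u].sum(axis=1) / np.maximum(sum_v, eps)
    new_v[spec_v] = aq[:, spec_v].sum(axis=0) / np.maximum(sum_u, eps)
    # Common vertices: one shared value for both roles (assumed reading of S9/S10).
    num = aq[spec_u][:, common].sum(axis=0) + aq[common][:, in_v].sum(axis=1)
    new_u[common] = num / np.maximum(sum_su + sum_v, eps)
    new_v[common] = new_u[common]
    return new_u, new_v

# Toy network: vertex 0 is specific to U, vertices 1 and 2 are common,
# vertex 3 is specific to V.
in_u = np.array([True, True, True, False])
in_v = np.array([False, True, True, True])
A = np.zeros((4, 4))
A[0, 1] = 1              # edge between specific-U vertex 0 and common vertex 1
A[1, 2] = A[2, 1] = 1    # undirected edge between common vertices 1 and 2
A[2, 3] = 1              # edge between common vertex 2 and specific-V vertex 3

rng = np.random.default_rng(0)
shared = rng.random((4, 2)) + 0.1             # K = 2 modules, positive start
theta_u = np.where(in_u[:, None], shared, 0.0)
theta_v = np.where(in_v[:, None], shared, 0.0)

new_u, new_v = specific_model_step(A, theta_u, theta_v, in_u, in_v)
```

Starting both roles of a common vertex from the same row of `shared` keeps the initial guess consistent with the constraints in Eq. (S2); the update then preserves that equality by construction.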

Supplementary figure 3. Heat maps of the normalized mutual information of the specific model on random networks generated by symmetric sampling (left panels), asymmetric sampling with c = 0.2 (middle panels), and asymmetric sampling with c = 0.8 (right panels), using the GN benchmark and the LFR benchmark (undirected networks). For the GN benchmark, both r and s are set to 10, and 50 random initializations are used for the module detection of each network. For the LFR benchmark, r is set to 10 and s is set to 5, and 10 random initializations are used for the module detection of each network.

Comparing Supplementary figure 3 with figure 4 (upper panels) and figure 5 (upper panels) in the main text, we see no significant difference between the specific model and the model in the main text on the GN and LFR benchmarks.
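
To make the score behind these heat maps concrete, the sketch below computes a normalized mutual information between a planted and a detected partition. It assumes hard (non-overlapping) labels and the common normalization 2 I(X;Y) / (H(X) + H(Y)); the definition actually used for the figures is the one given in the main text and may differ. The function name `normalized_mutual_information` is illustrative.

```python
import numpy as np

def normalized_mutual_information(labels_a, labels_b):
    """NMI = 2 I(A;B) / (H(A) + H(B)) for two hard partitions (sketch)."""
    a = np.asarray(labels_a).ravel()
    b = np.asarray(labels_b).ravel()
    if a.size != b.size:
        raise ValueError("both partitions must label the same vertices")

    # Map arbitrary labels to 0..k-1 and build the joint count table.
    _, ia = np.unique(a, return_inverse=True)
    _, ib = np.unique(b, return_inverse=True)
    counts = np.zeros((ia.max() + 1, ib.max() + 1))
    np.add.at(counts, (ia, ib), 1)

    p = counts / a.size                  # joint distribution
    pa, pb = p.sum(axis=1), p.sum(axis=0)  # marginals (all strictly positive)
    nz = p > 0                           # skip empty cells to avoid log(0)
    mutual_info = np.sum(p[nz] * np.log(p[nz] / np.outer(pa, pb)[nz]))
    h_a = -np.sum(pa * np.log(pa))
    h_b = -np.sum(pb * np.log(pb))

    if h_a + h_b == 0:                   # both partitions are a single module
        return 1.0
    return 2.0 * mutual_info / (h_a + h_b)

planted = [0, 0, 1, 1]
same_up_to_relabeling = normalized_mutual_information(planted, [1, 1, 0, 0])
unrelated = normalized_mutual_information(planted, [0, 1, 0, 1])
partial = normalized_mutual_information([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 1])
```

A score near 1 means the detected modules match the planted ones up to relabeling, and a score near 0 means the two partitions carry no information about each other.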