SCIENCE CHINA Life Sciences
RESEARCH PAPER
April 2013 Vol.56 No.4: 1–7
doi: 10.1007/s11427-013-4460-x

Table S1 Human paired-end RNA-Seq data sets from the Sequence Read Archive (SRA) database

Experiment accession    Cell line
SRX026669               H1-hESC
SRX026674               H1-hESC
SRX026694               H1-hESC
SRX026670               HepG2
SRX026684               HepG2
SRX026687               HUVEC
SRX026673               NHEK
SRX026675               HeLa-S3
SRX026693               GM12878

Table S2 Gene expression data sets from the GEO database used to construct the two-color co-expression network

No.   GEO accession     Description
1     GSE9988 (62)      Innate immune responses to TREM-1 activation
2     GSE11882 (173)    Gene expression changes in the course of normal brain aging are sexually dimorphic
3     GSE9196 (53)      Comparison of ES, EB, and Blast cells to breast epithelial, leukocytes, endothelial and stromal cells
4     GSE15431 (31)     Global Gene Expression in the Human Fetal Testis and Ovary
5     GSE7214 (27)      Comparison of gene expression data between wild-type and DM1-affected cells
6     GSE8823 (24)      Overexpression of the Apoptotic Cell Removal Receptor, MERTK, in Alveolar Macrophages of Cigarette Smokers
7     GSE5372 (22)      airway epithelium, large airways, pre and post-mechanical injury
8     GSE9254 (19)      Normal human colorectal mucosa, cecum, ascending, transverse, sigmoid and rectum
9     GSE19599 (16)     Expression data for normal flow sorted hematopoietic cell subpopulations
10    GSE22167 (18)     Reprogramming of T Cells from Human Peripheral Blood
11    GSE18265 (13)     Transcriptomic analysis of pluripotent stem cells
12    GSE9709 (10)      Human induced pluripotent stem (iPS) cells from neonatal skin derived cells
13    GSE3526 (353)     Comparison of gene expression profiles across the normal human body
14    GSE4888 (27)      Molecular phenotyping of human endometrium
15    GSE15238 (10)     Expression data from human embryonic (9-12w) and post-natal livers
16    GSE17251 (24)     Comparative analysis of gene regulation by the transcription factor PPARα_human
17    GSE14334 (29)     Transcriptomic analysis of human lung development
18    GSE7821 (40)      Expression data from human intestinal biopsies
19    GSE11103 (41)     Study of human immune and memory T cells using microarray
20    GSE23968 (14)     Large intergenic non-coding RNAs as novel modulators of reprogramming: ESCs, fibroblast, and fibroblast-derived iPSC (gene expression)
21    GSE18897 (80)     Whole blood expression profiling of obese diet-sensitive, obese diet-resistant and lean human subjects
22    GSE9419 (66)      The skeletal muscle transcript profile reflects responses to inadequate protein intake in younger and older males
23    GSE9865 (13)      Expression profile of dermal fibroblasts reprogrammed to a pluripotent state
24    GSE7888 (23)      Expression data from human mesenchymal stem cells (six batches)
25    GSE13564 (44)     Gene expression in the human prefrontal cortex during postnatal development
26    GSE2125 (45)      isolated alveolar macrophages
27    GSE10041 (72)     Genomic Counter-Stress Changes Induced by Mind-Body Practice
28    GSE18791 (57)     Antiviral response dictated by choreographed cascade of transcription factors
29    GSE8658 (63)      PPARg regulated gene expression in human dendritic cells
30    GSE18637 (20)     Do Airway Epithelium Air-liquid Cultures Represent the In Vivo Airway Epithelium Transcriptome?
31    GSE16028 (109)    Longitudinal study of gene expression in healthy individuals

© The Author(s) 2013. This article is published with open access at Springerlink.com    life.scichina.com    www.springer.com/scp
Sun L, et al. Sci China Life Sci April (2013) Vol.56 No.4

Table S3 Properties of networks using different numbers of datasets a)

Number of   Number of connections   Number of     Average   Network clustering   Scale-free topology   Gamma   Proportion of same
datasets    (cc/nn/cn)              genes (c/n)   degree    coefficient          criterion                     GO pairs (random)
3           580660/467/21833        16756/486     70        0.06                 0.78                  1.56    7% (4%)
4           178610/192/6184         15681/445     73        0.06                 0.69                  1.44    12% (5%)
5           48946/79/2721           10802/256     9         0.25                 0.92                  1.86    20% (5%)
6           20473/40/542            5755/191      14        0.26                 0.91                  1.72    30% (8%)
7           11706/16/233            3212/66       7         0.46                 0.85                  1.46    41% (11%)
8           8480/10/133             2206/53       9         0.48                 0.83                  1.38    48% (14%)
9           6072/7/80               1382/42       13        0.51                 0.78                  1.25    56% (20%)

a) c: coding genes; n: lincRNAs; cc: coding-coding gene connections; nn: lincRNA-lincRNA connections; cn: coding gene-lincRNA connections.

Table S4 Genetic mediators of protein-coding genes

Gene ID   Gene symbol   Validation
64581     CLEC7A        no
4688      NCF2          no
7409      VAV1          yes
646       BNC1          no
6657      SOX2          yes
4815      NINJ2         no
83593     RASSF5        yes
4048      LTA4H         yes
5321      PLA2G4A       no
4922      NTS           yes
1294      COL7A1        yes
5319      PLA2G1B       yes
64399     HHIP          no
200504    GKN2          no
11113     CIT           yes
24        ABCA4         no
27334     P2RY10        yes
2304      FOXE1         no
722       C4BPA         yes
55734     ZFP64         no
23250     ATP11A        yes
3815      KIT           yes
219790    RTKN2         no
1592      CYP26A1       no
2676      GFRA3         no
6098      ROS1          no
51560     RAB6B         no
5744      PTHLH         yes
190       NR0B1         yes
1399      CRKL          no
23596     OPN3          no
26256     CABYR         yes
56241     SUSD2         no
5045      FURIN         no
961       CD47          yes
56938     ARNTL2        no
120892    LRRK2         yes
56666     PANX2         no
55024     BANK1         no
10344     CCL26         no
3641      INSL4         no
81793     TLR10         no
92304     SCGB3A1       yes
51208     CLDN18        yes
13        AADAC         no
2719      GPC3          yes
53836     GPR87         no
8745      ADAM23        no
7090      TLE3          no
23221     RHOBTB2       yes
121355    GTSF1         no
5361      PLXNA1        no
339512    C1orf110      no
8999      CDKL2         no
26575     RGS17         yes
8547      FCN3          yes
5081      PAX7          yes
29948     OSGIN1        no
5275      SERPINB13     no
2196      FAT2          yes
2769      GNA15         no
27159     CHIA          no
23462     HEY1          yes
50964     SOST          no
6016      RIT1          no
170712    COX7B2        no
79368     FCRL2         no
266977    GPR110        no
9856      KIAA0319      no
7903      ST8SIA4       no
2702      GJA5          no
11197     WIF1          yes
9048      ARTN          yes
92291     CAPN13        no
51761     ATP8A2        no
4973      OLR1          yes
5803      PTPRZ1        yes
55755     CDK5RAP2      yes
5590      PRKCZ         no
11040     PIM2          yes
9182      RASSF9        no
3851      KRT4          yes
3853      KRT6A         yes
6357      CCL13         no
54210     TREM1         no
26207     PITPNC1       no
7356      SCGB1A1       yes
6335      SCN9A         no
84759     PCGF1         no
2729      GCLC          yes
8626      TP63          yes
51297     PLUNC         yes
6402      SELL          yes
3694      ITGB6         no
1591      CYP24A1       yes
53833     IL20RB        yes
116154    PHACTR3       yes
Table S5 LincRNA sets dysregulated in lung cancer. An asterisk (*) marks the sets detected in both experiments.

Name of lincRNA   State    Functional annotation
Est_480           Down     Regulation of transcription (BP) and RNA metabolic process
Est_884           Down     Chromatin modification
Est_1088          Down     Metabolism process (KEGG)
Est_1650          Down     Metabolism process
Est_669           Down     MAPK signaling pathway (KEGG)
Est_997           Down     Regulation of metabolism
Scrip_2185        Down*    Regulation of immune system process and leukocyte activation
Both_202          Down     Embryonic cleavage
Scrip_2379        Up*      Regulation of cell cycle and cell death
Est_835           Up*      Response to stimulus and regulation of cell cycle
Est_810           Up*      Metabolic process, system development and embryonic development
Scrip_3488        Up       Positive regulation of transcription and Wnt signaling pathway

Figure S1 Computational pipelines for identification of human lincRNAs.
Figure S2 Expression profiles of protein-coding and lincRNA genes. A, Expression correlations of protein-coding genes. Red line: distribution of Pearson correlation coefficients for the expression rank of identical protein-coding genes across six cell lines in the RNA-Seq data and in the re-annotated Human Genome U133 Plus 2.0 Array. Light grey lines: distributions of Pearson correlation coefficients for the expression rank of randomly selected protein-coding gene pairs, repeated 1000 times. (The mean Pearson correlation coefficient was 0.51; the P-value of the KS test was less than 2.2e-10.) B, Expression correlations of lincRNAs. Red line: distribution of Pearson correlation coefficients for the expression rank of identical lincRNAs across six cell lines in the RNA-Seq data and in the re-annotated Human Genome U133 Plus 2.0 Array. Light grey lines: distributions of Pearson correlation coefficients for the expression rank of randomly selected lincRNA pairs, repeated 1000 times. (The mean Pearson correlation coefficient was 0.41; the P-value of the KS test was less than 2.2e-10.) C, Expression profiles of lincRNAs Both_148, Scrip_3188, Est_1593 and Both_2 in the re-annotated Human Genome U133 Plus 2.0 Array and in the RNA-Seq data.
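The comparison summarized in Figure S2 can be illustrated with a minimal sketch. It assumes that each gene is represented by a vector of six expression ranks (one per cell line) on each platform; the function names, the toy data and the way random pairs are drawn are hypothetical and are not taken from the paper. Only the two-sample KS statistic is computed here; a P-value would normally come from a statistics library (for example `scipy.stats.ks_2samp`).

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant profile: correlation undefined, reported as 0 here
    return sxy / (sxx * syy) ** 0.5


def matched_correlations(rnaseq, array):
    """Correlate each gene's RNA-Seq profile with its own array profile."""
    shared = sorted(set(rnaseq) & set(array))
    return [pearson(rnaseq[g], array[g]) for g in shared]


def random_pair_correlations(rnaseq, array, n_pairs, rng):
    """Correlate RNA-Seq profiles with array profiles of randomly chosen other genes."""
    seq_genes, arr_genes = sorted(rnaseq), sorted(array)
    out = []
    while len(out) < n_pairs:
        g, h = rng.choice(seq_genes), rng.choice(arr_genes)
        if g != h:  # keep only pairs of different genes
            out.append(pearson(rnaseq[g], array[h]))
    return out


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


# Toy data (hypothetical): expression ranks across six cell lines per platform.
rnaseq = {"geneA": [1, 2, 3, 4, 5, 6], "geneB": [6, 5, 4, 3, 2, 1], "geneC": [2, 1, 4, 3, 6, 5]}
array = {"geneA": [1, 2, 3, 4, 5, 6], "geneB": [5, 6, 4, 3, 2, 1], "geneC": [1, 2, 3, 4, 6, 5]}

observed = matched_correlations(rnaseq, array)
null = random_pair_correlations(rnaseq, array, 1000, random.Random(0))
d = ks_statistic(observed, null)
```

In this sketch `observed` plays the role of the red line and `null` the role of one light grey line; repeating the random draw with different seeds would give the family of grey distributions.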
Figure S3 Genes differentially expressed in response to CDK inhibitor treatment. A, Diagram showing the number of protein-coding genes differentially expressed in response to CDK inhibitor treatment in three cell lines. B, Diagram showing the number of long ncRNA genes differentially expressed in response to CDK inhibitor treatment in three cell lines. C, GO enrichment analysis results for the 1254 coding genes that were differentially expressed in response to CDK inhibitor treatment.

Figure S4 Genomic contexts of three lincRNA genes. The lincRNAs acting as genetic mediators are Scrip_2341 (upper panel), Scrip_2379 (middle panel) and Scrip_2616 (lower panel). Other information shown includes the H3K4me3 marker, RefSeq genes and conservation status.
Figure S5 The relationship between coding gene-centered gene sets and cancer. A, GO enrichment analysis results for the central coding genes of the 167 gene sets that were significantly induced (or repressed) in lung cancer relative to normal lung. B, The left panel shows the genes in the KIF11-related gene set that were induced (red) or repressed (green) in each array. The bars at the bottom denote the experiment set to which each array belongs. Cancer experiment sets are marked in yellow. The right panel shows the KIF11-related gene set. Genes colored in green are annotated as cell cycle genes by KEGG and genes colored in pink are annotated as cell cycle genes by GO. Yellow denotes other functions. Edges colored in light grey (or green) are negative (or positive) correlations.

Figure S6 An example of the relationship between a lincRNA set and cancer. A, The Scrip_2185-related gene set. Genes colored in pink are immune-related genes. Yellow denotes other functions. Edges colored in light grey (or green) are negative (or positive) correlations. B, Genomic context of the Scrip_2185 gene locus and UCSC genes.