Multi-class cancer outlier differential gene expression detection Supplementary Information


Fang Liu, Baolin Wu
Email: baolin@biostat.umn.edu, phone: (612)624-0647, fax: (612)626-0660

1 Breast cancer microarray data analysis

The breast cancer microarray data (West et al., 2001), obtained from the Affymetrix human HuGeneFL GeneChip, contain the expression levels of 7129 genes from 49 breast tumor samples. The raw data can be downloaded from http://data.cgt.duke.edu/west.php. Each sample has two binary outcomes: lymph node involvement (negative or positive, denoted LN-/LN+) and estrogen receptor status (negative or positive, denoted ER-/ER+). There are 13 ER+/LN+ tumors, 11 ER-/LN+ tumors, 12 ER+/LN- tumors, and 13 ER-/LN- tumors.

We preprocess the data with quantile normalization (Bolstad et al., 2003) followed by a log transformation before the follow-up statistical analysis. For cancer gene outlier detection, we treat the ER-/LN- group as the normal class and apply the F-statistic, the proposed outlier robust F-statistic (ORF), and the outlier F-statistic (OF) to detect genes with over-expressed disease samples. Genes are ranked by each test statistic. For the top 25 genes identified by each method, we use the Bioconductor (Gentleman et al., 2004) annotation package hu6800 to search PubMed for related literature.

1.1 Cancer genes with over-expressed outlier disease samples

Tables 1 to 3 list the top 25 genes identified by each outlier detection statistic.

1.2 Cancer genes with down-expressed outlier disease samples

Table 4 lists the identified genes with down-expressed disease samples that have been confirmed to be associated with breast cancer in previous studies. Figure 1 shows the expression profiles of these genes. Tables 5 and 6 list the top 25 genes identified by each outlier detection statistic.
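The preprocessing and ranking steps described in Section 1 can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' code: the data are synthetic, the function names (quantile_normalize, f_statistics, top_genes) are hypothetical, ties in quantile normalization are broken by sort order, base-2 logarithms are assumed, and only the ordinary one-way F-statistic is computed because the ORF and OF formulas are not given in this supplement.

```python
import numpy as np

def quantile_normalize(x):
    """Give every sample (column) of a genes-by-samples matrix the same distribution."""
    order = np.argsort(x, axis=0)                   # per-sample sort order of the genes
    reference = np.sort(x, axis=0).mean(axis=1)     # mean of the k-th smallest values
    normalized = np.empty_like(x, dtype=float)
    for j in range(x.shape[1]):
        normalized[order[:, j], j] = reference      # k-th smallest value -> k-th reference quantile
    return normalized

def f_statistics(x, labels):
    """One-way ANOVA F-statistic for every gene (row) across the sample classes."""
    classes = np.unique(labels)
    n, k = x.shape[1], classes.size
    grand_mean = x.mean(axis=1, keepdims=True)
    between = np.zeros(x.shape[0])
    within = np.zeros(x.shape[0])
    for c in classes:
        block = x[:, labels == c]
        class_mean = block.mean(axis=1, keepdims=True)
        between += block.shape[1] * (class_mean - grand_mean).ravel() ** 2
        within += ((block - class_mean) ** 2).sum(axis=1)
    return (between / (k - 1)) / (within / (n - k))

def top_genes(scores, k=25):
    """Row indices of the k largest scores, best first."""
    return np.argsort(-scores)[:k]

# Synthetic stand-in for the 49-sample data set (200 genes instead of 7129).
rng = np.random.default_rng(0)
labels = np.repeat(["ER+/LN+", "ER-/LN+", "ER+/LN-", "ER-/LN-"], [13, 11, 12, 13])
raw = rng.lognormal(mean=6.0, sigma=1.0, size=(200, labels.size))
raw[0, labels != "ER-/LN-"] *= 64.0                 # gene 0 is over-expressed in disease samples

normalized = quantile_normalize(raw)
log_expr = np.log2(np.clip(normalized, 1.0, None))  # floor at 1 is an assumption to keep logs finite
top = top_genes(f_statistics(log_expr, labels), k=25)
print(top[:5])
```

In this synthetic example the planted gene (row 0) should head the ranking; on the real data the same three calls would be applied to the 7129 x 49 expression matrix with the four ER/LN classes as labels.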

Figure 1: Oncogene outlier detection for breast cancer microarray data: 9 additional top ranking genes that are identified by ORF/OF and shown related to breast cancer in the literature are plotted. The ER-/LN- samples serve as the normal group. We have added some jittering to the horizontal positions to distinguish among close points. The title lists the gene names. Within the parentheses are those outlier statistics that ranked the gene in top 25.

[Figure panels: HLA-B (ORF), RPL19 (ORF), RPS27 (ORF), GAPDH (ORF), RPL7A (ORF), NOS3 (OF), RFC1 (OF), PLCG1 (OF), MAPKAPK2 (OF)]

Table 1: Top 25 (over-expressed) genes identified by the outlier robust F-statistic (ORF). Those in bold face font have been studied previously and confirmed to be associated with breast cancer in the literature.

Ranking  Affymetrix ID        Gene Abbreviation
1        Y08976_at            FEV
2        HG3319-HT3496_s_at   HIRA
3        S75313_at            ATXN3
4        U48936_at            SCNN1G
5        K02882_cds1_s_at     IGHD
6        X96969_at            SLC14A2
7        M87770_at            FGFR2
8        U17579_rna1_at       GHRHR
9        M93119_at            INSM1
10       J03242_s_at          IGF2
11       U07664_at            HLXB9
12       X78817_at            ARHGAP4
13       D13897_rna2_at       PYY
14       M31525_at            HLA-DOA
15       M96739_at            NHLH1
16       U00968_at            SREBF1
17       X71428_at            FUS
18       M24248_at            MYL3
19       D50663_at            DYNLT1
20       X66417_at            CSN3
21       X92518_s_at          HMGA2
22       HG3513-HT3707_at     MYH8
23       M58285_at            NCKAP1L
24       S82198_at            CTRC
25       HG2566-HT4867_at     MAPT

Table 2: Top 25 genes identified by the outlier F-statistic (OF). Those in bold face font have been studied previously and confirmed to be associated with breast cancer in the literature.

Ranking  Affymetrix ID        Gene Abbreviation
1        Y08976_at            FEV
2        HG3319-HT3496_s_at   HIRA
3        S75313_at            ATXN3
4        U48936_at            SCNN1G
5        K02882_cds1_s_at     IGHD
6        X96969_at            SLC14A2
7        M87770_at            FGFR2
8        U17579_rna1_at       GHRHR
9        M93119_at            INSM1
10       J03242_s_at          IGF2
11       U07664_at            HLXB9
12       X78817_at            ARHGAP4
13       D13897_rna2_at       PYY
14       M31525_at            HLA-DOA
15       M96739_at            NHLH1
16       U00968_at            SREBF1
17       X71428_at            FUS
18       M24248_at            MYL3
19       D50663_at            DYNLT1
20       X66417_at            CSN3
21       X92518_s_at          HMGA2
22       HG3513-HT3707_at     MYH8
23       M58285_at            NCKAP1L
24       S82198_at            CTRC
25       HG2566-HT4867_at     MAPT

References

Bolstad,B., Irizarry,R., Astrand,M. and Speed,T. (2003) A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics, 19 (2), 185-193.

Gentleman,R., Carey,V., Bates,D., Bolstad,B., Dettling,M., Dudoit,S., Ellis,B., Gautier,L., Ge,Y., Gentry,J., Hornik,K., Hothorn,T., Huber,W., Iacus,S., Irizarry,R., Leisch,F., Li,C., Maechler,M., Rossini,A., Sawitzki,G., Smith,C., Smyth,G., Tierney,L., Yang,J. and Zhang,J. (2004) Bioconductor: open software development for computational biology and bioinformatics. Genome Biology, 5 (10), R80.

West,M., Blanchette,C., Dressman,H., Huang,E., Ishida,S., Spang,R., Zuzan,H., Olson,J.A.,Jr., Marks,J.R. and Nevins,J.R. (2001) Predicting the clinical status of human breast cancer by using gene expression profiles. PNAS, 98 (20), 11462-11467.

Table 3: Top 25 genes identified by the F-statistic. Those in bold face font have been studied previously and confirmed to be associated with breast cancer in the literature.

Ranking  Affymetrix ID        Gene Abbreviation
1        U41060_at            SLC39A6
2        U27185_at            RARRES1
3        X17059_s_at          NAT1
4        L08044_s_at          TFF3
5        U39840_at            FOXA1
6        U22376_cds2_s_at     MYB
7        M99701_at            TCEAL1
8        X83425_at            BCAM
9        X55037_s_at          GATA3
10       X58072_at            GATA3
11       X13238_at            COX6C
12       X87212_at            CTSC
13       L40401_at            ACOT2
14       X59834_at            GLUL
15       U09770_at            CRIP1
16       U62325_at            APBB2
17       U68385_at            MEIS3P1
18       Z29083_at            TPBG
19       U96113_at            WWP1
20       M31627_at            XBP1
21       U42408_at            LAD1
22       X75861_at            TEGT
23       U03886_at            PNPLA4
24       U11791_at            CCNH
25       M16038_at            LYN

Table 4: Genes ranked in top 25 by the outlier detection statistics and shown to be associated with breast cancer in previous studies. The last three columns also list the ranking of each gene by the three methods. A dash marks the column of the selecting method, whose rank appears in the Rank column.

Method  Rank  Affymetrix ID       Gene Abbreviation  F     OF    ORF
F       1     U41060_at           SLC39A6            -     2272  3345
F       9     X55037_s_at         GATA3              -     6717  6717
F       10    X58072_at           GATA3              -     3034  2132
F       20    M31627_at           XBP1               -     6200  6200
F       24    U11791_at           CCNH               -     6380  6380
OF      5     M93718_at           NOS3               320   -     465
OF      8     L24783_at           RFC1               2402  -     1781
OF      12    M34667_at           PLCG1              3143  -     1002
OF      14    U12779_at           MAPKAPK2           2012  -     894
ORF     1     D49824_s_at         HLA-B              4683  3645  -
ORF     18    X63527_at           RPL19              4707  2013  -
ORF     20    HG3214-HT3391_at    RPS27              3361  1890  -
ORF     24    X01677_f_at         GAPDH              4085  3508  -
ORF     25    M36072_at           RPL7A              4287  580   -
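A cross-method comparison like the one in Table 4 can be assembled from per-gene scores as in the sketch below. This is only an illustration: the helper names (ranks_from_scores, cross_method_ranks), the gene identifiers, and the scores are hypothetical, and the literature-based filtering that Table 4 also applies is not reproduced.

```python
import numpy as np

def ranks_from_scores(scores):
    """Rank 1 is the largest score; ties are broken by position."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    ranks = np.empty(order.size, dtype=int)
    ranks[order] = np.arange(1, order.size + 1)
    return ranks

def cross_method_ranks(score_table, gene_ids, method, top_k=25):
    """For genes in the top_k of `method`, report their rank under every method."""
    all_ranks = {name: ranks_from_scores(s) for name, s in score_table.items()}
    chosen = np.flatnonzero(all_ranks[method] <= top_k)
    chosen = chosen[np.argsort(all_ranks[method][chosen])]   # best rank first
    return [(gene_ids[i], {name: int(r[i]) for name, r in all_ranks.items()})
            for i in chosen]

# Hypothetical scores for five genes under two statistics.
gene_ids = ["g0", "g1", "g2", "g3", "g4"]
score_table = {
    "F":  [5.0, 1.0, 3.0, 2.0, 4.0],
    "OF": [0.1, 0.9, 0.5, 0.3, 0.7],
}
rows = cross_method_ranks(score_table, gene_ids, method="F", top_k=2)
for gene, ranks in rows:
    print(gene, ranks)
```

Each returned row pairs a gene selected by one method with its rank under all methods, which makes large disagreements between statistics easy to spot.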

Table 5: Top 25 (down-expressed) genes identified by the outlier robust F-statistic (ORF). Those in bold face font have been shown to be related to breast cancer in the literature.

Ranking  Affymetrix ID            Gene Name
1        D49824_s_at              HLA-B
2        AFFX-HSAC07/X00351_M_at  ACTB
3        HG3364-HT3541_at         RPL37
4        X98482_r_at              ACTB
5        X56932_at                RPL13A
6        Z28407_at                RPL8
7        M17733_at                TMSB4X
8        L04483_s_at              RPS21
9        D79205_at                RPL39
10       U27185_at                RARRES1
11       U06155_s_at              RPL23AP7
12       M17886_at                RPLP1
13       L76191_at                IRAK1
14       X15940_at                RPL31
15       L06505_at                RPL12
16       HG1800-HT1823_at         RPS20
17       HG3549-HT3751_at         RPL10
18       X63527_at                RPL19
19       M11147_at                FTL
20       HG3214-HT3391_at         RPS27
21       U05340_at                CDC20
22       D86974_at                LOC440345
23       HG2873-HT3017_at         RPL30
24       X01677_f_at              GAPDH
25       M36072_at                RPL7A

Table 6: Top 25 (down-expressed) genes identified by the outlier F-statistic (OF). Those in bold face font have been shown to be related to breast cancer in the literature.

Ranking  Affymetrix ID        Gene Name
1        X57351_s_at          IFITM2
2        Z70759_at            WNT2B
3        X70944_s_at          SFPQ
4        D87442_at            NCSTN
5        M93718_at            NOS3
6        X73113_at            MYBPC2
7        HG3039-HT3200_at     ARL2
8        L24783_at            RFC1
9        U78678_at            TXN2
10       HG110-HT110_s_at     HNRPAB
11       M73547_at            REEP5
12       M34667_at            PLCG1
13       X96484_at            DGCR6
14       U12779_at            MAPKAPK2
15       Y11681_at            MRPS12
16       M63483_at            MATR3
17       X89985_at            BCL7B
18       D87435_at            GBF1
19       Z50781_at            TSC22D3
20       D83778_at            KIAA0194
21       U22055_at            SND1
22       U31176_at            GFER
23       M55531_at            SLC2A5
24       U93237_rna2_at       MEN1
25       D26069_at            CENTB2