Contents. November 1, 2008

SUPPLEMENTARY INFORMATION

A Primary Xenograft Model of Small Cell Lung Cancer Reveals Irreversible Changes in Gene Expression Imposed by Culture In-Vitro

Vincent C. Daniel, Luigi Marchionni, Jared S. Hierman, Jonathan T. Rhodes, Wendy L. Devereux, Charles M. Rudin, Rex Yung, Marion Dorsch, Giovanni Parmigiani, Craig D. Peacock and D. Neil Watkins

November 1, 2008

Raw gene expression data will be permanently hosted in the NCBI Gene Expression Omnibus (GEO) database [1] upon publication, and are currently available at:

Contents

1 Materials and Methods
    Data sets
    Gene Annotation
    Experimental protocols
    Pre-processing
    Differential gene expression
    Analysis of Functional Annotation
    Use of heatmaps
    Concordance-at-the-top plots
    Correlation among groups and individual samples
    Software
Results
    De Novo data set pre-processing
    Differentially expressed genes
    Concordance-at-the-top plots
    Analysis of Functional Annotation
    Comparison to primary SCLC
    Combined data set pre-processing
    Comparison between models and primary tumors
Literature Cited

List of Figures

    Diagnostic Plots
    MA plots before normalization
    MA plots after normalization
    Volcano Plots
    Venn Diagrams, direct comparison
    Venn Diagrams, direct comparison
    Correlation among XG, CL, CLX
    Correlation between XG, CL, and CLX using fold-change
    Correlation between XG, CL, and CLX using the group median intensity
    Cat-plot based on fold-change for all probe sets
    Cat-plot based on t-statistics for all probe sets
    Cat-plot based on t-statistics for the unique genes
    AFA consistency plots
    Analysis of Functional Annotation: Gene Ontology
    Analysis of Functional Annotation: Molecular Signature data base pathways
    Boxplots for the combined analysis
    Barcode pre-processing compared to standard RMA pre-processing
    Barcode pre-processing, correlation distributions
    Venn diagrams for comparisons to normal lung in the combined data set
    Comparison between laboratory models and primary SCLC
    Comparison between laboratory models and primary SCLC
    Comparison between laboratory models and primary SCLC
    Comparison between laboratory models and primary SCLC
    Comparison between laboratory models and primary SCLC
    Comparison between laboratory models and primary SCLC

List of Tables

    Description of the data sets used in the study
    Barcode pre-processing, correlation distributions

1 Materials and Methods

1.1 Data sets

Using the Affymetrix platform, we first analyzed differential gene expression in primary xenografts (XG), xenograft-derived cell lines (CL), and secondary xenografts (CLX) from Small Cell Lung Cancer (SCLC). The groups were compared directly using linear model analysis [2, 3, 4]. We then extended this analysis to similar data sets from the public domain that included conventional SCLC laboratory models, normal lung, and primary SCLC specimens, starting from the raw data (CEL files) and applying the same linear model analysis (see Table 1).

SCLC models. De novo microarray analysis was performed to obtain gene expression data from matched primary and secondary xenografts, and from the corresponding cell lines (XG, CLX, and CL, respectively). Gene expression was also obtained for several additional SCLC cell lines and secondary xenografts, and from 4 primary SCLC biopsies.

Overall, three distinct data sets were generated on three distinct Affymetrix platforms in two distinct laboratories:

Samples analyzed by one-color array using the human Affymetrix GeneChip hgu133plus2:
- Laboratory model triplets (XG, CL, and CLX) from three distinct patients (lx22, lx33, and lx36);
- Laboratory model pairs (CL and CLX) from 11 distinct patients, or established SCLC cell lines (H69, H82, H128, H146, H187, H209, H345, H446, H526, H1618, and H1930);
- Commercially available total RNA from normal human lung, and the universal human reference RNA (Stratagene).

Samples analyzed by one-color array using the human Affymetrix GeneChip hgu133a2:
- Four distinct SCLC cell lines (H82, H187, H526 and H1618).

Samples analyzed by one-color array using the human Affymetrix GeneChip hgu133a:
- Four primary SCLC specimens from 4 distinct patients, independent from the laboratory models described above.

The first data set (data set A in Table 1) was used to compare the three SCLC models (XG, CL, and CLX) to one another and to pairs of established SCLC cell lines and derivative xenografts. The second and third data sets above (data sets B and C in Table 1) were used in the combined analysis with the public domain data sets described below.

Public domain data. Gene expression data from the public domain were obtained from the Gene Expression Omnibus (GEO) database [5] or from the original paper web-site (Bhattacharjee: http: //). The following public domain data sets were retrieved and analyzed:

Samples analyzed by one-color array using the human Affymetrix GeneChip hgu133plus2:
- Eight normal lung samples, 4 from the GSE3526 data set and 4 from GSE7307.

Samples analyzed by one-color array using the human Affymetrix GeneChip hgu133a2:
- Six secondary xenografts from the H69 cell line (3 treated and 3 untreated with chemotherapy) from the GSE8920 data set;
- Thirty-four samples corresponding to 16 distinct SCLC cell lines (H211, H510, H524, H889, H1417, H1963, DMS-53, SW-1271, H146, H69, H82, H196, H209, H526, H345, H187) from the GSE7097 data set [6].

Samples analyzed by one-color array using the human Affymetrix GeneChip hgu133a:
- Ten samples corresponding to 10 distinct SCLC cell lines (H69, N231, LU135, SBC3, PC-6, LU130, Lu139, Lu165, MS-1, SBC-5) from the GSE4127 data set;
- Twenty-three samples corresponding to 23 distinct SCLC cell lines (H1672, H1184, H128, H146, H1607, H187, H1963, H2052, H209, H2107, H2141, H2171, H2195, H2227, H289, H378, H524, H526, H82, H841f, H889, HCC33, HCC970) from the GSE4824 data set [7];
- Thirty samples corresponding to 30 distinct normal lung specimens (including 2 arrays of normal human lung RNA from Stratagene and Clontech) from the GSE7670 data set [8];
- One normal lung sample from the GSE2361 data set [9].

Samples analyzed by one-color array using the human Affymetrix GeneChip hgu95av2:
- Six primary SCLC samples corresponding to 6 distinct patients from the Bhattacharjee study [10];
- Seventeen normal lung samples from 17 individuals from the Bhattacharjee study [10].

Samples analyzed by one-color array using the human Affymetrix GeneChip hgfocus:
- Nine primary SCLC samples corresponding to 9 distinct patients from the GSE6044 data set;
- Five normal lung samples from 5 individuals from the GSE6044 data set.

Overall, 192 arrays from 13 data sets (5 Affymetrix platforms) were analyzed, accounting for 62 normal lung samples, 19 primary SCLC specimens, 4 primary xenografts (3 patients), 22 secondary xenografts, and 85 SCLC cell line specimens. A synopsis of all the available specimens is given in Table 1.
DataSet | Platform | OriginalName | TissueType | Patients | GEOseries | GEOsample
D | hgu133a | NCI-H69 | CL | H69 | GSE4127 | GSM94309
D | hgu133a | N231 | CL | N231 | GSE4127 | GSM94310
D | hgu133a | LU135 | CL | LU135 | GSE4127 | GSM94311
D | hgu133a | SBC3 | CL | SBC3 | GSE4127 | GSM94312
D | hgu133a | PC-6 | CL | PC-6 | GSE4127 | GSM94318
D | hgu133a | LU130 | CL | LU130 | GSE4127 | GSM94320
D | hgu133a | Lu139 | CL | Lu139 | GSE4127 | GSM94321
D | hgu133a | Lu165 | CL | Lu165 | GSE4127 | GSM94322
D | hgu133a | MS-1 | CL | MS-1 | GSE4127 | GSM94330
D | hgu133a | SBC-5 | CL | SBC-5 | GSE4127 | GSM94331
E | hgu133a | H1672(A2) | CL | H1672 | GSE4824 | GSM
E | hgu133a | H1184(A) | CL | H1184 | GSE4824 | GSM
E | hgu133a | H128(A) | CL | H128 | GSE4824 | GSM
E | hgu133a | H146(A2) | CL | H146 | GSE4824 | GSM
E | hgu133a | H1607(A) | CL | H1607 | GSE4824 | GSM
E | hgu133a | H187(A) | CL | H187 | GSE4824 | GSM
E | hgu133a | H1963(A) | CL | H1963 | GSE4824 | GSM
E | hgu133a | H2052(A) | CL | H2052 | GSE4824 | GSM
E | hgu133a | H209(A2) | CL | H209 | GSE4824 | GSM
E | hgu133a | H2107(A) | CL | H2107 | GSE4824 | GSM
E | hgu133a | H2141(A) | CL | H2141 | GSE4824 | GSM
E | hgu133a | H2171(A) | CL | H2171 | GSE4824 | GSM
E | hgu133a | H2195(A) | CL | H2195 | GSE4824 | GSM
E | hgu133a | H2227(A) | CL | H2227 | GSE4824 | GSM
E | hgu133a | H289(A) | CL | H289 | GSE4824 | GSM
E | hgu133a | H378(A) | CL | H378 | GSE4824 | GSM
E | hgu133a | H524(A) | CL | H524 | GSE4824 | GSM
E | hgu133a | H526(A2) | CL | H526 | GSE4824 | GSM
E | hgu133a | H82(A) | CL | H82 | GSE4824 | GSM
E | hgu133a | H841f(A) | CL | H841f | GSE4824 | GSM
E | hgu133a | H889(A) | CL | H889 | GSE4824 | GSM
E | hgu133a | HCC33(A) | CL | HCC33 | GSE4824 | GSM
E | hgu133a | HCC970(A) | CL | HCC970 | GSE4824 | GSM
I | hgu133a | Norm1 | NormLung | Norm1 | GSE2361 | GSM44704
O | hgu133a | N1-o | NormLung | N1-o | GSE7670 | GSM
O | hgu133a | N2-o | NormLung | N2-o | GSE7670 | GSM
O | hgu133a | N3-o | NormLung | N3-o | GSE7670 | GSM
O | hgu133a | N4-o | NormLung | N4-o | GSE7670 | GSM
O | hgu133a | N5-o | NormLung | N5-o | GSE7670 | GSM
O | hgu133a | N6-o | NormLung | N6-o | GSE7670 | GSM
O | hgu133a | N7-o | NormLung | N7-o | GSE7670 | GSM
O | hgu133a | N8-o | NormLung | N8-o | GSE7670 | GSM
O | hgu133a | N12-o | NormLung | N12-o | GSE7670 | GSM
O | hgu133a | N13-o | NormLung | N13-o | GSE7670 | GSM
O | hgu133a | N14-o | NormLung | N14-o | GSE7670 | GSM
O | hgu133a | N15-o | NormLung | N15-o | GSE7670 | GSM
O | hgu133a | N16-o | NormLung | N16-o | GSE7670 | GSM
O | hgu133a | N18-o | NormLung | N18-o | GSE7670 | GSM
O | hgu133a | N19-o | NormLung | N19-o | GSE7670 | GSM
O | hgu133a | N20-o | NormLung | N20-o | GSE7670 | GSM
O | hgu133a | N21-o | NormLung | N21-o | GSE7670 | GSM
O | hgu133a | N22-o | NormLung | N22-o | GSE7670 | GSM
O | hgu133a | N23-o | NormLung | N23-o | GSE7670 | GSM
O | hgu133a | N24-o | NormLung | N24-o | GSE7670 | GSM
O | hgu133a | N25-o | NormLung | N25-o | GSE7670 | GSM
O | hgu133a | N26-o | NormLung | N26-o | GSE7670 | GSM
O | hgu133a | N27-o | NormLung | N27-o | GSE7670 | GSM
O | hgu133a | N29-o | NormLung | N29-o | GSE7670 | GSM
O | hgu133a | N30-o | NormLung | N30-o | GSE7670 | GSM
O | hgu133a | N54-o | NormLung | N54-o | GSE7670 | GSM
O | hgu133a | N57-o | NormLung | N57-o | GSE7670 | GSM
O | hgu133a | NMix-o | NormLung | NMix-o | GSE7670 | GSM
O | hgu133a | NStratagene(Cat )-o | NormLung | Nstratagene | GSE7670 | GSM
O | hgu133a | NClontech(Cat )-o | NormLung | Nclontech | GSE7670 | GSM
G | hgu133a2 | H69 tumor post-pdt, biological rep1 | CLX | H69 | GSE8920 | GSM
G | hgu133a2 | H69 tumor post-pdt, biological rep2 | CLX | H69 | GSE8920 | GSM
G | hgu133a2 | H69 tumor post-pdt, biological rep3 | CLX | H69 | GSE8920 | GSM
G | hgu133a2 | H69 tumor untreated, biological rep1 | CLX | H69 | GSE8920 | GSM
G | hgu133a2 | H69 tumor untreated, biological rep2 | CLX | H69 | GSE8920 | GSM
G | hgu133a2 | H69 tumor untreated, biological rep3 | CLX | H69 | GSE8920 | GSM
B | hgu133a2 | PT | PrimarySCLC | | AbbottSCLC | NA
B | hgu133a2 | PT | PrimarySCLC | | AbbottSCLC | NA
B | hgu133a2 | PT | PrimarySCLC | | AbbottSCLC | NA
B | hgu133a2 | PT | PrimarySCLC | | AbbottSCLC | NA
F | hgu133a2 | NCI-H211 cell line (NCI-H211a) | CL | H211 | GSE7097 | GSM
F | hgu133a2 | NCI-H211 cell line (NCI-H211c) | CL | H211 | GSE7097 | GSM
F | hgu133a2 | NCI-H510 cell line (NCI-H510a) | CL | H510 | GSE7097 | GSM
F | hgu133a2 | NCI-H510 cell line (NCI-H510c) | CL | H510 | GSE7097 | GSM
F | hgu133a2 | NCI-H524 cell line (NCI-H524a) | CL | H524 | GSE7097 | GSM
F | hgu133a2 | NCI-H524 cell line (NCI-H524c) | CL | H524 | GSE7097 | GSM
F | hgu133a2 | NCI-H889 cell line (NCI-H889a) | CL | H889 | GSE7097 | GSM
F | hgu133a2 | NCI-H889 cell line (NCI-H889c) | CL | H889 | GSE7097 | GSM
F | hgu133a2 | NCI-H1417 cell line (NCI-H1417a) | CL | H1417 | GSE7097 | GSM
F | hgu133a2 | NCI-H1417 cell line (NCI-H1417c) | CL | H1417 | GSE7097 | GSM
F | hgu133a2 | NCI-H1963 cell line (NCI-H1963a) | CL | H1963 | GSE7097 | GSM
F | hgu133a2 | NCI-H1963 cell line (NCI-H1963c) | CL | H1963 | GSE7097 | GSM
F | hgu133a2 | SCLC cell line DMS-53 (DMS-53.2) | CL | DMS-53 | GSE7097 | GSM
F | hgu133a2 | SCLC cell line DMS-53 (DMS-53.3) | CL | DMS-53 | GSE7097 | GSM
F | hgu133a2 | SCLC SW-1271 cell line (SW ) | CL | SW-1271 | GSE7097 | GSM
F | hgu133a2 | SCLC SW-1271 cell line (SW ) | CL | SW-1271 | GSE7097 | GSM
F | hgu133a2 | SCLC cell line NCI-H146 (NCI-H146.0) | CL | H146 | GSE7097 | GSM
F | hgu133a2 | SCLC NCI-H146 cell line (NCI-H146.2) | CL | H146 | GSE7097 | GSM
F | hgu133a2 | SCLC H69AR cell line (H69AR.0) | CL | H69 | GSE7097 | GSM
F | hgu133a2 | SCLC NCI-H82 cell line (NCI-H82.2) | CL | H82 | GSE7097 | GSM
F | hgu133a2 | SCLC NCI-H82 cell line (NCI-H82.3) | CL | H82 | GSE7097 | GSM
F | hgu133a2 | SCLC NCI-H196 cell line (NCI-H196.1) | CL | H196 | GSE7097 | GSM
F | hgu133a2 | SCLC NCI-H196 cell line (NCI-H196.2) | CL | H196 | GSE7097 | GSM
F | hgu133a2 | SCLC NCI-H209 cell line (NCI-H209.1) | CL | H209 | GSE7097 | GSM
F | hgu133a2 | SCLC NCI-H209 cell line (NCI-H209.2) | CL | H209 | GSE7097 | GSM
F | hgu133a2 | SCLC NCI-H209 cell line (NCI-H209.3) | CL | H209 | GSE7097 | GSM
F | hgu133a2 | SCLC NCI-H526 cell line (NCI-H526ps.1) | CL | H526 | GSE7097 | GSM
F | hgu133a2 | SCLC NCI-H526 cell line (NCI-H526ps2) | CL | H526 | GSE7097 | GSM
F | hgu133a2 | SCLC NCI-H526 cell line (NCI-H526ps3) | CL | H526 | GSE7097 | GSM
F | hgu133a2 | SCLC NCI-H345 cell line (NCI-H345.s8) | CL | H345 | GSE7097 | GSM
F | hgu133a2 | SCLC NCI-H345 cell line (NCI-H345.s9) | CL | H345 | GSE7097 | GSM
F | hgu133a2 | SCLC NCI-H187 cell line (NCI-H187.s10) | CL | H187 | GSE7097 | GSM
F | hgu133a2 | SCLC NCI-H187 cell line (NCI-H187.s11) | CL | H187 | GSE7097 | GSM
F | hgu133a2 | SCLC NCI-H187 cell line (NCI-H187.s12) | CL | H187 | GSE7097 | GSM
A | hgu133plus2 | X-LX33 | XG | LX33 | Novartis | NA
A | hgu133plus2 | X-LX36 | XG | LX36 | Novartis | NA
A | hgu133plus2 | CL-H69 | CL | H69 | Novartis | NA
A | hgu133plus2 | CL-H82 | CL | H82 | Novartis | NA
A | hgu133plus2 | CL-H128 | CL | H128 | Novartis | NA
A | hgu133plus2 | CL-H146 | CL | H146 | Novartis | NA
A | hgu133plus2 | CL-H187 | CL | H187 | Novartis | NA
A | hgu133plus2 | CL-H209 | CL | H209 | Novartis | NA
A | hgu133plus2 | CL-H345 | CL | H345 | Novartis | NA
A | hgu133plus2 | CL-H446 | CL | H446 | Novartis | NA
A | hgu133plus2 | CL-H526 | CL | H526 | Novartis | NA
A | hgu133plus2 | CL-H1618 | CL | H1618 | Novartis | NA
A | hgu133plus2 | CL-H1930 | CL | H1930 | Novartis | NA
A | hgu133plus2 | CL-LX22 | CL | LX22 | Novartis | NA
A | hgu133plus2 | CL-LX33 | CL | LX33 | Novartis | NA
A | hgu133plus2 | CL-LX36 | CL | LX36 | Novartis | NA
A | hgu133plus2 | CLX-H69 | CLX | H69 | Novartis | NA
A | hgu133plus2 | CLX-H82 | CLX | H82 | Novartis | NA
A | hgu133plus2 | CLX-H128 | CLX | H128 | Novartis | NA
A | hgu133plus2 | CLX-H146 | CLX | H146 | Novartis | NA
A | hgu133plus2 | CLX-H187 | CLX | H187 | Novartis | NA
A | hgu133plus2 | CLX-H209 | CLX | H209 | Novartis | NA
A | hgu133plus2 | CLX-H345 | CLX | H345 | Novartis | NA
A | hgu133plus2 | CLX-H446 | CLX | H446 | Novartis | NA
A | hgu133plus2 | CLX-H526 | CLX | H526 | Novartis | NA
A | hgu133plus2 | CLX-H1618 | CLX | H1618 | Novartis | NA
A | hgu133plus2 | CLX-H1930 | CLX | H1930 | Novartis | NA
A | hgu133plus2 | CLX-LX22 | CLX | LX22 | Novartis | NA
A | hgu133plus2 | CLX-LX33 | CLX | LX33 | Novartis | NA
A | hgu133plus2 | CLX-LX36 | CLX | LX36 | Novartis | NA
A | hgu133plus2 | Norm. Lung Tissue | NormLung | Nstratagene | Novartis | NA
A | hgu133plus2 | Ref. (Stratagen) | REF | REF | Novartis | NA
A | hgu133plus2 | X-LX22 | XG | LX22 | Novartis | NA
A | hgu133plus2 | CLX-H128 | CLX | H128 | Novartis | NA
A | hgu133plus2 | CLX-LX36 | CLX | LX36 | Novartis | NA
A | hgu133plus2 | Ref. (Stratagen) | REF | REF | Novartis | NA
A | hgu133plus2 | X-LX22 | XG | LX22 | Novartis | NA
C | hgu133plus2 | H82C-1a | CL | H82 | AbbottCL | NA
C | hgu133plus2 | H187C-1a | CL | H187 | AbbottCL | NA
C | hgu133plus2 | H526C-1a | CL | H526 | AbbottCL | NA
C | hgu133plus2 | H1618C-1a | CL | H1618 | AbbottCL | NA
L | hgu133plus2 | lung-3 | NormLung | lung-3 | GSE3526 | GSM80707
L | hgu133plus2 | lung-1 | NormLung | lung-1 | GSE3526 | GSM80710
L | hgu133plus2 | lung-4 | NormLung | lung-4 | GSE3526 | GSM80712
M | hgu133plus2 | Lung3Normal | NormLung | Lung3Normal | GSE7307 | GSM
M | hgu133plus2 | Lung1Normal | NormLung | Lung1Normal | GSE7307 | GSM
M | hgu133plus2 | Lung4Normal | NormLung | Lung4Normal | GSE7307 | GSM
H | hgu95av2 | SMC9T1-A357-6 | PrimarySCLC | SMCL9T1 | Bhattacharjee | NA
H | hgu95av2 | NL279n1-A98-3 | NormLung | NL279n1 | Bhattacharjee | NA
H | hgu95av2 | NL1884-A411-7 | NormLung | NL1884 | Bhattacharjee | NA
H | hgu95av2 | NL4353-A334-6 | NormLung | NL4353 | Bhattacharjee | NA
H | hgu95av2 | NL6084-A335-6 | NormLung | NL6084 | Bhattacharjee | NA
H | hgu95av2 | NL4083-A333-6 | NormLung | NL4083 | Bhattacharjee | NA
H | hgu95av2 | NL3681-A332-6 | NormLung | NL3681 | Bhattacharjee | NA
H | hgu95av2 | SMC5937-A307-5 | PrimarySCLC | SMCL5937 | Bhattacharjee | NA
H | hgu95av2 | NL2378-A412-7 | NormLung | NL2378 | Bhattacharjee | NA
H | hgu95av2 | NL1179-A408-7 | NormLung | NL1179 | Bhattacharjee | NA
H | hgu95av2 | NL2562-A413-7 | NormLung | NL2562 | Bhattacharjee | NA
H | hgu95av2 | NL1675-A409-7 | NormLung | NL1675 | Bhattacharjee | NA
H | hgu95av2 | SMC4T1-A354-6 | PrimarySCLC | SMCL4T1 | Bhattacharjee | NA
H | hgu95av2 | NL6943-A337-6 | NormLung | NL6943 | Bhattacharjee | NA
H | hgu95av2 | NL268n1-A92-3 | NormLung | NL268n1 | Bhattacharjee | NA
H | hgu95av2 | SMC3T1-A | PrimarySCLC | SMCL3T1 | Bhattacharjee | NA
H | hgu95av2 | NL504-A407-7 | NormLung | NL504 | Bhattacharjee | NA
H | hgu95av2 | SMC301-A306-5 | PrimarySCLC | SMCL301 | Bhattacharjee | NA
H | hgu95av2 | NL1698-A410-7 | NormLung | NL1698 | Bhattacharjee | NA
H | hgu95av2 | SMC8T1-A356-6 | PrimarySCLC | SMCL8T1 | Bhattacharjee | NA
H | hgu95av2 | NL7530-A338-6 | NormLung | NL7530 | Bhattacharjee | NA
H | hgu95av2 | NL6853-A336-6 | NormLung | NL6853 | Bhattacharjee | NA
H | hgu95av2 | NL3104-A331-6 | NormLung | NL3104 | Bhattacharjee | NA
N | hgfocus | K1 | PrimarySCLC | K1 | GSE6044 | GSM
N | hgfocus | K2 | PrimarySCLC | K2 | GSE6044 | GSM
N | hgfocus | K3 | PrimarySCLC | K3 | GSE6044 | GSM
N | hgfocus | K4 | PrimarySCLC | K4 | GSE6044 | GSM
N | hgfocus | K5 | PrimarySCLC | K5 | GSE6044 | GSM
N | hgfocus | K6 | PrimarySCLC | K6 | GSE6044 | GSM
N | hgfocus | K7 | PrimarySCLC | K7 | GSE6044 | GSM
N | hgfocus | K8 | PrimarySCLC | K8 | GSE6044 | GSM
N | hgfocus | K9 | PrimarySCLC | K9 | GSE6044 | GSM
N | hgfocus | N1 | NormLung | Norm | GSE6044 | GSM
N | hgfocus | N2 | NormLung | Norm | GSE6044 | GSM
N | hgfocus | N3 | NormLung | Norm | GSE6044 | GSM
N | hgfocus | N4 | NormLung | Norm | GSE6044 | GSM
N | hgfocus | N5 | NormLung | Norm | GSE6044 | GSM

Table 1: RNA specimens considered in the present analysis. Data set source, platform, and type of sample are reported. Samples are grouped by platform and by data set. XG: primary xenograft; CL: cell line; CLX: xenograft from cell line; PrimarySCLC: primary Small Cell Lung Cancer; NormLung: normal lung specimen; REF: universal reference RNA (Stratagene); GSE and GSM: Gene Expression Omnibus (GEO) Series and Sample identifiers in the GEO database. For all samples we retrieved the raw data (Affymetrix CEL files) and the associated information from the original paper web-site (Bhattacharjee: ) or from the GEO database [5].

1.2 Gene Annotation

Gene annotation for the five platforms considered was obtained from the metadata packages available from the R/Bioconductor project [11, 12]. Cross-referencing of array features across platforms was based on Entrez Gene identifiers. When cross-referencing was required, Affymetrix probe sets mapping to multiple Entrez Gene identifiers were excluded from the analysis, while multiple probe sets mapping to the same Entrez Gene identifier (redundant probe sets) were filtered by keeping the most differentially expressed one in each data set.

Below is the detailed annotation for the five Affymetrix platforms used in the present study, as obtained from each metadata package.

hgu133plus2 platform:

Quality control information for hgu133plus2:
This package has the following mappings:
hgu133plus2accnum has mapped keys (of keys)
hgu133plus2alias2probe has mapped keys (of keys)
hgu133plus2chr has mapped keys (of keys)
hgu133plus2chrlengths has 25 mapped keys (of 25 keys)
hgu133plus2chrloc has mapped keys (of keys)
hgu133plus2ensembl has mapped keys (of keys)
hgu133plus2ensembl2probe has mapped keys (of keys)
hgu133plus2entrezid has mapped keys (of keys)
hgu133plus2enzyme has 4952 mapped keys (of keys)
hgu133plus2enzyme2probe has 840 mapped keys (of 840 keys)
hgu133plus2genename has mapped keys (of keys)
hgu133plus2go has mapped keys (of keys)
hgu133plus2go2allprobes has 9215 mapped keys (of 9215 keys)
hgu133plus2go2probe has 6743 mapped keys (of 6743 keys)
hgu133plus2map has mapped keys (of keys)
hgu133plus2omim has mapped keys (of keys)
hgu133plus2path has mapped keys (of keys)
hgu133plus2path2probe has 201 mapped keys (of 201 keys)
hgu133plus2pfam has mapped keys (of keys)
hgu133plus2pmid has mapped keys (of keys)
hgu133plus2pmid2probe has mapped keys (of keys)
hgu133plus2prosite has mapped keys (of keys)
hgu133plus2refseq has mapped keys (of keys)
hgu133plus2symbol has mapped keys (of keys)
hgu133plus2unigene has mapped keys (of keys)

hgu133a2 platform:

Quality control information for hgu133a2:
This package has the following mappings:
hgu133a2accnum has mapped keys (of keys)
hgu133a2alias2probe has mapped keys (of keys)
hgu133a2chr has mapped keys (of keys)
hgu133a2chrlengths has 25 mapped keys (of 25 keys)
hgu133a2chrloc has mapped keys (of keys)
hgu133a2ensembl has mapped keys (of keys)
hgu133a2ensembl2probe has mapped keys (of keys)
hgu133a2entrezid has mapped keys (of keys)
hgu133a2enzyme has 2890 mapped keys (of keys)
hgu133a2enzyme2probe has 791 mapped keys (of 791 keys)
hgu133a2genename has mapped keys (of keys)
hgu133a2go has mapped keys (of keys)
hgu133a2go2allprobes has 8924 mapped keys (of 8924 keys)
hgu133a2go2probe has 6417 mapped keys (of 6417 keys)
hgu133a2map has mapped keys (of keys)
hgu133a2omim has mapped keys (of keys)
hgu133a2path has 6457 mapped keys (of keys)
hgu133a2path2probe has 199 mapped keys (of 199 keys)
hgu133a2pfam has mapped keys (of keys)
hgu133a2pmid has mapped keys (of keys)
hgu133a2pmid2probe has mapped keys (of keys)
hgu133a2prosite has mapped keys (of keys)
hgu133a2refseq has mapped keys (of keys)
hgu133a2symbol has mapped keys (of keys)
hgu133a2unigene has mapped keys (of keys)

hgfocus platform:

Quality control information for hgfocus:
This package has the following mappings:
hgfocusaccnum has 8793 mapped keys (of 8793 keys)
hgfocusalias2probe has mapped keys (of keys)
hgfocuschr has 8675 mapped keys (of 8793 keys)
hgfocuschrlengths has 25 mapped keys (of 25 keys)
hgfocuschrloc has 8588 mapped keys (of 8793 keys)
hgfocusensembl has 8133 mapped keys (of 8793 keys)
hgfocusensembl2probe has 7990 mapped keys (of 7990 keys)
hgfocusentrezid has 8679 mapped keys (of 8793 keys)
hgfocusenzyme has 1477 mapped keys (of 8793 keys)
hgfocusenzyme2probe has 725 mapped keys (of 725 keys)
hgfocusgenename has 8679 mapped keys (of 8793 keys)
hgfocusgo has 8490 mapped keys (of 8793 keys)
hgfocusgo2allprobes has 8539 mapped keys (of 8539 keys)
hgfocusgo2probe has 6009 mapped keys (of 6009 keys)
hgfocusmap has 8663 mapped keys (of 8793 keys)
hgfocusomim has 7875 mapped keys (of 8793 keys)
hgfocuspath has 3239 mapped keys (of 8793 keys)
hgfocuspath2probe has 199 mapped keys (of 199 keys)
hgfocuspfam has 8666 mapped keys (of 8793 keys)
hgfocuspmid has 8664 mapped keys (of 8793 keys)
hgfocuspmid2probe has mapped keys (of keys)
hgfocusprosite has 8666 mapped keys (of 8793 keys)
hgfocusrefseq has 8664 mapped keys (of 8793 keys)
hgfocussymbol has 8679 mapped keys (of 8793 keys)
hgfocusunigene has 8663 mapped keys (of 8793 keys)

hgu133a platform:

Quality control information for hgu133a:
This package has the following mappings:
hgu133aaccnum has mapped keys (of keys)
hgu133aalias2probe has mapped keys (of keys)
hgu133achr has mapped keys (of keys)
hgu133achrlengths has 25 mapped keys (of 25 keys)
hgu133achrloc has mapped keys (of keys)
hgu133aensembl has mapped keys (of keys)
hgu133aensembl2probe has mapped keys (of keys)
hgu133aentrezid has mapped keys (of keys)
hgu133aenzyme has 2896 mapped keys (of keys)
hgu133aenzyme2probe has 789 mapped keys (of 789 keys)
hgu133agenename has mapped keys (of keys)
hgu133ago has mapped keys (of keys)
hgu133ago2allprobes has 8921 mapped keys (of 8921 keys)
hgu133ago2probe has 6415 mapped keys (of 6415 keys)
hgu133amap has mapped keys (of keys)
hgu133aomim has mapped keys (of keys)
hgu133apath has 6498 mapped keys (of keys)
hgu133apath2probe has 199 mapped keys (of 199 keys)
hgu133apfam has mapped keys (of keys)
hgu133apmid has mapped keys (of keys)
hgu133apmid2probe has mapped keys (of keys)
hgu133aprosite has mapped keys (of keys)
hgu133arefseq has mapped keys (of keys)
hgu133asymbol has mapped keys (of keys)
hgu133aunigene has mapped keys (of keys)

hgu95av2 platform:

Quality control information for hgu95av2:
This package has the following mappings:
hgu95av2accnum has mapped keys (of keys)
hgu95av2alias2probe has mapped keys (of keys)
hgu95av2chr has mapped keys (of keys)
hgu95av2chrlengths has 25 mapped keys (of 25 keys)
hgu95av2chrloc has mapped keys (of keys)
hgu95av2ensembl has mapped keys (of keys)
hgu95av2ensembl2probe has 8286 mapped keys (of 8286 keys)
hgu95av2entrezid has mapped keys (of keys)
hgu95av2enzyme has 1957 mapped keys (of keys)
hgu95av2enzyme2probe has 709 mapped keys (of 709 keys)
hgu95av2genename has mapped keys (of keys)
hgu95av2go has mapped keys (of keys)
hgu95av2go2allprobes has 8383 mapped keys (of 8383 keys)
hgu95av2go2probe has 5898 mapped keys (of 5898 keys)
hgu95av2map has mapped keys (of keys)
hgu95av2omim has mapped keys (of keys)
hgu95av2path has 4415 mapped keys (of keys)
hgu95av2path2probe has 199 mapped keys (of 199 keys)
hgu95av2pfam has mapped keys (of keys)
hgu95av2pmid has mapped keys (of keys)
hgu95av2pmid2probe has mapped keys (of keys)
hgu95av2prosite has mapped keys (of keys)
hgu95av2refseq has mapped keys (of keys)
hgu95av2symbol has mapped keys (of keys)
hgu95av2unigene has mapped keys (of keys)

For all platforms, annotation was retrieved using the following database releases:

Additional Information about packages:
DB schema: HUMANCHIP_DB
DB schema version: 1.0
Organism: Homo sapiens
Date for NCBI data: 2008-Apr2
Date for GO data:
Date for KEGG data: 2008-Apr1
Date for Golden Path data: 2006-Apr14
Date for IPI data: 2008-Mar19
Date for Ensembl data: 2007-Oct24

This information includes mappings between the following entities:

- ACCNUM: microarray identifiers to GenBank Accession Numbers;
- ALIAS2PROBE: alternative Gene Symbols to microarray identifiers;
- CHRLOC: microarray identifiers to Chromosomal Location;
- CHRLENGTHS: length of each of the Chromosomes;
- CHR: microarray identifiers to Chromosomes;
- ENSEMBL2PROBE: Ensembl gene accession numbers to microarray identifiers;
- ENSEMBL: microarray identifiers to Ensembl gene accession numbers;
- ENTREZID: microarray identifiers to Entrez Gene;
- ENZYME: microarray identifiers to Enzyme Commission (EC) Numbers;
- GENENAME: microarray identifiers to Gene names;
- GO: microarray identifiers to Gene Ontology (GO);
- MAP: microarray identifiers to cytogenetic maps/bands;
- OMIM: microarray identifiers to Mendelian Inheritance in Man (MIM) identifiers;
- PATH: microarray identifiers to KEGG pathway identifiers;
- PMID: microarray identifiers to PubMed identifiers;
- REFSEQ: microarray identifiers to RefSeq identifiers;
- SUMFUNC: microarray identifiers to Gene Function Summaries;
- SYMBOL: microarray identifiers to Gene Symbols;
- UNIGENE: microarray identifiers to UniGene cluster identifiers;
- CHRLENGTHS: a named vector for the length of each of the chromosomes;
- ENZYME2PROBE: Enzyme Commission Numbers to microarray identifiers;
- GO2ALLPROBES: Gene Ontology (GO) identifiers to all microarray identifiers;
- GO2PROBE: Gene Ontology (GO) identifiers to microarray identifiers;
- ORGANISM: the Organism for each specific microarray platform;
- PATH2PROBE: KEGG pathway identifiers to microarray identifiers;
- PFAM: microarray identifiers to Pfam identifiers;
- PMID2PROBE: PubMed identifiers to microarray identifiers;
- PROSITE: microarray identifiers to PROSITE identifiers.

1.3 Experimental protocols

The integrity of the RNA was determined with the RIN software algorithm [13]. Only RNA samples with a RIN score > 7.5 were analyzed.

Microarray analysis was performed using the Affymetrix gene chip platform (Affymetrix, Santa Clara, CA). Probe synthesis, hybridization, washing, staining and scanning of the gene chips were performed according to the manufacturer's protocols. In brief, double-stranded cDNA was synthesized from 2.5 to 5 ug of total RNA (SuperScript double-stranded cDNA synthesis kit; Invitrogen). After purification (QIAquick, Qiagen AG), the cDNA was transcribed in vitro in the presence of biotinylated ribonucleotides (Bioarray High yield T7 DNA transcription kit, ENZO Life Sciences). The labeled cRNA was purified on an affinity resin (RNeasy, Qiagen AG), quantified and fragmented ( nucleotides in length).

Following hybridization at 45 degrees Celsius for approximately 16 hours, the gene chips were washed and stained with streptavidin-phycoerythrin (Molecular Probes) using the GeneChip Fluidics Workstation 450 (Affymetrix). The gene chips were scanned using a confocal laser scanner (GeneArray Scanner 3000). The scanned image was converted into numerical values of the signal intensity (Signal), and the results were saved in the CEL files used to perform the analysis.

1.4 Pre-processing

Raw data (Affymetrix CEL files) were obtained for all hybridizations considered. All pre-processing procedures described below were performed using functions and methods implemented in the package affy [14], available through the R/Bioconductor project [11, 12]. We pre-processed each data set separately, according to the type of platform involved. Pre-processing appropriateness was monitored using standard diagnostic plots (i.e., 2-D image plots, RNA degradation plots, MA-plots, and box plots). Affymetrix data were normalized at the probe level by fitting the RMA empirical stochastic model described by Irizarry [15]. Standardization across DNA chips was attained by quantile normalization [16].

To pre-process the gene expression measurements in the combined analysis with all the data sets described in Table 1, we used the gene expression empirical distribution described by Zilliox and Irizarry and implemented in the expression barcode analytical method [17]. The original method was designed to accurately detect whether genes are expressed or not, and thus to define a gene expression bar code for as many different tissue types as possible. For this purpose the authors used a vast number of publicly available data sets to obtain the empirical distribution of the expression of each probe set of the hgu133a array across all the tissue types considered. The authors demonstrated that this empirical distribution is effective in removing lab and batch effects, and for this reason it was applied in this study [17].

In this study we normalized gene expression data using RMA and this reference distribution, without dichotomizing the normalized expression values (frozen-rma approach, manuscript in preparation). To this end we introduced a preliminary step to map each probe from each Affymetrix platform considered in this study to the hgu133a array. Once this mapping was accomplished, the empirical distribution used in the barcode method was used for normalization (see Figures 16, 17, and 18 in the Results section for a comparison of the data with and without the use of the empirical distribution). Cross-referencing was performed at the sequence level for all individual probes, separately for each pair of platforms (hgu133plus2 to hgu133a, hgu95av2 to hgu133a, hgfocus to hgu133a, hgu133a2 to hgu133a), keeping only probes with an exact match.

After frozen-rma normalization, each data set was standardized separately using the quantile method and then merged with all the other data sets. Quantile normalization was then applied again to the merged data set [16]. Because few identical probes are shared across all the Affymetrix platforms considered, only 3181 probe sets were available in the final merged data set obtained with the frozen-rma method, and all analyses involving this combined data set were performed on this subset of genes. However, we also repeated all these analyses, obtaining similar results, using a larger combined data set (containing 6720 genes) produced by merging the individual expression matrices normalized by RMA (without the barcode method) and by using the Entrez Gene identifiers as the basis for cross-referencing the array features (data not shown).
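The standardize, merge, and re-standardize sequence described in this section can be illustrated with a minimal sketch. This is not the affy or frozen-rma code used in the study: the function names, the dictionary layout (feature identifier mapped to one value per array), and the simple tie handling are assumptions made for illustration only.

```python
from statistics import mean


def quantile_normalize(matrix):
    """Quantile-normalize a features x arrays matrix given as a list of rows.

    Every array (column) is forced onto the same reference distribution:
    the mean of the k-th smallest values across arrays. Ties are broken by
    row order, which is a simplification.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_cols)]
    sorted_cols = [sorted(col) for col in columns]
    reference = [mean(sc[k] for sc in sorted_cols) for k in range(n_rows)]
    normalized = [[0.0] * n_cols for _ in range(n_rows)]
    for j, col in enumerate(columns):
        order = sorted(range(n_rows), key=lambda i: col[i])
        for rank, i in enumerate(order):
            normalized[i][j] = reference[rank]
    return normalized


def normalize_dataset(dataset):
    """Quantile-normalize one data set (hypothetical layout: id -> values)."""
    ids = sorted(dataset)
    return dict(zip(ids, quantile_normalize([dataset[f] for f in ids])))


def merge_and_normalize(datasets):
    """Normalize each data set, merge on shared features, normalize again."""
    normalized = [normalize_dataset(d) for d in datasets]
    common = sorted(set.intersection(*(set(d) for d in normalized)))
    merged = [[v for d in normalized for v in d[f]] for f in common]
    return common, quantile_normalize(merged)
```

Calling `merge_and_normalize` on a list of such dictionaries returns the identifiers shared by all data sets together with the merged matrix, in which every array has the same empirical distribution. Restricting to shared identifiers mirrors why the merged data set in the text contains far fewer features than any single platform.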

1.5 Differential gene expression

In all data sets considered in the present study, differential gene expression was investigated using functions and methods implemented in the R/Bioconductor [11, 18] package limma [2, 3]. Briefly, a fixed effects linear model was fit for each individual feature to estimate expression differences between the groups of samples to be compared. When technical replicates or matched samples from the same individual were available, correlation coefficients were computed between replicates and the associated consensus correlation was added to the model [4]. An empirical Bayes approach was applied to moderate the standard errors of the M-values [19, 3]. Finally, for each analyzed feature we obtained moderated t-statistics, log-odds of differential expression (B-statistics), and raw and adjusted p-values (FDR control by the Benjamini and Hochberg method [20]).

1.6 Analysis of Functional Annotation

Analysis of Functional Annotation (AFA) was performed to accomplish the following goals:

- Capture the biological processes relevant in the investigated contrasts;
- Compare the pathways and biological themes associated with the gene expression programs of the different SCLC laboratory models considered.

To perform the AFA we used the Wilcoxon rank-sum test to search for and compare functional and biological concepts (Functional Gene Sets, FGS hereafter) associated with the phenotypes and comparisons we investigated. The Wilcoxon rank-sum test computes a p-value to test the hypothesis that an FGS, defined by a functional annotation, tends to be more highly ranked in an ordered list. In the present study individual genes on the arrays were ranked by their absolute moderated t-statistics, and statistical tests were performed for all the contrasts considered in the linear models described above.
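As an illustration of the rank-based gene set test and of the Benjamini and Hochberg adjustment mentioned in Section 1.5, the following is a minimal sketch. It is not the limma or multtest code used in the study: the function names are hypothetical, and the rank-sum p-value uses a plain normal approximation without tie or continuity correction, which is an assumption made to keep the example short.

```python
import math


def average_ranks(values):
    """Ranks starting at 1 for the smallest value; ties get the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_pvalue(stats, in_set):
    """One-sided p-value that genes in the set tend to have larger statistics.

    stats: one statistic per gene (e.g. absolute moderated t-statistics).
    in_set: one boolean per gene, True if the gene belongs to the gene set.
    """
    n1 = sum(in_set)
    n2 = len(stats) - n1
    ranks = average_ranks(stats)
    w = sum(r for r, member in zip(ranks, in_set) if member)
    u = w - n1 * (n1 + 1) / 2                      # Mann-Whitney U statistic
    mu = n1 * n2 / 2                               # mean of U under the null
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2))       # upper-tail normal probability


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for pos, i in enumerate(order):
        rank = m - pos                             # rank in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In use, `rank_sum_pvalue` would be called once per gene set and contrast, and the resulting vector of p-values passed to `benjamini_hochberg`. The one-sided alternative is chosen here because the text tests whether a gene set tends to be more highly ranked.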
This analysis was done using two distinct reference populations:
- All the non-redundant genes (according to Entrez Gene identifiers) present on the microarray platform;
- All the non-redundant genes annotated in each specific functional theme (e.g., KEGG).

After the statistical tests were performed, the false discovery rate was controlled (correction for multiple hypothesis testing) by applying the Benjamini and Hochberg method [20], as implemented in the multtest R/Bioconductor package. The following functional annotation themes were used to define the gene sets fed into the analysis:
- Gene Ontology Terms (GO) [21, 22, 23];
- KEGG PATHWAY sets [24];
- Functional themes from MSigDb [25] (Molecular Signature Database, please refer to: http: //).

Mappings between the individual features of each microarray platform used and the various functional themes considered were based on NCBI Entrez Gene, as obtained from the corresponding R/Bioconductor metadata packages (see Annotation in the Methods section).

1.7 Use of heatmaps

Analysis of Functional Annotation (AFA) results obtained from different contrasts were compared and displayed using heatmaps, in which the level of significance (P-values from the Wilcoxon rank-sum tests) was represented on a suitable color scale. This was accomplished through the following analysis steps:
1. Selection of the FGS significantly enriched in at least one of the individual contrasts considered. In our analysis filtering was performed using the Benjamini-Hochberg corrected p-values from the Wilcoxon rank-sum tests, by selecting the FGS showing an adjusted p-value below 0.1 (unless otherwise specified in the text) in at least one of the contrasts to be compared;
2. Merging of the raw P-values of the selected FGS into a single matrix, where the columns represent the individual contrasts considered and the rows represent the selected FGS;
3. Logarithmic transformation (base 10) of the corresponding raw p-values and heatmap generation, with or without hierarchical clustering of columns and rows.

Color intensities in the heatmaps correspond to the absolute values of the base 10 logarithms of the raw p-values. This exploratory approach allowed us to detect intuitively the commonalities and peculiarities among the different biological contrasts we evaluated. Heatmaps were also used to display the correlation among the individual samples and groups considered in our analysis (see below for details).

1.8 Concordance-at-the-top plots

The Concordance-at-the-top plot (cat-plot) [26] was developed to assess the agreement between microarray results from the different contrasts considered. In particular, this technique compares the correspondence of two lists, ranked by a predefined statistic, at their top. This is accomplished as follows:
1. Ordering the two lists according to a suitable statistic (e.g., differential gene expression, significance, probability, ...);
2. Computing the proportion of elements in common for a given list size;
3. Repeating the two steps above while increasing the list size up to all common elements;
4. Plotting the proportion of common elements against the increasing size of the considered lists.

Since cat-plots evaluate agreement at the top of ranked lists, they are particularly useful in gene expression analysis, where only a small fraction of genes is expected to be differentially expressed among the large total number of analyzed genes. We used cat-plots to evaluate the agreement of gene expression signatures across the comparisons considered in the present study (e.g., between the XG-to-CL and the XG-to-CLX contrasts, ...). For comparisons at the gene level, we ranked each gene list using the moderated t-statistics, the fold-change, and the median gene expression by sample group resulting from our linear model analysis. Common elements among lists were assessed by cross-referencing array features across platforms by their corresponding Entrez Gene identifiers. In this study we ordered the Entrez Gene identifiers and produced the cat-plots as follows:
- By increasing order of the signed moderated t-statistics and of the signed fold-change, to investigate down-regulated genes separately from up-regulated ones;
- By increasing order of the inverse signed moderated t-statistics and of the inverse fold-change, to investigate up-regulated genes separately from down-regulated ones.
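The four cat-plot steps listed above can be sketched in a few lines of Python. This is a hedged illustration rather than the code used in the study: the function name `cat_curve`, the gene labels, and the scores are assumptions introduced for the example, and ties are broken by gene identifier so that the output is deterministic.

```python
def cat_curve(scores_a, scores_b, max_size=None):
    """Concordance-at-the-top curve for two ranked gene lists (sketch).

    scores_a, scores_b: dicts mapping gene identifier -> signed statistic
    (for instance a moderated t-statistic). Genes are ranked in increasing
    order, so the top of each list holds the most down-regulated genes;
    pass negated scores to look at up-regulated genes instead.
    Returns a list whose element k-1 is the proportion of genes shared by
    the top-k positions of the two lists.
    """
    common = set(scores_a) & set(scores_b)
    ranked_a = sorted(common, key=lambda g: (scores_a[g], g))
    ranked_b = sorted(common, key=lambda g: (scores_b[g], g))
    if max_size is None:
        max_size = len(common)

    curve = []
    top_a, top_b = set(), set()
    for size in range(1, max_size + 1):
        top_a.add(ranked_a[size - 1])
        top_b.add(ranked_b[size - 1])
        curve.append(len(top_a & top_b) / size)
    return curve


# Toy example with made-up statistics for two contrasts.
contrast_1 = {"g1": -3.0, "g2": -2.0, "g3": -1.0, "g4": 0.5}
contrast_2 = {"g1": -2.5, "g2": -0.5, "g3": -2.0, "g4": 1.0, "g5": 9.0}
curve = cat_curve(contrast_1, contrast_2)
```

Plotting `curve` against the list size 1, 2, ..., gives the cat-plot; the curve always reaches 1 at the full size because only the genes common to both lists are ranked.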

1.9 Correlation among groups and individual samples

We computed correlations between individual samples and between groups to assess the overall similarity between the different types of specimens considered in our analysis. When comparing groups, we computed the correlations using the moderated t-statistics, the group fold-change, and the group median gene expression after pre-processing and normalization. When comparing individual samples, we used gene expression measurements after pre-processing and normalization. Correlations were computed using all available unique genes (based on Entrez Gene identifiers), as well as using all genes differentially expressed in any comparison among the SCLC laboratory models from the triplets (XG, CL, CLX). We also used heatmaps to show the clustering of the squared pair-wise correlations, using the Euclidean distance and the average clustering method, with a color scale from white to red for increasing correlation.

We used classical multidimensional scaling (principal coordinates or components analysis [27]) to display in a multidimensional space the distance between the groups and the individual samples considered in our analysis. The distance was expressed as 1 - Pearson's correlation. All pair-wise comparisons between the considered groups and individual samples (XG, CL, CLX, and the primary SCLC) were computed using all the genes and those that proved differentially expressed in any contrast among our laboratory models (adjusted P-value < 0.05 or lower, as specified), as obtained from our linear model analysis. Similar results were obtained by mapping the genes from our de novo analysis to the combined data set, or by selecting the differentially expressed genes in the combined data set.

1.10 Software

All analyses were performed using analytical packages from the R/Bioconductor project [11, 12], including limma [4], affy [28], and multtest. Additional functions and methods developed by Dr. Marchionni are implemented in the morefgs and funcbox packages, which were previously applied to compare gene expression data in the prostate [29], and are available at jhmi.edu/~marchion/packages.html.
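The distance used for the multidimensional scaling in Section 1.9, 1 - Pearson's correlation, can be illustrated with a short Python sketch. It only builds the distance matrix that a classical MDS routine would take as input; the function names and the toy expression profiles are assumptions made for this example and are not data from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_distance(profiles):
    """Pair-wise distances defined as 1 - Pearson correlation.

    profiles: dict mapping a group or sample label to its expression values
    (all profiles must list the same genes in the same order).
    Returns a dict keyed by (label_a, label_b).
    """
    labels = list(profiles)
    return {
        (a, b): 1.0 - pearson(profiles[a], profiles[b])
        for a in labels
        for b in labels
    }


# Toy profiles with made-up numbers; the labels only mimic the group names.
toy_profiles = {
    "XG": [1.0, 2.0, 3.0, 4.0],
    "CL": [2.0, 4.0, 6.0, 8.0],
    "CLX": [4.0, 3.0, 2.0, 1.0],
}
distances = correlation_distance(toy_profiles)
```

With this definition, perfectly correlated profiles are at distance 0 and perfectly anti-correlated profiles are at distance 2.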

2 Results

2.1 De Novo data set pre-processing

We performed de novo gene expression analysis on the hgu133plus2 platform using samples from SCLC models, normal lung, and the universal reference RNA (Stratagene). This data set is denoted by the letter A in Table 1. We applied the pre-processing and normalization methods implemented in the package affy [14]. The diagnostic plots used to monitor the pre-processing procedures are shown below: image plots, RNA degradation plots, MA plots, and boxplots.

Figure 1: Panel A: 2-D image plots for all the hybridizations in data set A (see Table 1). One hybridization showed an artifact and was repeated. Panel B: RNA degradation plot reporting the slope for each individual sample in data set A, the slope distribution, and the corresponding boxplot. Panel C: Boxplot for data set A (see Table 1) after RMA normalization.

Overall, these plots showed that all the samples analyzed were similar. Although one of the hybridizations showed an artifact on the image and was repeated (Figure 1, Panel A), all samples appeared to be of similar quality in the RNA degradation plots (Figure 1, Panel B) and could be effectively normalized, as is evident in the individual MA plots before and after normalization (see Figures 2 and 3) and in the corresponding boxplots (Figure 1, Panel C). The M-value medians for each array, reported in the legend of each individual panel, confirm the successful normalization, as do the individual distributions of gene expression intensities after RMA normalization shown for every hybridization of data set A in the boxplots (Figure 1, Panel C).

Figure 2: MA plots for data set A (see Table 1) before RMA normalization.

Figure 3: MA plots for data set A (see Table 1) after RMA normalization.

2.2 Differentially expressed genes

Direct Comparison. The findings from our linear model analysis in the de novo data set A (see Table 1), comparing XG, CL, and CLX from pairs and triplets of samples, are reported below. In the direct comparison, XG, CL, and CLX from triplets of samples were compared directly to one another to identify the differentially expressed genes. Figure 4 shows the volcano plots for the direct comparisons XG-to-CL, XG-to-CLX, and CL-to-CLX: at the same level of probability of differential expression, fewer differentially expressed genes are recovered when CL are compared to the derivative CLX than when XG is compared to CL or CLX. The same findings are reported in the Venn diagrams in Figure 5, for different levels of stringency.

Indirect Comparison. The results from our linear model analysis in the de novo data set A (see Table 1), comparing XG, CL, and CLX from pairs and triplets of samples, are reported below. In this indirect comparison the XG, CL, and CLX groups from matched triplets were each compared directly to the human universal reference RNA. The computed log2 fold-changes were then used to compare the laboratory models to one another. Figure 6 shows the number of differentially expressed genes at various levels of significance. Overall, the volcano plots and the Venn diagrams, from both the direct and the indirect comparisons, reveal that the levels of gene expression of CL and CLX are highly correlated.

Figure 4: Volcano plots of direct contrasts among XG, CL, and CLX. Differential expression (as log2 fold-change) is on the X-axis, and probability of differential expression (B-statistics) on the Y-axis.

Figure 5: Venn diagrams of the significant differentially expressed genes in direct comparisons among XG, CL, and CLX. Increasing adjusted P-values are shown by column, from left to right < 0.001, < on the

Figure 6: Venn diagrams of the significant differentially expressed genes in indirect comparisons among XG, CL, and CLX, using the human reference RNA to measure the fold-change. Increasing adjusted P-values are shown by column, from left to right < 0.001, < on the

Figure 7: Panel A: Squared-correlation matrix showing the relationships between the groups of samples that were compared. Matched triplets of samples (XG, CL, CLX) are denoted by the prefix w, while pairs are denoted by the prefix p (CL and CLX only). Gene expression relative to the human reference RNA (Stratagene), expressed as log2 fold-change, was used to compare the groups and to compute the squared correlation reported for each comparison. Panel B: Squared-correlation matrix showing the relationships between the groups of samples that were compared, including the reference RNA group and the normal lung specimen that was profiled. Median normalized intensities obtained after RMA pre-processing are used to compute the correlations. For the only normal lung sample profiled, the normalized individual expression values were used. Panel C: MDS plot showing the relationship among matched triplets and pairs of samples, using the first two components and 1-correlation of the group median intensities as the distance. Matched triplets of samples (XG, CL, CLX) are denoted by the prefix w, while pairs are denoted by the prefix p (CL and CLX only).

Comparing pairs and triplets. The results for the comparisons among matched pairs (pcl and pclx) and matched triplets (wxg, wcl, and wclx) of laboratory models, as obtained from our linear model analysis in the de novo data set A (see Table 1), are reported below. This analysis revealed that the primary xenograft group (wxg) is as correlated to its matched cell lines and secondary xenografts (wcl and wclx) as it is to the unmatched groups from the pairs (pcl and pclx). Similarly, the cell line groups (wcl and pcl) and the secondary xenograft groups (wclx and pclx) are closer to one another than they are to the primary xenografts (wxg). These findings are evident using either the log2 fold-change with respect to the reference RNA (see Panel A in Figure 7) or the median gene expression intensity for each group (see Panels B and C in Figure 7). Figures 8 and 9 show the 45-degree rotation of the scatter plots (conceptually similar to an MA plot) comparing the log2 fold-change or the group median intensity across triplets and pairs of samples.
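The 45-degree rotation of a scatter plot mentioned above corresponds, up to a constant scaling of the axes, to plotting the difference of two profiles against their average. A minimal Python sketch follows; the function name `ma_transform`, the sign convention (first profile minus second), and the toy values are assumptions made for this illustration.

```python
def ma_transform(x, y):
    """Return (M, A) for two log-scale profiles of equal length.

    M is the difference x - y and A is the average (x + y) / 2. A true
    45-degree rotation would divide both by sqrt(2) instead; the MA
    convention differs from it only by this rescaling of the axes.
    """
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    m_values = [a - b for a, b in zip(x, y)]
    a_values = [(a + b) / 2 for a, b in zip(x, y)]
    return m_values, a_values


# Toy log2 values for two groups (made-up numbers).
m_values, a_values = ma_transform([1.0, 2.0, 5.0], [3.0, 2.0, 4.0])
```

Points that agree between the two groups fall on the horizontal line M = 0, which makes departures from agreement easier to see than in the original scatter plot.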

Figure 8: MA plots showing the correlation among XG, CL, and CLX obtained from matched triplets and pairs of samples. Matched triplets of samples (XG, CL, CLX) are denoted by the prefix w, while pairs are denoted by the prefix p (CL and CLX only). Gene expression relative to the human reference RNA (Stratagene), expressed as log2 fold-change, was used to compare the groups and to compute the squared correlation reported for each comparison. These plots are the 45-degree rotation of the log2 fold-change scatter plots.

Figure 9: MA plots showing the correlation among XG, CL, and CLX obtained from matched triplets and pairs of samples. Matched triplets of samples (XG, CL, CLX) are denoted by the prefix w, while pairs are denoted by the prefix p (CL and CLX only). In this figure the squared correlations and the MA plots are obtained using the normalized median intensities for each group, rather than the fold-change relative to the human reference RNA (Stratagene). The comparisons with the normal lung sample and the human reference RNA group are also shown. These plots are the 45-degree rotation of the group median intensity scatter plots.

2.3 Concordance at the top plots

Concordance at the top (CAT) plots were used as an alternative to correlation to compare groups of samples. Unlike correlation, this technique is not influenced by the bulk of genes that are not differentially expressed among the conditions under comparison, since it focuses on the top and on the bottom of the ranked list of differentially expressed genes. We used CAT plots to compare all groups in our analysis, using differential gene expression as obtained from our linear model analysis in data set A (see Table 1). We compared up- and down-regulated genes separately, using both the fold-change and the t-statistics to rank the genes. We repeated this analysis using both all the features on the array (Affymetrix probe sets) and the unique genes only. Overall, this analysis confirmed that CL and CLX are very close in terms of differential expression, and that unrelated cell lines (wcl and pcl) are closer to one another than the primary xenografts (wxg) are to their derivative cell lines (wcl) (see Figures 10, 11, and 12).

Figure 10: Concordance at the top plot showing the relationship among matched triplets and pairs of samples. On the left, the individual Affymetrix probe sets are ranked in increasing order by the group log2-transformed fold-change relative to the human reference RNA group (Stratagene), therefore showing the concordance among groups based on the genes that are down-regulated with respect to the human reference RNA. On the right, the individual Affymetrix probe sets are ranked in decreasing order by the group log2-transformed fold-change relative to the human reference RNA group, therefore showing the concordance among groups based on the genes that are up-regulated with respect to the human reference RNA. Matched triplets of samples (XG, CL, CLX) are denoted by the prefix w, while pairs are denoted by the prefix p (CL and CLX only). Each line represents a pair-wise comparison of two groups, and the color code is shown in the figure legend. The two comparisons between CL and CLX, from either matched triplets or pairs of samples, show the greatest agreement (blue and red lines).

Figure 11: Concordance at the top plot showing the relationship among matched triplets and pairs of samples. On the left, the individual Affymetrix probe sets are ranked in increasing order by the group moderated t-statistics relative to the human reference RNA group (Stratagene), therefore showing the concordance among groups based on the genes that are down-regulated with respect to the human reference RNA. On the right, the individual Affymetrix probe sets are ranked in decreasing order by the group moderated t-statistics relative to the human reference RNA group, therefore showing the concordance among groups based on the genes that are up-regulated with respect to the human reference RNA. Matched triplets of samples (XG, CL, CLX) are denoted by the prefix w, while pairs are denoted by the prefix p (CL and CLX only). Each line represents a pair-wise comparison of two groups, and the color code is shown in the figure legend. The two comparisons between CL and CLX, from either matched triplets or pairs of samples, show the greatest agreement (blue and red lines).

Figure 12: Concordance at the top plot showing the relationship among matched triplets and pairs of samples. On the left, the non-redundant individual Affymetrix probe sets are ranked in increasing order by the group moderated t-statistics relative to the human reference RNA group (Stratagene), therefore showing the concordance among groups based on the genes that are down-regulated with respect to the human reference RNA. On the right, the individual Affymetrix probe sets are ranked in decreasing order by the group moderated t-statistics relative to the human reference RNA group, therefore showing the concordance among groups based on the genes that are up-regulated with respect to the human reference RNA. Matched triplets of samples (XG, CL, CLX) are denoted by the prefix w, while pairs are denoted by the prefix p (CL and CLX only). Each line represents a pair-wise comparison of two groups, and the color code is shown in the figure legend. The two comparisons between CL and CLX, from either matched triplets or pairs of samples, show the greatest agreement (blue and red lines).

2.4 Analysis of Functional Annotation

AFA consistency. Several functional themes were explored by AFA in the present study. The analysis was carried out considering two distinct reference populations for each of the themes considered:
- All the non-redundant genes present on the array that had an EGID;
- Only the non-redundant genes present on the array that had an EGID and that were annotated to each specific functional theme.

For each type of functional ontology, the results obtained by using all the non-redundant genes as the reference population in the Wilcoxon rank-sum tests were compared to those obtained by using only the non-redundant genes annotated to the considered ontology for each array. This comparison allowed us to focus only on the enriched gene sets that were stable in the two types of analysis, that is, the gene sets that were stable irrespective of the reference population used. It must be underlined that in certain cases these reference populations differ greatly in size. This approach avoided calling significant the gene sets that were enriched in only one of the two analyses. The filtering was performed by comparing the ranks obtained for each ontology using the two approaches mentioned above. Plots of the rank comparisons are reported below. Gene sets were excluded if they showed an absolute rank-difference below the 1st percentile or above the 99th percentile of the rank-difference distribution. The distributions of this quantity are reported in Figure 13.

AFA: Heatmaps. Raw P-values from the Wilcoxon rank-sum tests were used to prepare heatmaps showing at a glance the functional themes across the different comparisons evaluated in the present study. This exploratory approach allowed us to detect intuitively the commonalities and peculiarities among the different biological contrasts we evaluated. Color intensities in the heatmaps correspond to the absolute values of the base 10 logarithms of the raw P-values. To limit the number of categories in each heatmap, we used only the gene sets showing a BH-corrected P-value below 0.05 in at least one of the comparisons (or the first 50 categories if more than 50 categories showed a BH-corrected P-value below 0.05). Figure 14 shows the comparative AFA results for GO, while Figure 15 shows the results for the curated pathways from the Molecular Signature data base.
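The selection and transformation of P-values that feed the heatmaps can be sketched as follows. This Python illustration is an assumption-laden sketch, not the R code used in the study: the function names `bh_adjust` and `heatmap_matrix`, the contrast and gene set labels, and the P-values are hypothetical, and the "first 50 categories" are assumed here to be those with the smallest adjusted P-value in any contrast.

```python
import math


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def heatmap_matrix(raw_p, alpha=0.05, max_rows=50):
    """Select gene sets and build the |log10(raw p)| matrix (sketch).

    raw_p: dict contrast -> dict gene_set -> raw p-value (> 0); every
    contrast is assumed to list the same gene sets.
    Returns (selected_gene_sets, matrix) with one row per gene set and one
    column per contrast.
    """
    contrasts = list(raw_p)
    gene_sets = list(raw_p[contrasts[0]])
    adjusted = {}
    for c in contrasts:
        values = bh_adjust([raw_p[c][g] for g in gene_sets])
        adjusted[c] = dict(zip(gene_sets, values))

    # Keep gene sets significant in at least one contrast.
    selected = [
        g for g in gene_sets if any(adjusted[c][g] < alpha for c in contrasts)
    ]
    # Assumption: when too many pass, keep those with the smallest adjusted p.
    selected.sort(key=lambda g: min(adjusted[c][g] for c in contrasts))
    selected = selected[:max_rows]
    matrix = [[abs(math.log10(raw_p[c][g])) for c in contrasts] for g in selected]
    return selected, matrix


# Toy input with made-up raw p-values for two contrasts.
toy = {
    "XG_vs_CL": {"setA": 0.001, "setB": 0.5, "setC": 0.2},
    "CL_vs_CLX": {"setA": 0.3, "setB": 0.6, "setC": 0.01},
}
rows, values = heatmap_matrix(toy)
```

The rows of `values` can then be passed to any heatmap routine, optionally after hierarchical clustering of the rows.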

Figure 13: Comparison of the ranks obtained for gene sets in the AFA using all the non-redundant genes as the reference population or the non-redundant genes annotated to each functional annotation source (GO, KEGG, and gene sets from the Molecular Signature data base, MsigC2v2: curated molecular pathways). In the smoothed scatter plots the Y axis reports the ranks obtained with all the unique genes as the reference population, while the X axis displays the ranks obtained with the annotated reference populations. Ranks were based on the raw P-values obtained in the Wilcoxon rank-sum test. Unstable categories are evident as outliers. The corresponding rank-difference distributions are shown below the scatter plots. Panel A shows the consistency results for the direct comparison between matched triplets of XG and CL; Panel B shows the consistency results for the direct comparison between matched triplets of XG and CLX; Panel C shows the consistency results for the direct comparison between matched triplets of CL and CLX.

Figure 14: Heatmaps displaying gene set enrichment across the different comparisons considered for Gene Ontology. Color-coded values correspond to the absolute values of the base 10 logarithms of the raw P-values from the Wilcoxon rank-sum test, as obtained using all the non-redundant genes. Rows were clustered using the Euclidean distance, with the average clustering method; columns were not reordered.

Figure 15: Heatmaps displaying gene set enrichment across the different comparisons considered for the curated pathways from the Molecular Signature data base. Color-coded values correspond to the absolute values of the base 10 logarithms of the raw P-values from the Wilcoxon rank-sum test, as obtained using all the non-redundant genes. Rows were clustered using the Euclidean distance, with the average clustering method; columns were not reordered.

2.5 Comparison to primary SCLC

Combined data set pre-processing. The plots below show the results of the pre-processing with and without the use of the empirical distribution used in the barcode approach [17]. In both cases the RMA algorithm was used. Overall, the plots show that the frozen-rma method is an efficient approach to remove systematic differences due to lab and batch effects. Boxplots are shown in Figure 16. The panels on the left (Panel A) correspond to data sets merged using the standard RMA algorithm, while the panels on the right (Panel B) correspond to data sets merged after the frozen-rma method was applied (see the Supplementary Methods section for details and the original manuscript by Zilliox and Irizarry [17]). The difference between the two approaches is most evident in the top panels, where data are shown after merging and prior to any standardization procedure: when the frozen-rma method is used, most of the differences due to hybridization batches (in this case also corresponding to different platforms) are removed. The center and bottom panels in Figure 16 report the boxplots after standardization using the scale or the quantile methods [16, 30]. In our analysis the combined data set obtained using the frozen-rma method followed by quantile normalization was used in the further steps of the analysis.

Figure 16: Panel A: Boxplots for the merged data sets (see Table 1) after RMA normalization without the use of the empirical distribution used in the barcode approach [17]. The merged data show a strong data set effect (top panel, yellow box plots). The center and bottom panels show the merged data standardized using the scale (center) or the quantile (bottom) methods. Panel B: Boxplots for the merged data sets (see Table 1) after RMA normalization with the use of the barcode empirical distribution [17]. The merged data do not show a strong data set effect (top panel, yellow box plots) when compared to the data obtained without the use of the frozen-rma method (see Panel A). The center and bottom panels show the merged data standardized using the scale (center) or the quantile (bottom) methods.

Figure 17: Panel A: Correlation matrix among the 192 samples of the combined analysis (see Table 1) using gene expression data obtained after RMA normalization without the use of the empirical distribution used in the barcode approach [17]. Samples from the same data set show a higher correlation and are segregated into groups, revealing a clear data set effect. Colors at the top refer to the 5 different platforms used, while the ones on the left refer to the 13 different data sets. The correlations among samples span from 0 to 1. Panel B: Correlation matrix among the 192 samples of the combined analysis (see Table 1) using gene expression data obtained after RMA normalization with the use of the frozen-rma empirical distribution [17]. The data set effect is partially removed when compared to the merged data obtained without using frozen-rma. The correlations among samples span from 0.28 to 1 (see Table 2).

Figure 18: Density of the sample correlations computed using expression values normalized with or without the frozen-rma method. These densities correspond to the correlations displayed in Figure 17, Panels A and B, and are summarized in Table 2. In orange (continuous line) is the density distribution of the correlations without the use of frozen-rma; in blue (dotted line) is the distribution of the correlations among samples with the use of the empirical distribution used in the barcode method. Note the shift to the right of the distribution for the frozen-rma approach.

Columns: WithoutBarcode, WithBarcode. Rows: Min, 1st Qu, Median, Mean, 3rd Qu, Max.

Table 2: Summary of the density distribution of the sample correlations computed using expression values normalized with or without the frozen-rma approach. The density data correspond to the correlations displayed in Figure 18. The first column refers to the distribution of the correlations without the use of frozen-rma, the second column to the distribution of the correlations among samples with the use of the frozen-rma method.

Comparison between models and primary tumors. We used the combined data set described in Table 1 to compare the matched laboratory models (XG, CL, and CLX) to a collection of SCLC primary tumors and to a large collection of normal lung specimens. We searched for a gene expression signature conserved between the primary SCLC and the models we profiled in our laboratory. Overall, we found that a core gene list distinguishing SCLC from normal lung exists and is conserved in the laboratory models. Figure 19 reports the number of genes that are differentially expressed among these groups of samples using the normal lung group as the reference. Figure 19, Panel A shows the overlap between the genes that are differentially expressed between XG and the normal lung, and between the primary SCLC and the normal lung group. Figure 19, Panel B shows the overlap between the genes that are differentially expressed between CL and the normal lung, and between the primary SCLC and the normal lung group. Figure 19, Panel C shows the overlap between the genes that are differentially expressed between CLX and the normal lung, and between the primary SCLC and the normal lung group. Figure 19, Panel D shows the overlap between the intersections of the Venn diagrams in Figure 19, Panels A, B, and C: 387 genes are commonly differentially expressed when the SCLC samples (either the primary or the model groups) are compared to normal lung. Figure 19, Panel E shows the overlap between all the genes predicted to be different between the normal lung and the other groups (primary SCLC, XG, CL, and CLX).

To detect the differences that are induced by in vitro culture, we compared the gene expression of the matched triplets (XG, CL, and CLX) to primary SCLC using the genes that we identified as different by comparing the laboratory models to one another. We performed this analysis by mapping the genes that were identified in the analysis of data set A (see Table 1) to the combined data set, using the Entrez Gene identifiers as the cross-referencing keys (Figure 20), and by predicting these genes from scratch in the combined data set, using different levels of stringency to select the list of differentially expressed genes (Figures 21 and 22). Overall, the XG group always proved more correlated to primary SCLC specimens than CL or CLX.

Figure 19: Venn diagrams of the significant differentially expressed genes in the comparisons between the normal lung group and the XG, CL, CLX, and primary SCLC groups.

Figure 20: Squared-correlation matrix showing the relationships between matched triplets of samples (XG, CL, CLX) and the primary SCLC group. On the left is shown the correlation between gene expression relative to the normal lung group, while on the right is shown the correlation between median gene expression intensity for each group. This figure uses the genes that proved differentially expressed in any comparison between XG, CL, and CLX at an adjusted p-value < 0.05 in the complete data set A (see Table 1) and that could be mapped to the combined data set using the corresponding Entrez Gene identifiers.

Figure 21: Squared-correlation matrix showing the relationships between matched triplets of samples (XG, CL, CLX) and the primary SCLC group. On the left is shown the correlation between gene expression relative to the normal lung group, while on the right is shown the correlation between median gene expression intensity for each group. This figure uses the genes that proved differentially expressed in any comparison between XG, CL, and CLX at an adjusted p-value < using the combined data set.

Figure 22: Squared-correlation matrix showing the relationships between matched triplets of samples (XG, CL, CLX) and the primary SCLC group. On the left is shown the correlation between gene expression relative to the normal lung group, while on the right is shown the correlation between median gene expression intensity for each group. This figure uses the genes that proved differentially expressed in any comparison between XG, CL, and CLX at an adjusted p-value < 0.05 using the combined data set.

Figure 23: MA plots showing the correlation among XG, CL, and CLX obtained from matched triplets and pairs of samples, and between those and the primary SCLC group. The genes used here are those predicted in the combined data set to be different (adjusted p-value < 0.005) in any comparison among the triplet model groups. MA plots are made using the median expression for each group, which is also used to compute the squared correlation reported for each comparison. These plots are the 45-degree rotation of the group median scatter plots.

Figure 24: MA plots showing the correlation among XG, CL, and CLX obtained from matched triplets and pairs of samples, and between those and the primary SCLC group. The genes used here are those predicted in the combined data set to be different (adjusted p-value < 0.05) in any comparison among the triplet model groups. MA plots are made using the median expression for each group, which is also used to compute the squared correlation reported for each comparison. These plots are the 45-degree rotation of the group median scatter plots.

3 Literature Cited

[1] David L Wheeler, Tanya Barrett, Dennis A Benson, Stephen H Bryant, Kathi Canese, Vyacheslav Chetvernin, Deanna M Church, Michael DiCuccio, Ron Edgar, Scott Federhen, Lewis Y Geer, Yuri Kapustin, Oleg Khovayko, David Landsman, David J Lipman, Thomas L Madden, Donna R Maglott, James Ostell, Vadim Miller, Kim D Pruitt, Gregory D Schuler, Edwin Sequeira, Steven T Sherry, Karl Sirotkin, Alexandre Souvorov, Grigory Starchenko, Roman L Tatusov, Tatiana A Tatusova, Lukas Wagner, and Eugene Yaschenko. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res, 35(Database issue):D5-12, Jan

[2] G. K. Smyth. Limma: linear models for microarray data. In R. Gentleman, R. V. Carey, S. Dudoit, R. Irizarry, and W. Huber, editors, Bioinformatics and Computational Biology Solutions using R and Bioconductor, pages Springer, New York,

[3] G. K. Smyth. Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology, 3(Article 3),

[4] G. K. Smyth, J. Michaud, and H. S. Scott. Use of within-array replicate spots for assessing differential expression in microarray experiments. Bioinformatics, 21(9): ,

[5] D. L. Wheeler, T. Barrett, D. A. Benson, S. H. Bryant, K. Canese, D. M. Church, M. DiCuccio,

Figure 25: Correlation among individual samples of primary SCLC and XG, CL, and CLX, using the genes that proved to be differentially expressed (adjusted p-value < 0.05) in any comparison between the triplet groups (XG versus CL, CL versus CLX, and XG versus CLX) in the combined data set. The top panel shows the hierarchical clustering of the correlation matrix. The bottom panel shows multidimensional scaling, plotting the first three components obtained from principal component analysis with 1 - correlation as the distance between individual samples. Primary SCLC samples are labeled in cyan, XG samples in purple, CLX samples in blue, and CL samples in green.
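The distance used in this figure, 1 - correlation between individual samples, can be sketched as follows. This is an illustrative sketch only, not the authors' pipeline. The caption does not state the linkage method, so average linkage is an assumption here; the sample names `s1` to `s4`, their values, and the helpers `correlation_distance` and `average_linkage` are hypothetical. The multidimensional scaling step is not covered.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_distance(samples):
    """Map every ordered pair of sample names to 1 - Pearson correlation."""
    names = list(samples)
    return {(a, b): 1.0 - pearson_r(samples[a], samples[b])
            for a in names for b in names}


def average_linkage(names, dist):
    """Agglomerative clustering; returns the merges as (cluster, cluster, distance)."""
    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average of all pairwise distances between the two clusters.
                total = sum(dist[(a, b)] for a in clusters[i] for b in clusters[j])
                d = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Toy expression profiles over four genes (invented values, not data from the study).
samples = {
    "s1": [1.0, 2.0, 3.0, 4.0],
    "s2": [1.1, 2.1, 2.9, 4.2],
    "s3": [4.0, 3.0, 2.0, 1.0],
    "s4": [4.5, 2.5, 2.4, 1.0],
}

dist = correlation_distance(samples)
merges = average_linkage(list(samples), dist)
```

Each entry of `merges` records which two clusters were joined and at what average distance, which is the information a dendrogram displays.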

Nature Methods: doi: /nmeth.3115

Nature Methods: doi: /nmeth.3115 Supplementary Figure 1 Analysis of DNA methylation in a cancer cohort based on Infinium 450K data. RnBeads was used to rediscover a clinically distinct subgroup of glioblastoma patients characterized by

More information

User Guide. Association analysis. Input

User Guide. Association analysis. Input User Guide TFEA.ChIP is a tool to estimate transcription factor enrichment in a set of differentially expressed genes using data from ChIP-Seq experiments performed in different tissues and conditions.

More information

Title: Human breast cancer associated fibroblasts exhibit subtype specific gene expression profiles

Title: Human breast cancer associated fibroblasts exhibit subtype specific gene expression profiles Author's response to reviews Title: Human breast cancer associated fibroblasts exhibit subtype specific gene expression profiles Authors: Julia Tchou (julia.tchou@uphs.upenn.edu) Andrew V Kossenkov (akossenkov@wistar.org)

More information

EXPression ANalyzer and DisplayER

EXPression ANalyzer and DisplayER EXPression ANalyzer and DisplayER Tom Hait Aviv Steiner Igor Ulitsky Chaim Linhart Amos Tanay Seagull Shavit Rani Elkon Adi Maron-Katz Dorit Sagir Eyal David Roded Sharan Israel Steinfeld Yossi Shiloh

More information

Patnaik SK, et al. MicroRNAs to accurately histotype NSCLC biopsies

Patnaik SK, et al. MicroRNAs to accurately histotype NSCLC biopsies Patnaik SK, et al. MicroRNAs to accurately histotype NSCLC biopsies. 2014. Supplemental Digital Content 1. Appendix 1. External data-sets used for associating microrna expression with lung squamous cell

More information

Comparison of open chromatin regions between dentate granule cells and other tissues and neural cell types.

Comparison of open chromatin regions between dentate granule cells and other tissues and neural cell types. Supplementary Figure 1 Comparison of open chromatin regions between dentate granule cells and other tissues and neural cell types. (a) Pearson correlation heatmap among open chromatin profiles of different

More information

Discovery of Novel Human Gene Regulatory Modules from Gene Co-expression and

Discovery of Novel Human Gene Regulatory Modules from Gene Co-expression and Discovery of Novel Human Gene Regulatory Modules from Gene Co-expression and Promoter Motif Analysis Shisong Ma 1,2*, Michael Snyder 3, and Savithramma P Dinesh-Kumar 2* 1 School of Life Sciences, University

More information

SUPPLEMENTARY APPENDIX

SUPPLEMENTARY APPENDIX SUPPLEMENTARY APPENDIX 1) Supplemental Figure 1. Histopathologic Characteristics of the Tumors in the Discovery Cohort 2) Supplemental Figure 2. Incorporation of Normal Epidermal Melanocytic Signature

More information

Case Studies on High Throughput Gene Expression Data Kun Huang, PhD Raghu Machiraju, PhD

Case Studies on High Throughput Gene Expression Data Kun Huang, PhD Raghu Machiraju, PhD Case Studies on High Throughput Gene Expression Data Kun Huang, PhD Raghu Machiraju, PhD Department of Biomedical Informatics Department of Computer Science and Engineering The Ohio State University Review

More information

Expanded View Figures

Expanded View Figures EMO Molecular Medicine Proteomic map of squamous cell carcinomas Hanibal ohnenberger et al Expanded View Figures Figure EV1. Technical reproducibility. Pearson s correlation analysis of normalised SILC

More information

Hands-On Ten The BRCA1 Gene and Protein

Hands-On Ten The BRCA1 Gene and Protein Hands-On Ten The BRCA1 Gene and Protein Objective: To review transcription, translation, reading frames, mutations, and reading files from GenBank, and to review some of the bioinformatics tools, such

More information

Digitizing the Proteomes From Big Tissue Biobanks

Digitizing the Proteomes From Big Tissue Biobanks Digitizing the Proteomes From Big Tissue Biobanks Analyzing 24 Proteomes Per Day by Microflow SWATH Acquisition and Spectronaut Pulsar Analysis Jan Muntel 1, Nick Morrice 2, Roland M. Bruderer 1, Lukas

More information

SUPPLEMENTARY FIGURES: Supplementary Figure 1

SUPPLEMENTARY FIGURES: Supplementary Figure 1 SUPPLEMENTARY FIGURES: Supplementary Figure 1 Supplementary Figure 1. Glioblastoma 5hmC quantified by paired BS and oxbs treated DNA hybridized to Infinium DNA methylation arrays. Workflow depicts analytic

More information

microrna PCR System (Exiqon), following the manufacturer s instructions. In brief, 10ng of

microrna PCR System (Exiqon), following the manufacturer s instructions. In brief, 10ng of SUPPLEMENTAL MATERIALS AND METHODS Quantitative RT-PCR Quantitative RT-PCR analysis was performed using the Universal mircury LNA TM microrna PCR System (Exiqon), following the manufacturer s instructions.

More information

Supplementary Figures

Supplementary Figures Supplementary Figures Supplementary Figure 1. Heatmap of GO terms for differentially expressed genes. The terms were hierarchically clustered using the GO term enrichment beta. Darker red, higher positive

More information

Detecting gene signature activation in breast cancer in an absolute, single-patient manner

Detecting gene signature activation in breast cancer in an absolute, single-patient manner Paquet et al. Breast Cancer Research (2017) 19:32 DOI 10.1186/s13058-017-0824-7 RESEARCH ARTICLE Detecting gene signature activation in breast cancer in an absolute, single-patient manner E. R. Paquet

More information

Gene expression analysis. Roadmap. Microarray technology: how it work Applications: what can we do with it Preprocessing: Classification Clustering

Gene expression analysis. Roadmap. Microarray technology: how it work Applications: what can we do with it Preprocessing: Classification Clustering Gene expression analysis Roadmap Microarray technology: how it work Applications: what can we do with it Preprocessing: Image processing Data normalization Classification Clustering Biclustering 1 Gene

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION doi:10.1038/nature10866 a b 1 2 3 4 5 6 7 Match No Match 1 2 3 4 5 6 7 Turcan et al. Supplementary Fig.1 Concepts mapping H3K27 targets in EF CBX8 targets in EF H3K27 targets in ES SUZ12 targets in ES

More information

Nature Immunology: doi: /ni Supplementary Figure 1. Transcriptional program of the TE and MP CD8 + T cell subsets.

Nature Immunology: doi: /ni Supplementary Figure 1. Transcriptional program of the TE and MP CD8 + T cell subsets. Supplementary Figure 1 Transcriptional program of the TE and MP CD8 + T cell subsets. (a) Comparison of gene expression of TE and MP CD8 + T cell subsets by microarray. Genes that are 1.5-fold upregulated

More information

Broad H3K4me3 is associated with increased transcription elongation and enhancer activity at tumor suppressor genes

Broad H3K4me3 is associated with increased transcription elongation and enhancer activity at tumor suppressor genes Broad H3K4me3 is associated with increased transcription elongation and enhancer activity at tumor suppressor genes Kaifu Chen 1,2,3,4,5,10, Zhong Chen 6,10, Dayong Wu 6, Lili Zhang 7, Xueqiu Lin 1,2,8,

More information

SubLasso:a feature selection and classification R package with a. fixed feature subset

SubLasso:a feature selection and classification R package with a. fixed feature subset SubLasso:a feature selection and classification R package with a fixed feature subset Youxi Luo,3,*, Qinghan Meng,2,*, Ruiquan Ge,2, Guoqin Mai, Jikui Liu, Fengfeng Zhou,#. Shenzhen Institutes of Advanced

More information

Package CLL. April 19, 2018

Package CLL. April 19, 2018 Type Package Title A Package for CLL Gene Expression Data Version 1.19.0 Author Elizabeth Whalen Package CLL April 19, 2018 Maintainer Robert Gentleman The CLL package contains the

More information

7SK ChIRP-seq is specifically RNA dependent and conserved between mice and humans.

7SK ChIRP-seq is specifically RNA dependent and conserved between mice and humans. Supplementary Figure 1 7SK ChIRP-seq is specifically RNA dependent and conserved between mice and humans. Regions targeted by the Even and Odd ChIRP probes mapped to a secondary structure model 56 of the

More information

Cancer outlier differential gene expression detection

Cancer outlier differential gene expression detection Biostatistics (2007), 8, 3, pp. 566 575 doi:10.1093/biostatistics/kxl029 Advance Access publication on October 4, 2006 Cancer outlier differential gene expression detection BAOLIN WU Division of Biostatistics,

More information

SSM signature genes are highly expressed in residual scar tissues after preoperative radiotherapy of rectal cancer.

SSM signature genes are highly expressed in residual scar tissues after preoperative radiotherapy of rectal cancer. Supplementary Figure 1 SSM signature genes are highly expressed in residual scar tissues after preoperative radiotherapy of rectal cancer. Scatter plots comparing expression profiles of matched pretreatment

More information

Nature Neuroscience: doi: /nn Supplementary Figure 1

Nature Neuroscience: doi: /nn Supplementary Figure 1 Supplementary Figure 1 Illustration of the working of network-based SVM to confidently predict a new (and now confirmed) ASD gene. Gene CTNND2 s brain network neighborhood that enabled its prediction by

More information

Integrated Analysis of Copy Number and Gene Expression

Integrated Analysis of Copy Number and Gene Expression Integrated Analysis of Copy Number and Gene Expression Nexus Copy Number provides user-friendly interface and functionalities to integrate copy number analysis with gene expression results for the purpose

More information

Nature Genetics: doi: /ng Supplementary Figure 1. Mutational signatures in BCC compared to melanoma.

Nature Genetics: doi: /ng Supplementary Figure 1. Mutational signatures in BCC compared to melanoma. Supplementary Figure 1 Mutational signatures in BCC compared to melanoma. (a) The effect of transcription-coupled repair as a function of gene expression in BCC. Tumor type specific gene expression levels

More information

The 16th KJC Bioinformatics Symposium Integrative analysis identifies potential DNA methylation biomarkers for pan-cancer diagnosis and prognosis

The 16th KJC Bioinformatics Symposium Integrative analysis identifies potential DNA methylation biomarkers for pan-cancer diagnosis and prognosis The 16th KJC Bioinformatics Symposium Integrative analysis identifies potential DNA methylation biomarkers for pan-cancer diagnosis and prognosis Tieliu Shi tlshi@bio.ecnu.edu.cn The Center for bioinformatics

More information

Supplementary Materials for

Supplementary Materials for www.sciencesignaling.org/cgi/content/full/6/278/rs11/dc1 Supplementary Materials for In Vivo Phosphoproteomics Analysis Reveals the Cardiac Targets of β-adrenergic Receptor Signaling Alicia Lundby,* Martin

More information

Machine Learning! Robert Stengel! Robotics and Intelligent Systems MAE 345,! Princeton University, 2017

Machine Learning! Robert Stengel! Robotics and Intelligent Systems MAE 345,! Princeton University, 2017 Machine Learning! Robert Stengel! Robotics and Intelligent Systems MAE 345,! Princeton University, 2017 A.K.A. Artificial Intelligence Unsupervised learning! Cluster analysis Patterns, Clumps, and Joining

More information

Supplementary Figure 1

Supplementary Figure 1 Supplementary Figure 1 Supplementary Fig. 1: Quality assessment of formalin-fixed paraffin-embedded (FFPE)-derived DNA and nuclei. (a) Multiplex PCR analysis of unrepaired and repaired bulk FFPE gdna from

More information

Figure S2. Distribution of acgh probes on all ten chromosomes of the RIL M0022

Figure S2. Distribution of acgh probes on all ten chromosomes of the RIL M0022 96 APPENDIX B. Supporting Information for chapter 4 "changes in genome content generated via segregation of non-allelic homologs" Figure S1. Potential de novo CNV probes and sizes of apparently de novo

More information

IMPaLA tutorial.

IMPaLA tutorial. IMPaLA tutorial http://impala.molgen.mpg.de/ 1. Introduction IMPaLA is a web tool, developed for integrated pathway analysis of metabolomics data alongside gene expression or protein abundance data. It

More information

OncoPPi Portal A Cancer Protein Interaction Network to Inform Therapeutic Strategies

OncoPPi Portal A Cancer Protein Interaction Network to Inform Therapeutic Strategies OncoPPi Portal A Cancer Protein Interaction Network to Inform Therapeutic Strategies 2017 Contents Datasets... 2 Protein-protein interaction dataset... 2 Set of known PPIs... 3 Domain-domain interactions...

More information

Nature Genetics: doi: /ng Supplementary Figure 1. SEER data for male and female cancer incidence from

Nature Genetics: doi: /ng Supplementary Figure 1. SEER data for male and female cancer incidence from Supplementary Figure 1 SEER data for male and female cancer incidence from 1975 2013. (a,b) Incidence rates of oral cavity and pharynx cancer (a) and leukemia (b) are plotted, grouped by males (blue),

More information

Sexually-dimorphic targeting of functionally-related genes in COPD

Sexually-dimorphic targeting of functionally-related genes in COPD BMC Systems Biology This Provisional PDF corresponds to the article as it appeared upon acceptance. Fully formatted PDF and full text (HTML) versions will be made available soon. Sexually-dimorphic targeting

More information

CoINcIDE: A framework for discovery of patient subtypes across multiple datasets

CoINcIDE: A framework for discovery of patient subtypes across multiple datasets Planey and Gevaert Genome Medicine (2016) 8:27 DOI 10.1186/s13073-016-0281-4 METHOD CoINcIDE: A framework for discovery of patient subtypes across multiple datasets Catherine R. Planey and Olivier Gevaert

More information

Bioinformatics Laboratory Exercise

Bioinformatics Laboratory Exercise Bioinformatics Laboratory Exercise Biology is in the midst of the genomics revolution, the application of robotic technology to generate huge amounts of molecular biology data. Genomics has led to an explosion

More information

Supplementary Online Content

Supplementary Online Content Supplementary Online Content Fumagalli D, Venet D, Ignatiadis M, et al. RNA Sequencing to predict response to neoadjuvant anti-her2 therapy: a secondary analysis of the NeoALTTO randomized clinical trial.

More information

Supplementary Material. for. A cell-centered meta-analysis reveals baseline predictors of anti-tnf nonresponse. in biopsy and blood of IBD patients

Supplementary Material. for. A cell-centered meta-analysis reveals baseline predictors of anti-tnf nonresponse. in biopsy and blood of IBD patients Supplementary Material for A cell-centered meta-analysis reveals baseline predictors of anti-tnf nonresponse in biopsy and blood of IBD patients This document includes Supplementary Methods, Figures and

More information

CNV PCA Search Tutorial

CNV PCA Search Tutorial CNV PCA Search Tutorial Release 8.1 Golden Helix, Inc. March 18, 2014 Contents 1. Data Preparation 2 A. Join Log Ratio Data with Phenotype Information.............................. 2 B. Activate only

More information

Supplementary Figure 1. Metabolic landscape of cancer discovery pipeline. RNAseq raw counts data of cancer and healthy tissue samples were downloaded

Supplementary Figure 1. Metabolic landscape of cancer discovery pipeline. RNAseq raw counts data of cancer and healthy tissue samples were downloaded Supplementary Figure 1. Metabolic landscape of cancer discovery pipeline. RNAseq raw counts data of cancer and healthy tissue samples were downloaded from TCGA and differentially expressed metabolic genes

More information

A Cross-Study Comparison of Gene Expression Studies for the Molecular Classification of Lung Cancer

A Cross-Study Comparison of Gene Expression Studies for the Molecular Classification of Lung Cancer 2922 Vol. 10, 2922 2927, May 1, 2004 Clinical Cancer Research Featured Article A Cross-Study Comparison of Gene Expression Studies for the Molecular Classification of Lung Cancer Giovanni Parmigiani, 1,2,3

More information

Micro-RNA web tools. Introduction. UBio Training Courses. mirnas, target prediction, biology. Gonzalo

Micro-RNA web tools. Introduction. UBio Training Courses. mirnas, target prediction, biology. Gonzalo Micro-RNA web tools UBio Training Courses Gonzalo Gómez//ggomez@cnio.es Introduction mirnas, target prediction, biology Experimental data Network Filtering Pathway interpretation mirs-pathways network

More information

Statistical Assessment of the Global Regulatory Role of Histone. Acetylation in Saccharomyces cerevisiae. (Support Information)

Statistical Assessment of the Global Regulatory Role of Histone. Acetylation in Saccharomyces cerevisiae. (Support Information) Statistical Assessment of the Global Regulatory Role of Histone Acetylation in Saccharomyces cerevisiae (Support Information) Authors: Guo-Cheng Yuan, Ping Ma, Wenxuan Zhong and Jun S. Liu Linear Relationship

More information

Nature Biotechnology: doi: /nbt Supplementary Figure 1. Experimental design and workflow utilized to generate the WMG Protein Atlas.

Nature Biotechnology: doi: /nbt Supplementary Figure 1. Experimental design and workflow utilized to generate the WMG Protein Atlas. Supplementary Figure 1 Experimental design and workflow utilized to generate the WMG Protein Atlas. (a) Illustration of the plant organs and nodule infection time points analyzed. (b) Proteomic workflow

More information

38 Int'l Conf. Bioinformatics and Computational Biology BIOCOMP'16

38 Int'l Conf. Bioinformatics and Computational Biology BIOCOMP'16 38 Int'l Conf. Bioinformatics and Computational Biology BIOCOMP'16 PGAR: ASD Candidate Gene Prioritization System Using Expression Patterns Steven Cogill and Liangjiang Wang Department of Genetics and

More information

Simple, rapid, and reliable RNA sequencing

Simple, rapid, and reliable RNA sequencing Simple, rapid, and reliable RNA sequencing RNA sequencing applications RNA sequencing provides fundamental insights into how genomes are organized and regulated, giving us valuable information about the

More information

Supplementary Figure S1. Gene expression analysis of epidermal marker genes and TP63.

Supplementary Figure S1. Gene expression analysis of epidermal marker genes and TP63. Supplementary Figure Legends Supplementary Figure S1. Gene expression analysis of epidermal marker genes and TP63. A. Screenshot of the UCSC genome browser from normalized RNAPII and RNA-seq ChIP-seq data

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION doi:.38/nature8975 SUPPLEMENTAL TEXT Unique association of HOTAIR with patient outcome To determine whether the expression of other HOX lincrnas in addition to HOTAIR can predict patient outcome, we measured

More information

Supplementary Methods: IGFBP7 Drives Resistance to Epidermal Growth Factor Receptor Tyrosine Kinase Inhibition in Lung Cancer

Supplementary Methods: IGFBP7 Drives Resistance to Epidermal Growth Factor Receptor Tyrosine Kinase Inhibition in Lung Cancer S1 of S6 Supplementary Methods: IGFBP7 Drives Resistance to Epidermal Growth Factor Receptor Tyrosine Kinase Inhibition in Lung Cancer Shang-Gin Wu, Tzu-Hua Chang, Meng-Feng Tsai, Yi-Nan Liu, Chia-Lang

More information

Online Appendix Material and Methods: Pancreatic RNA isolation and quantitative real-time (q)rt-pcr. Mice were fasted overnight and killed 1 hour (h)

Online Appendix Material and Methods: Pancreatic RNA isolation and quantitative real-time (q)rt-pcr. Mice were fasted overnight and killed 1 hour (h) Online Appendix Material and Methods: Pancreatic RNA isolation and quantitative real-time (q)rt-pcr. Mice were fasted overnight and killed 1 hour (h) after feeding. A small slice (~5-1 mm 3 ) was taken

More information

Supplementary Materials for

Supplementary Materials for www.sciencesignaling.org/cgi/content/full/8/375/ra41/dc1 Supplementary Materials for Actin cytoskeletal remodeling with protrusion formation is essential for heart regeneration in Hippo-deficient mice

More information

Expert-guided Visual Exploration (EVE) for patient stratification. Hamid Bolouri, Lue-Ping Zhao, Eric C. Holland

Expert-guided Visual Exploration (EVE) for patient stratification. Hamid Bolouri, Lue-Ping Zhao, Eric C. Holland Expert-guided Visual Exploration (EVE) for patient stratification Hamid Bolouri, Lue-Ping Zhao, Eric C. Holland Oncoscape.sttrcancer.org Paul Lisa Ken Jenny Desert Eric The challenge Given - patient clinical

More information

R2 Training Courses. Release The R2 support team

R2 Training Courses. Release The R2 support team R2 Training Courses Release 2.0.2 The R2 support team Nov 08, 2018 Students Course 1 Student Course: Investigating Intra-tumor Heterogeneity 3 1.1 Introduction.............................................

More information

Figure S1. Analysis of endo-sirna targets in different microarray datasets. The

Figure S1. Analysis of endo-sirna targets in different microarray datasets. The Supplemental Figures: Figure S1. Analysis of endo-sirna targets in different microarray datasets. The percentage of each array dataset that were predicted endo-sirna targets according to the Ambros dataset

More information

chapter 1 - fig. 2 Mechanism of transcriptional control by ppar agonists.

chapter 1 - fig. 2 Mechanism of transcriptional control by ppar agonists. chapter 1 - fig. 1 The -omics subdisciplines. chapter 1 - fig. 2 Mechanism of transcriptional control by ppar agonists. 201 figures chapter 1 chapter 2 - fig. 1 Schematic overview of the different steps

More information

Supplementary Information Methods Subjects The study was comprised of 84 chronic pain patients with either chronic back pain (CBP) or osteoarthritis

Supplementary Information Methods Subjects The study was comprised of 84 chronic pain patients with either chronic back pain (CBP) or osteoarthritis Supplementary Information Methods Subjects The study was comprised of 84 chronic pain patients with either chronic back pain (CBP) or osteoarthritis (OA). All subjects provided informed consent to procedures

More information

UNIVERSITI TEKNOLOGI MARA COPY NUMBER VARIATIONS OF ORANG ASLI (NEGRITO) FROM PENINSULAR MALAYSIA

UNIVERSITI TEKNOLOGI MARA COPY NUMBER VARIATIONS OF ORANG ASLI (NEGRITO) FROM PENINSULAR MALAYSIA UNIVERSITI TEKNOLOGI MARA COPY NUMBER VARIATIONS OF ORANG ASLI (NEGRITO) FROM PENINSULAR MALAYSIA SITI SHUHADA MOKHTAR Thesis submitted in fulfillment of the requirements for the degree of Master of Science

More information

Supplemental Information. Systems Scale Interactive Exploration. Reveals Quantitative and Qualitative Differences

Supplemental Information. Systems Scale Interactive Exploration. Reveals Quantitative and Qualitative Differences Immunity, Volume 38 Supplemental Information Systems Scale Interactive Exploration Reveals Quantitative and Qualitative Differences in Response to Influenza and Pneumococcal Vaccines Gerlinde Obermoser,

More information

Package NarrowPeaks. August 3, Version Date Type Package

Package NarrowPeaks. August 3, Version Date Type Package Package NarrowPeaks August 3, 2013 Version 1.5.0 Date 2013-02-13 Type Package Title Analysis of Variation in ChIP-seq using Functional PCA Statistics Author Pedro Madrigal , with contributions

More information

Sexually-dimorphic targeting of functionally-related genes in COPD

Sexually-dimorphic targeting of functionally-related genes in COPD Sexually-dimorphic targeting of functionally-related genes in COPD The Harvard community has made this article openly available. Please share how this access benefits you. Your story matters Citation Glass,

More information

Unsupervised Identification of Isotope-Labeled Peptides

Unsupervised Identification of Isotope-Labeled Peptides Unsupervised Identification of Isotope-Labeled Peptides Joshua E Goldford 13 and Igor GL Libourel 124 1 Biotechnology institute, University of Minnesota, Saint Paul, MN 55108 2 Department of Plant Biology,

More information

Journal: Nature Methods

Journal: Nature Methods Journal: Nature Methods Article Title: Network-based stratification of tumor mutations Corresponding Author: Trey Ideker Supplementary Item Supplementary Figure 1 Supplementary Figure 2 Supplementary Figure

More information

Metabolomic Data Analysis with MetaboAnalyst

Metabolomic Data Analysis with MetaboAnalyst Metabolomic Data Analysis with MetaboAnalyst User ID: guest6501 April 16, 2009 1 Data Processing and Normalization 1.1 Reading and Processing the Raw Data MetaboAnalyst accepts a variety of data types

More information

Phenotype analysis in humans using OMIM

Phenotype analysis in humans using OMIM Outline: 1) Introduction to OMIM 2) Phenotype similarity map 3) Exercises Phenotype analysis in humans using OMIM Rosario M. Piro Molecular Biotechnology Center University of Torino, Italy 1 MBC, Torino

More information

T. R. Golub, D. K. Slonim & Others 1999

T. R. Golub, D. K. Slonim & Others 1999 T. R. Golub, D. K. Slonim & Others 1999 Big Picture in 1999 The Need for Cancer Classification Cancer classification very important for advances in cancer treatment. Cancers of Identical grade can have

More information

Cancer Informatics Lecture

Cancer Informatics Lecture Cancer Informatics Lecture Mayo-UIUC Computational Genomics Course June 22, 2018 Krishna Rani Kalari Ph.D. Associate Professor 2017 MFMER 3702274-1 Outline The Cancer Genome Atlas (TCGA) Genomic Data Commons

More information

Package leukemiaseset

Package leukemiaseset Package leukemiaseset August 14, 2018 Type Package Title Leukemia's microarray gene expression data (expressionset). Version 1.16.0 Date 2013-03-20 Author Sara Aibar, Celia Fontanillo and Javier De Las

More information

Author's response to reviews

Author's response to reviews Author's response to reviews Title: Specific Gene Expression Profiles and Unique Chromosomal Abnormalities are Associated with Regressing Tumors Among Infants with Dissiminated Neuroblastoma. Authors:

More information

Mutation Detection and CNV Analysis for Illumina Sequencing data from HaloPlex Target Enrichment Panels using NextGENe Software for Clinical Research

Mutation Detection and CNV Analysis for Illumina Sequencing data from HaloPlex Target Enrichment Panels using NextGENe Software for Clinical Research Mutation Detection and CNV Analysis for Illumina Sequencing data from HaloPlex Target Enrichment Panels using NextGENe Software for Clinical Research Application Note Authors John McGuigan, Megan Manion,

More information

Gene expression profiling predicts clinical outcome of prostate cancer. Gennadi V. Glinsky, Anna B. Glinskii, Andrew J. Stephenson, Robert M.

Gene expression profiling predicts clinical outcome of prostate cancer. Gennadi V. Glinsky, Anna B. Glinskii, Andrew J. Stephenson, Robert M. SUPPLEMENTARY DATA Gene expression profiling predicts clinical outcome of prostate cancer Gennadi V. Glinsky, Anna B. Glinskii, Andrew J. Stephenson, Robert M. Hoffman, William L. Gerald Table of Contents

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION Supplementary text Collectively, we were able to detect ~14,000 expressed genes with RPKM (reads per kilobase per million) > 1 or ~16,000 with RPKM > 0.1 in at least one cell type from oocyte to the morula

More information

Single-strand DNA library preparation improves sequencing of formalin-fixed and paraffin-embedded (FFPE) cancer DNA

Single-strand DNA library preparation improves sequencing of formalin-fixed and paraffin-embedded (FFPE) cancer DNA www.impactjournals.com/oncotarget/ Oncotarget, Supplementary Materials 2016 Single-strand DNA library preparation improves sequencing of formalin-fixed and paraffin-embedded (FFPE) DNA Supplementary Materials

More information

Gene Expression Analysis Web Forum. Jonathan Gerstenhaber Field Application Specialist

Gene Expression Analysis Web Forum. Jonathan Gerstenhaber Field Application Specialist Gene Expression Analysis Web Forum Jonathan Gerstenhaber Field Application Specialist Our plan today: Import Preliminary Analysis Statistical Analysis Additional Analysis Downstream Analysis 2 Copyright

More information

Meaning-based guidance of attention in scenes as revealed by meaning maps

Meaning-based guidance of attention in scenes as revealed by meaning maps SUPPLEMENTARY INFORMATION Letters DOI: 1.138/s41562-17-28- In the format provided by the authors and unedited. -based guidance of attention in scenes as revealed by meaning maps John M. Henderson 1,2 *

More information

Supplemental Figure S1. Expression of Cirbp mrna in mouse tissues and NIH3T3 cells.

Supplemental Figure S1. Expression of Cirbp mrna in mouse tissues and NIH3T3 cells. SUPPLEMENTAL FIGURE AND TABLE LEGENDS Supplemental Figure S1. Expression of Cirbp mrna in mouse tissues and NIH3T3 cells. A) Cirbp mrna expression levels in various mouse tissues collected around the clock

More information

Huntington s Disease and its therapeutic target genes: A global functional profile based on the HD Research Crossroads database

Huntington s Disease and its therapeutic target genes: A global functional profile based on the HD Research Crossroads database Supplementary Analyses and Figures Huntington s Disease and its therapeutic target genes: A global functional profile based on the HD Research Crossroads database Ravi Kiran Reddy Kalathur, Miguel A. Hernández-Prieto

More information

Supporting Information

Supporting Information Supporting Information Retinal expression of small non-coding RNAs in a murine model of proliferative retinopathy Chi-Hsiu Liu 1, Zhongxiao Wang 1, Ye Sun 1, John Paul SanGiovanni 2, Jing Chen 1, * 1 Department

More information

New Enhancements: GWAS Workflows with SVS

New Enhancements: GWAS Workflows with SVS New Enhancements: GWAS Workflows with SVS August 9 th, 2017 Gabe Rudy VP Product & Engineering 20 most promising Biotech Technology Providers Top 10 Analytics Solution Providers Hype Cycle for Life sciences

More information

a) List of KMTs targeted in the shrna screen. The official symbol, KMT designation,

a) List of KMTs targeted in the shrna screen. The official symbol, KMT designation, Supplementary Information Supplementary Figures Supplementary Figure 1. a) List of KMTs targeted in the shrna screen. The official symbol, KMT designation, gene ID and specifities are provided. Those highlighted

More information

Data mining with Ensembl Biomart. Stéphanie Le Gras

Data mining with Ensembl Biomart. Stéphanie Le Gras Data mining with Ensembl Biomart Stéphanie Le Gras (slegras@igbmc.fr) Guidelines Genome data Genome browsers Getting access to genomic data: Ensembl/BioMart 2 Genome Sequencing Example: Human genome 2000:

More information
