Statistically Weighted Voting Analysis of Microarrays for Molecular Pattern Selection and Discovery Cancer Genotypes

IJCSNS International Journal of Computer Science and Network Security, VOL.6 No.2, December 2006

Vladimir A. Kuznetsov, Oleg V. Senko, Lance D. Miller and Anna V. Ivshina

Genome Institute of Singapore, 6 Biopolis Str, Singapore; Computer Center of Russian Acad. of Sciences, Moscow, Russia.

Summary
We developed a methodological approach to genetic class discovery using gene expression microarray data. It is based on a statistically-oriented class-prediction method called Statistically Weighted Voting (SWV) analysis, integrated with clinical risk factor and survival analyses and with statistics of Gene Ontology annotation terms, which we use to validate candidate biomarker selection. Our approach provides a "voting" class prediction function constructed using the most informative and robust discrete segments (sub-regions) of all covariate ranges and their gradated pairs, which thus allows modeling of the interactions of variables (genes). We show here that the SWV-based methodology can be adapted for microarray data and profitably used for biomarker selection, and we discovered two genetic classes associated with an essential improvement of the classical histological grade II of human breast cancer. Our findings show that small and reliable genetic grade signatures could improve individual prognosis for patients with histologic grade II and thus, after further biomedical validation, be used in therapeutic planning for breast cancer patients.

Key words: Voting Algorithms, Biomarker Selection, Prediction, Microarray, Histologic Grades, Cancer Classification.

1. Introduction
Gene expression microarrays are assays for the quantitative study of the transcript abundance profile of a large proportion of genes in a multi-cell sample. To date, global gene expression patterns have been used to classify human cancers into genetic classes related to different clinical outcomes [1, 2, 3, 4]. In these studies, different unsupervised methods such as hierarchical cluster analysis have been used [2, 3].
However, such methods, based on heuristic models, are quite sensitive to the number of samples, population bias in a sample set, missing values, the choice of distance measure, and different sources of technical noise. It is therefore no surprise that the microarray predictions of biologically and clinically significant tumor classes, as discovered by different research groups using unsupervised methods, often exhibit poor reproducibility. There is thus a serious concern regarding the ability of unsupervised methods to predict meaningful, biologically and clinically significant tumor classes; the classifications generated by cluster analysis still remain extremely unstable [,4,5]. There exist many class prediction approaches that, when applied to a given expression dataset, could result in a range of classification accuracies and of gene numbers that comprise the classifier (a subset of highly informative and robust predictors selected by a supervised method). Supervised learning algorithms could provide more accurate, statistically-oriented results than unsupervised methods, but they are usually used to identify novel markers in class prediction, not in class discovery tasks. A number of authors have underlined critical issues of gene selection bias, error estimation, fragility of gene signatures, and overoptimistic performance estimation due to model overfit [6,7]. Thus, there does not exist one correct method. This has motivated us to develop a more suitable and better validated methodological approach for inference of unknown classes from microarray data.

Our approach to genetic class discovery, which is based on supervised learning, uses a statistically-oriented class-prediction method called Statistically Weighted Syndromes (SWS) [8, 9]. Briefly, SWS provides a "voting" class prediction function constructed using the most informative and robust discrete segments (sub-regions) of all covariate ranges, which are thus discretized. Variants of SWS have been successful in accurately predicting therapeutic outcome in bladder cancer patients using limited clinical data [8,].
In clinical trials, it is important to minimize the cost of the trial and the total number of patients. In the case of small numbers of patients, SWS methodology has demonstrated higher robustness and predictive power than logistic regression-based analyses and classification and regression tree (CART) methods [,].

Manuscript received December 5, 2006. Manuscript revised December 25, 2006.

Breast cancer is the most common malignancy among women. Histological grading of breast cancer provides clinically important prognostic information and defines morphological subtypes informative of patient risk. Approximately 5% of all breast cancers are classified as grade II [,3], which is less informative for clinical decisions due to biological heterogeneity and intermediate risk of cancer recurrence. To discover the molecular basis of histologic grade II, we analyzed genome-wide expression profiles of 35 primary invasive breast tumors. In this work, we used an advanced and computationally intensive version of the SWS methodology, not previously applied to large-dimension (microarray) data, in combination with survival analysis, multivariate correlation analysis and gene ontology analysis. We identified several small subsets of highly significant grade-associated markers which could accurately classify tumors of grade I (G1) and grade III (G3) histology and dichotomize G2 tumors into two highly discriminant classes (termed G2a and G2b genetic grades) with patient survival outcomes highly similar to those with G1 and G3 histology, respectively.

2. Methods, Algorithms and Equations

2.1. Breast cancer data and microarrays
To study the relationship between gene expression and histologic grade, we analyzed the expression patterns of approximately 23, gene transcripts (represented by 44,928 probesets (p.s.) on Affymetrix U133A and U133B arrays) in 35 primary breast tumors (NCBI Gene Expression Omnibus (GEO) data sets GSE4922 and GSE456). The tumor samples were derived from three independent population-based cohorts: Uppsala (249 samples), Singapore (4 samples) and Stockholm (58 samples) (Figure 1), enabling the robust identification and cross-cohort validation of highly significant and predictive grade-associated genes. For details on patients, clinical information, tumor samples and microarrays, see [2].

2.2. A basis of SWV algorithm
In simplified terms, the class prediction process of the statistically weighted voting (SWV) analysis of microarrays can be described as follows. A training set consisting of samples of known classes (e.g., histologic grade I (G1) and histologic grade III (G3) tumors) is used to select the variables (i.e., gene expression measurements; probesets or predictors) that allow the most accurate discrimination (or prediction) of the samples in the training set. Once the SWV is trained on the optimal set of variables, it is then applied to an independent exam set (i.e., a new set of samples not used in training) to validate its prediction accuracy. More details are given below.

Figure 1. Schema of discovery and validation of genetically different subgroups within breast cancer patients with histologic grade II. SWS: Statistically Weighted Syndromes method; PAM: Prediction Analysis for Microarrays method; CER: Class Error Rate plot; p.s.: probe set; G1: grade 1; G2: grade 2; G3: grade 3; G2a: grade 2a; G2b: grade 2b; GO: gene ontology.

Briefly, for constructing the class prediction function, the SWV uses the training set $\tilde{S}_0$ (comprised of G1 and G3 tumor samples) to evaluate statistically the weight of the graduated informative variables (predictors) and of all possible pairs of these predictors. The predictors are automatically selected by SWV from n (n=44,5) probe sets (i.e., gene expression measurements) on U133A and U133B Affymetrix Genechips.
The description of each patient includes n (potential) prognostic variables $X_1, \ldots, X_n$ (signals from probe sets of the U133A and U133B microarrays) and information about the class to which the patient belongs. In particular, the predictors might be able to discriminate G1 and G3 tumors with minimum a posteriori probability. Reliability of the SWV class prediction function is assessed by the standard leave-one-out procedure and by an additional exam of the class prediction ability on one or more independent sample populations (i.e., patient cohorts). In this application, the G2 tumor samples from the Uppsala, Singapore and Stockholm cohorts have been used as exam datasets to test the SWV class prediction function.

Let us consider the available n-dimensional domain of the variables $X_1, \ldots, X_n$ as the prognostic variable space. The SWV algorithm is based on calculating the posterior probabilities of the tumors belonging to one of two classes using a weighted voting scheme involving sets of so-called syndromes. A syndrome is a sub-region of the prognostic variable space. Within a syndrome, one class of samples (for instance, G3 tumors) must be significantly more highly represented than another class (for instance, G1), and in other sub-region(s) the inverse relationship should be observed. In the present version of the SWV method, one-dimensional and two-dimensional sub-regions (syndromes) are used. Let $b'_i$ and $b''_i$ denote the boundaries of the sub-region for the variable $X_i$ (the i-th probe set), with $b'_i \ge X_i > b''_i$. A one-dimensional syndrome for the variable $X_i$ is defined as the set of points in variable space for which the inequalities $b'_i \ge X_i > b''_i$ are satisfied. A two-dimensional syndrome for variables $X_i$ and $X_j$ is defined as the set of points in variable space for which the inequalities $b'_i \ge X_i > b''_i$ and $b'_j \ge X_j > b''_j$ are satisfied. The syndromes are constructed at the initial stage of training using the optimal partitioning (OP) algorithm described below.

The SWV training algorithm is based on several steps: 1) optimal recoding (partitioning) of the given variables (signal intensity values) to obtain discrete-valued variables with two or more gradations; 2) selection of the most informative and robust discrete-valued variables and their paired combinations (termed syndromes) that together best characterize the classes of interest; 3) tallying the statistically weighted votes of these syndromes to compute the value of the outcome prediction function. In this study we present an advanced procedure of the SWS method, based on permutation statistics and highly intensive computational estimates of significant cut-off values, which provides an effective procedure of predictor selection.

2.2.1. Optimal partitioning (OP)
The OP method is used for constructing the optimal syndromes for each class (G1 and G3) using the training set $\tilde{S}_0$.
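To make the idea of optimal partitioning concrete, the following is a minimal sketch for the simplest case: one variable, two classes and a single boundary (two gradations). It assumes that the separating ability of a cut is scored with the ordinary Pearson chi-square statistic of the resulting 2x2 table; the chi-2 functional actually used by SWV is the one defined in [9] and may differ. The function names `chi2_2x2` and `best_cut` are hypothetical and exist only for this illustration.

```python
def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 0.0  # degenerate table: the cut separates nothing
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


def best_cut(values, labels):
    """Return (boundary, chi2) for the single cut that best separates two classes.

    values: expression signals of one probe set; labels: 0/1 class indicators.
    Candidate boundaries are midpoints between consecutive distinct values.
    """
    pairs = list(zip(values, labels))
    distinct = sorted(set(values))
    best = (None, 0.0)
    for lo, hi in zip(distinct, distinct[1:]):
        cut = (lo + hi) / 2
        a = sum(1 for v, y in pairs if v <= cut and y == 1)
        b = sum(1 for v, y in pairs if v <= cut and y == 0)
        c = sum(1 for v, y in pairs if v > cut and y == 1)
        d = sum(1 for v, y in pairs if v > cut and y == 0)
        stat = chi2_2x2(a, b, c, d)
        if stat > best[1]:
            best = (cut, stat)
    return best
```

For example, `best_cut([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], [0, 0, 0, 1, 1, 1])` places the boundary between 3.0 and 10.0. Partitions with more than two gradations would need a search over several boundaries at once, which this sketch does not attempt.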
The OP is based on the optimal partitioning of the range of some potential prognostic variable $X_i$ that allows the best separation of the samples belonging to different classes. To evaluate the separating ability of a partition R (see below) on the training set $\tilde{S}_0$, the chi-2 functional is used [9]. The optimal partitions are searched inside the observed variable domain, which contains partitions with critical values not greater than a fixed threshold (defined below). The informative partition with the maximal value of the chi-2 functional is considered optimal for the given variable.

2.2.2. Stability of partitioning
Another important characteristic that allows evaluation of the prognostic ability of a partitioning model for specific variables is the index of boundary instability. Let $R_0, R_1, \ldots, R_m$ be the optimal partitions of the range of variable $X_i$ calculated from the training sets $\tilde{S}_0, \tilde{S}_1, \ldots, \tilde{S}_m$, where $\tilde{S}_k$ is the training set without the description of the k-th sample. Let $K_j$ denote the different classes (j=1,2). Let $b^k_1, \ldots, b^k_{r-1}$ be the boundary points of the optimal partition $R_k$ found from training set $\tilde{S}_k$, and let $D$ be the variance of variable $X_i$. The boundary instability index $\kappa(\tilde{S}_0, K_j, r)$ for a partitioning with r elements is calculated as the ratio:

$$\kappa(\tilde{S}_0, K_j, r) = \frac{1}{D\,(r-1)} \sum_{k=1}^{m} \sum_{l=1}^{r-1} \left(b^k_l - b^0_l\right)^2 .$$

2.2.3. Selecting the optimal set of variables
The OP can be used at the initial stage of training for reducing the dimension of the prognostic variable set. Selection of the optimal set of prognostic variables depends on a sufficiently high partition value determined by the chi-2 function. The threshold for selection of informative variables is estimated from the p-value of the chi-2 function, obtained by a permutation procedure. The additional criterion for selection of prognostic variables is the instability index $\kappa(\tilde{S}_0, K_j, r)$. The variable is used if the value of $\kappa(\tilde{S}_0, K_j, r)$ is less than a threshold $\kappa_0$ defined a priori by the user. When the partition of the given variable is unstable ($\kappa(\tilde{S}_0, K_j, r) > \kappa_0$), the variable is removed from the final optimal set of prognostic variables. Finally, a variable is included in the optimal set of prognostic variables only if both selection criteria are fulfilled.

2.2.4. The weighted voting procedure
Let $Q_j$ denote the set of constructed syndromes for class $K_j$. Let $x^*$ denote a point of the parametric space. The SWV estimates the a posteriori probability $P^{sv}(x^*)$ of the class $K_j$ at the point $x^*$ that belongs to the intersection of syndromes $q_1, \ldots, q_r$ from $Q_j$ as follows:

$$P^{sv}(x^*) = \frac{\sum_{i=1}^{r} w_i\, \nu_i}{\sum_{i=1}^{r} w_i}, \qquad (1)$$

where $\nu_i$ is the fraction of class $K_j$ among objects with prognostic variable vectors belonging to syndrome $q_i$, and $w_i$ is the so-called weight of syndrome $q_i$. The weight $w_i$ is calculated by the formula

$$w_i = \frac{m_i}{m_i + 1} \cdot \frac{1}{\hat{d}_i}, \qquad \hat{d}_i = (1-\nu_i)\,\nu_i + \frac{1}{m_i}\,(1-\nu_0)\,\nu_0 .$$

The estimate $\hat{d}_i$ of the variance of the fraction $\nu_i$ has the second term $\frac{1}{m_i}(1-\nu_0)\,\nu_0$, which is used to avoid a value of $\hat{d}_i$ equal to zero in cases when the given syndrome is associated only with objects of one class from the training set. The results of testing on applied and simulated tasks have demonstrated that formula (1) gives estimates of conditional probabilities that are too low for classes that make up a smaller fraction of the training set. So, in this study, an additional correction of the estimates in (1) has been implemented. The final estimates of the conditional probability at point $x^*$ are calculated as $P^{SWV}(x^*) = P^{sv}(x^*)\,\chi(\tilde{S}_0, K_j)$, where $\chi(\tilde{S}_0, K_j) = \;/(\,P^{sv}(x_k)\,)$ and where $x_k$ is the vector of prognostic variables for the k-th sample from the training set.

2.3. Statistical analysis of Gene Ontology (GO) terms
GO analysis was facilitated by PANTHER software (https://panther.appliedbiosystems.com/). Selected gene lists were statistically compared (Mann-Whitney) with a reference list (i.e., NCBI Build 35) comprised of all genes represented on the microarray to identify significantly over- and under-represented GO terms.

2.4. Survival analysis
The Kaplan-Meier estimate was used to compute survival curves, and the p-value of the likelihood-ratio test was used to assess the statistical significance of the resultant hazard ratios.
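For readers unfamiliar with the Kaplan-Meier estimate, the sketch below shows the product-limit calculation on which such survival curves rest. It is a plain-Python illustration under simple assumptions (right-censored data, one record per patient) and is not the R routine used for the analyses reported here; the function name `kaplan_meier` and its argument names are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times:  follow-up time for each patient.
    events: 1 if the event (e.g., recurrence) was observed at that time,
            0 if the patient was censored (e.g., last date of follow-up).
    Returns a list of (time, survival_probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            # Multiply by the conditional probability of surviving past time t.
            survival *= 1 - n_events / n_at_risk
            curve.append((t, survival))
        # Both events and censored patients leave the risk set after time t.
        n_at_risk -= n_leaving
    return curve
```

With four patients followed for 1, 2, 3 and 4 time units, where the second is censored, the estimate drops to 0.75 at time 1, to 0.375 at time 3 and to 0 at time 4. Censored patients do not create a step, but they shrink the risk set used at later event times.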
Disease-free survival (DFS) in the Uppsala and Stockholm cohorts was defined as the time interval from surgery until the first recurrence (local, regional, or distant) or last date of follow-up. Survival statistics were performed with the R survival package.

2.5. Descriptive statistics
For inter-group comparisons using the clinicopathological measurements, Mann-Whitney U-test statistics were used for continuous variables and the one-sided Fisher's exact test was used for categorical variables (Statistica-6 and StatXact-6 software).

3. Results

3.1. SWV as the discovery method of novel classes of tumors
Our methodology is based on the schema presented in Fig. 1. Beginning with the Uppsala dataset comprised of 68 G1 and 55 G3 tumors, we used SWV optimal partitioning (OP) at the initial stage of training to reduce the dimension of the prognostic set of variables. SWV rank-orders the set of probes according to specific algorithmic criteria for assessing differential expression between classes. Based on this two-criteria selection algorithm, we used SWV chi-2 values of more than 24.38 (at p-value less than .) in combination with low boundary instability index criteria (κ <. for 9% of the selected informative variables and κ <.4 for % of the other informative variables). This procedure provides optimal (robust) partitioning of the informative variables and leads to selection of relatively small sets of potential gene predictors. We also used the U-test with critical value p=.5 (with Bonferroni correction). Based on these criteria, we selected 264 probesets (see Supplementary Material in [2]). Table 1 shows 25 (of 264) top-level selected probesets exhibiting the highest SWV chi-2 values and significant cut-off p-values of survival statistics (see below). Using the 264 probesets, SWV provided a small class error rate (CER) (4.5% for G1 and 5.5% for G3) when the leave-one-out cross-validation procedure was used. A posteriori probability for G1 and G3 was also estimated by PAM [3] for each tumor sample by the leave-one-out cross-validation procedure, with a resulting CER of 5% for G1 and 6% for G3.

Figure 2. (A-D) Probability (Pr) scores from the SWV genetic grade classifier. Pr scores (0-1) generated by the class prediction algorithm are shown on the y-axes. The number of tumors per classification exercise is shown on the x-axis. For the training set in Panel A, a green dot denotes a G1 tumor and a red dot denotes a G3 tumor. Panels B, C and D show the results of predictions for three independent cohorts of patients with grade 2 tumors. In all these cohorts only a few patients (.25<Pr<.75) might be considered as true Grade 2 tumor patients.

To extract the smallest possible genetic grade classifier from the 264 p.s., we varied the initial parameters of the SWV algorithm to minimize the number of predictors in the training set while providing the maximum correlation coefficient between posterior probabilities and true class indicators (specifically, was the indicator of G1 tumors, and was the indicator of G3 tumors in the G1-G3 comparison) (Figure 2A). The smallest robust genetic grade signature contains only 6 gene probesets (A.22949_at; B.228273_at; B.226936_at; A.2879_s_at; A.24825_at; A.2492_s_at) representing 5 genes: BRRN, PRR, C6orf73, STK6, MELK. CER was 4.4% for class G1 and 5.5% for class G3 (Figure 2A). By PAM, for the G1-G3 comparisons, maximal prediction accuracies were obtained with 18 probesets (A.22949_at; A.2252_s_at; A.27_at; B.228273_at; A.22768_at; B.226936_at; A.2879_s_at; B.22268_s_at; A.2546_at; A.24822_at; A.2997_s_at; A.2989_at; A.252_s_at; B.235572_at; A.2258_x_at; A.24825_at; B.224753_at; A.22436_s_at).
Both SWV and PAM correctly classified 96% (65/68) of the G1s and 95% (52/55) of the G3s (by the leave-one-out method). The smaller number of probe sets required by SWV (6 probesets) compared to PAM (18 probesets) reflects the ability of SWV to use synergetic effects (coexpression patterns) during variable selection (see Methods). Based on the consistency between SWV, the U-test and PAM, we further considered the classification results using the 264 variables. In two-group comparisons, high CERs were observed in the G1-G2 and G2-G3 predictions (data not shown), while in the G1-G3 comparison the CER was low (<5% errors). This suggests that G2 tumors might not be molecularly distinct from either low- or high-aggressive tumors.

3.2. Dichotomy of G2 tumors and disease prognosis
We next applied our grade G1-G3 predictors directly to the 126 G2 tumors of the Uppsala cohort to ask if these genetic determinants of low and high grade might resolve moderately differentiated G2 tumors into separable classes. To do that we estimated the posterior probability (Pr) as the likelihood that a sample from the exam group of tumors belongs to one class (termed G1-like) or the other (i.e., G3-like).
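A minimal sketch of how such a Pr score might be computed and turned into a label is given below, following the voting estimate (1) of Section 2.2.4. Several points are assumptions of this sketch and not statements of the published method: $m_i$ is taken to be the number of training samples falling in syndrome $q_i$, $\nu_0$ is taken to be the overall fraction of the class in the training set, Pr is treated as the probability of the G3-like class, and the cut-offs 0.25 and 0.75 are borrowed from the intermediate range quoted in the Figure 2 caption. The correction factor χ is omitted, and all function names are hypothetical.

```python
def syndrome_weight(m_i, nu_i, nu_0):
    """Weight of one syndrome (assumed reading of the weight formula).

    m_i:  assumed number of training samples inside the syndrome.
    nu_i: fraction of the class among those samples.
    nu_0: assumed overall fraction of the class in the training set.
    """
    # The second summand keeps d_hat positive when nu_i is 0 or 1,
    # provided 0 < nu_0 < 1.
    d_hat = (1.0 - nu_i) * nu_i + (1.0 - nu_0) * nu_0 / m_i
    return m_i / ((m_i + 1.0) * d_hat)


def voting_estimate(syndromes, nu_0):
    """Weighted vote over the syndromes that contain the sample.

    syndromes: list of (m_i, nu_i) pairs, one per syndrome containing x*.
    """
    weights = [syndrome_weight(m_i, nu_i, nu_0) for m_i, nu_i in syndromes]
    weighted_sum = sum(w * nu_i for w, (_, nu_i) in zip(weights, syndromes))
    return weighted_sum / sum(weights)


def genetic_grade(pr_g3, low=0.25, high=0.75):
    """Label an exam sample from its (assumed) probability of being G3-like."""
    if pr_g3 >= high:
        return "G2b"  # G3-like
    if pr_g3 <= low:
        return "G2a"  # G1-like
    return "intermediate"
```

For instance, `genetic_grade(voting_estimate([(10, 0.9), (20, 0.8)], 0.5))` yields the G2b label, because both syndromes containing the sample are dominated by the G3 class. Syndromes whose class fraction is close to 0 or 1 receive larger weights, so confident sub-regions dominate the vote.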

IJCSNS International Journal of Computer Science and Network Security, VOL.6 No.12, December 2006

Using the 264 p.s. classifier, we found that the G2 tumors could be separated into G1-like (n=83) and G3-like (n=43) classes. We found that 96% of the G2 tumors were assigned to either the G1-like or G3-like classes, indicating that almost all G2 tumors can be well separated into distinct low- and high-grade-like classes (henceforth referred to as G2a and G2b genetic grades). Only a few G2 tumors exhibit intermediate Pr scores (<0.75). Using the U-test and t-test, we confirmed a high separation ability of each of the SWV 264 predictors (p<.).

Figure 2B shows that 96% of the G2 tumors (Uppsala cohort) were assigned by the classifier to either the G1-like or G3-like classes. The result was successfully verified using the Stockholm and Singapore cohorts (Figure 2C,D, respectively).

We also evaluated the prognostic ability of our 264 p.s. classifier, which we estimated using disease-free survival (DFS) time data sets. Kaplan-Meier survival analysis demonstrated a highly significant difference between the survival curves of the G2a and G2b patients (Cox proportional model likelihood ratio test=7.2, p=.7). The same classification results were obtained using the SWV 5-gene genetic grade classifier (Figure 3). Survival analysis of the G2a and G2b tumor subtypes based on the 5-gene genetic grade classifier showed a significant difference between the survival curves of the G2a and G2b patients (Figure 3A). Notably, neither the G2a and G1 curves nor the G2b and G3 curves were significantly different from each other. The G2a-G2b survival difference was further observed in specific therapeutic contexts, including patients who received no systemic therapy (p=.4; Figure 3B), patients with systemic therapy (Figure 3C), and those with ER-positive tumors who received endocrine therapy only (p=.22; Fig 3D). In a similar fashion, the genetic grade classifier was also predictive of recurrence in the Stockholm (G2) patients who received systemic therapy (i.e., chemotherapy, endocrine therapy or both) (p=.27; Figure 3E) and those with ER-positive disease who received only endocrine treatment (p=.32; Figure 3F).

Figure 3. Survival differences between G2a and G2b genetic grade subtypes. (A) Expression profiles of the Uppsala and Stockholm tumors segregated by the SWV (5-gene) genetic grade signature are shown. Green and red vertical bars (top panel) denote histologic G1 and G3 tumors, respectively. (B-F) Kaplan-Meier survival curves for G2a (green) and G2b (red) subtypes are shown alone, or superimposed on survival curves of histologic grades 1, 2, and 3. Uppsala cohort survival curves are shown for (B) all patients, (C) patients who received no systemic therapy, and (D) patients positive for ER who received endocrine therapy only. Stockholm cohort survival curves are shown for (E) patients treated with systemic therapy and (F) those with ER-positive cancer treated with endocrine therapy only. The likelihood ratio test p-value reflects the significance of the hazard ratios.

Figure 4. Correlation portraits of histological and genetic grades (panels: Grade 1, Grade 2a, Grade 2, Grade 2b, Grade 3).

Affymetrix ID    Chi-2 (G1 vs G3)   Survival p value   Gene
A.22949_at       92.6               .9                 BRRN
B.226936_at      92.6               .4
A.24822_at       82.2               .2                 TTK
B.228559_at      82.                .23
A.289_s_at       79.2               .49                PRC
A.2433_at        79.                .23                TRIP3
A.28726_at       75.5               .39                DKFZ p762e32
A.2524_s_at      73.4               .4                 RAD5
A.2287_s_at      73.                .5                 CDC2
B.226473_at      72.6               .4
A.24444_at       69.3               .34                KIF
A.29773_s_at     67.4               .22                RRM2
B.235572_at      67.3               .8                 Spc24
A.22277_s_at     66.5               .2                 RACGAP
A.2999_at        66.3               .                  FLJ233
A.28755_at       66.3               .6                 KIF2A
A.29_s_at        66.3               .45                MGC5528
A.28662_s_at     66.2               .34                HCAP-G
A.2446_at        65.3               .27
A.23438_at       64.                .32                STC2
A.2989_at        63.9               .36                FOS
A.2439_s_at      63.3               .7                 LAPTM4B
A.25898_at       63.2               .44                CX3CR
A.22239_at       62.2               .9                 LOC4699
A.247_s_at       6.8                .2                 CCNB

Table 1. 25 (of 264) top-level informative p.s. which were also significant in survival analysis (p<.2).

3.3. The 264 p.s. provide multiple robust genetic grade signatures to discriminate G2a and G2b tumors

Due to the high informativeness and stability of the variables of the 264-p.s. predictor, we hypothesized that there are at least several small alternative gene subsets (prognostic signatures) that could be used to classify low- and high-aggressive breast tumors with high accuracy (and therefore could provide individual classification of patients according to the prognostic probability of G1 and G3). To find such small-dimension predictors, we excluded the 6 probesets representing the 5-gene genetic grade classifier from the 264 probesets, randomly selected two non-overlapping subsets (each of 4 probesets) from the remaining 258 probesets, and applied the SWV algorithm to each selected probe subset. In this way, we selected two additional small-dimension sets of genetic grade classifiers containing (A.2252_s_at; A.2546_at; A.256_s_at; A.23929_s_at; B.222848_at; B.242_at; A.2287_at) and (A.252_s_at; A.289_s_at; A.25794_s_at; A.23438_at; B.2259_at; A.282_s_at; A.2997_s_at). For the Uppsala, Stockholm and Singapore cohorts, each of these predictors provides similarly high accuracy of classification in G1-G3 comparisons (>94% correct predictions), reproducible levels of separation of the G2a and G2b subtypes across cohorts (>94% of patients assigned to G2a or G2b), and highly significant differences in the G2a-G2b comparison based on survival analysis.

3.4. Comparison of performance of SWV with traditional pattern recognition algorithms

We compared the performance of SWV with several traditional class prediction algorithms, including Fisher Discriminant Analysis (FDA), Q-nearest neighbors (QNN), and Support Vector Machine (SVM). Our analysis shows that SWV can provide accuracy in leave-one-out analysis as high as that of the other methods when the 5-gene SWV signature or the 232-gene SWV signature is used. Table 2 shows the results of such predictions based on the 5-gene SWV signature. To evaluate the predictive power of the methods, after the training step, we used the methods to predict G1 and G3 tumors of the Stockholm and Singapore cohorts.
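As a rough illustration of the leave-one-out analysis mentioned above, the following Python sketch holds out each tumor in turn, trains on the rest, and counts correct predictions. The nearest-centroid rule is only a stand-in chosen for brevity and is not the SWV algorithm; the toy expression values and all function names are hypothetical.

```python
def nearest_centroid_predict(train_x, train_y, sample):
    """Predict the label whose class centroid is closest to `sample`.

    Stand-in classifier for illustration only (not SWV).
    """
    sums, counts = {}, {}
    for x, y in zip(train_x, train_y):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[y] = counts.get(y, 0) + 1

    best_label, best_dist = None, float("inf")
    for label, acc in sums.items():
        centroid = [s / counts[label] for s in acc]
        # Squared Euclidean distance to the class centroid.
        dist = sum((a - b) ** 2 for a, b in zip(sample, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def leave_one_out_accuracy(xs, ys, predict):
    """Fraction of samples predicted correctly when each is held out once."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)


# Hypothetical two-probeset expression values for six tumors.
toy_x = [[1.0, 0.9], [1.2, 1.1], [0.8, 1.0],
         [3.0, 3.1], [3.2, 2.9], [2.9, 3.0]]
toy_y = ["G1", "G1", "G1", "G3", "G3", "G3"]
loo_accuracy = leave_one_out_accuracy(toy_x, toy_y, nearest_centroid_predict)
```

Any classifier with the same `predict(train_x, train_y, sample)` signature could be passed in, which is how several methods could be compared on identical held-out samples.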
In these two tests SWV provides better accuracy of prediction of G1 and G3 than the other methods. However, the most pronounced differences between methods were found when histologic G2 tumors from independent cohorts were tested. Table 3 demonstrates that only SWV and PAM provide strong separation of the histologic grade 2 tumors (Uppsala cohort) into the G1-like and G3-like sets. Other methods provide diverse and poor discrimination ability of the G2 tumors on these sets.

Prediction based on the 6 best SWV-selected p.s.

Method   G1           G3
Test 1: Uppsala DB
SWV      65(95.6%)    52(94.5%)
LFD      65(95.6%)    5(92.7%)
QNN      66(97.%)     5(9.9%)
SVM      66(97.%)     5(9.9%)
Test 2: Stockholm DB
SWV      27(96.4%)    46(75.4%)
LFD      27(96.4%)    4(65.6%)
QNN      27(96.4%)    45(73.8%)
SVM      27(96.4%)    4(67.2%)
Test 3: Singapore DB
SWV      (9%)         34(72.3%)
LFD      (9%)         27(57.4%)
QNN      (9%)         29(6.7%)
SVM      (9%)         29(6.7%)

Table 2. Evaluation of performance of pattern recognition methods.

Method         G1-like (Pr≤0.25)   True G2 (0.25<Pr<0.75)   G3-like (Pr≥0.75)
SWS (6 p.s.)   38(3%)              4(4%)                    83(66%)
LFD (6 p.s.)   6(5%)               8(64%)                   39(3%)
QNN (6 p.s.)   25(2%)              22(7%)                   79(63%)
SVM (6 p.s.)   7(3%)               4(32%)                   69(55%)
PAM (8 p.s.)   37(29%)             6(5%)                    83(66%)

Table 3. Discrimination of Uppsala histologic G2 tumors based on the 5-gene SWS signature using SWS, LFD, QNN, SVM and based on the 7-gene signature using PAM.

3.5. Co-expression analysis of 264 gene predictors supports genetic grade 2 re-classification

We found that the 264 gene predictors can be grouped based on their co-regulation patterns, which are represented in Figure 4 using the Kendall tau correlation coefficient matrix for these predictors. The figure shows images of the matrix of correlation coefficients, clustered with respect to the values of the paired correlation coefficients between probe sets into several separate groups of genes. The probes were ordered using the Gene Cluster software and then visualized using the TreeView program (http:/www.lbl.gov/esensoftware.htm). To avoid possible bias in the images, we randomly selected 34 patients from each of the G1, G2a, G2, G2b and G3 tumor sets. Only statistically significant correlations (p<. after Bonferroni correction) are presented.
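The filtering of pairwise correlations described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' pipeline: it computes Kendall's tau-a without tie correction, uses the large-sample normal approximation for the two-sided p-value, and takes the significance level `alpha` as an assumed placeholder. The probe names and expression values are hypothetical.

```python
import math


def kendall_tau(x, y):
    """Kendall tau-a and a two-sided p-value (normal approximation, no ties)."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            sign = (x[i] - x[j]) * (y[i] - y[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    tau = (concordant - discordant) / (n * (n - 1) / 2)
    # Variance of tau under the null hypothesis of independence.
    variance = 2 * (2 * n + 5) / (9 * n * (n - 1))
    z = tau / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return tau, p_value


def significant_pairs(profiles, alpha=0.05):
    """Keep probe pairs whose correlation passes a Bonferroni-corrected test.

    `alpha` is an assumed family-wise level, divided by the number of pairs.
    """
    names = sorted(profiles)
    n_tests = len(names) * (len(names) - 1) // 2
    threshold = alpha / n_tests
    kept = {}
    for a in range(len(names)):
        for b in range(a + 1, len(names)):
            tau, p_value = kendall_tau(profiles[names[a]], profiles[names[b]])
            if p_value < threshold:
                kept[(names[a], names[b])] = tau
    return kept


# Hypothetical expression profiles of three probes across ten patients.
base = list(range(10))
toy_profiles = {
    "probe_a": base,
    "probe_b": [2 * v for v in base],      # perfectly concordant with probe_a
    "probe_c": list(reversed(base)),       # perfectly discordant with probe_a
}
kept_pairs = significant_pairs(toy_profiles)
```

Pairs that fail the corrected threshold are simply absent from the returned dictionary, which corresponds to the non-significant entries of a correlation image.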
Increasingly positive significant correlations are represented with reds of increasing intensity, and increasingly negative significant correlations are represented with greens of increasing intensity. Non-significant correlations are in black. The order of genes in all matrices is the same. Figure 4 demonstrates the pronounced differences in expression co-regulation patterns of the genes differentially expressed in the G1, G2 and G3 groups. However, the gene expression correlation matrices for the G1 and G2a pair are very similar to each other. The same phenomenon was found when we compared the correlation matrices in the pair of G2b and G3 tumors. These findings support the view that low and high genetic grade diseases (G1+G2a and G2b+G3) could be represented by different cancer cell precursors, and Figure 4 reflects specific pathobiological pathways associated with the intrinsic biological networks of these two tumor cell types.

Term   G1 vs G2a   G2a vs G2b   G2b vs G3
       p-value     p-value      p-value
Biological process
1      6.2E-6      5.7E-28      2.5E-6
2      .3E-2       2.5E-2       --
3      2.7E-2      6.8E-5       .E-3
4      --          4.4E-3       4.9E-3
5      .6E-2       5.5E-4       5.5E-3
6      --          3.6E-2       4.4E-2
7      --          5.E-3
Molecular function
8      .E-3        7.2E-6       --
9      3.5E-3      5.E-2        --
10     .3E-2       --           --
11     --          7.6E-7       4.2E-4
12     --          --           7.5E-3
13     --          7.8E-4       --
14     --          .9E-2        --
Pathway
15     4.9E-2      --           --
16     --          --           4.9E-2
17     3.E-2

Table 4. Gene ontology analysis of the 264 p.s. grade classifier. Selected terms are shown with corresponding p-values that reflect the significance of term enrichment (by Panther software, http://www.pantherdb.org/panther/). 1: Cell cycle; 2: Chromatin packaging and remodeling; 3: Mitosis; 4: Inhibition of apoptosis; 5: Oncogenesis; 6: Cell motility; 7: Stress response; 8: Kinase activator; 9: Histone; 10: Nucleic acid binding; 11: Microtubule family cytoskeletal protein; 12: Chemokine; 13: Non-receptor serine/threonine protein kinase; 14: Extracellular matrix linker protein; 15: Insulin/IGF pathway-MAPKK/MAPK cascade; 16: Apoptosis signaling pathway; 17: Ubiquitin proteasome pathway.

3.6. Statistical Analysis of GO terms

The separation into G2a and G2b is strongly supported by statistical analysis of the enrichment of specific gene ontology (GO) categories among the 237 RefSeq-annotated gene names represented by the 264 predictors, in comparison to the enrichment of the same GO categories in the human genome (NCBI Build 35.). Table 4 displays a selected set of significantly enriched GO categories, which includes cell cycle, inhibition of apoptosis, cell motility, stress response, kinase activators, microtubule family cytoskeletal proteins, and ubiquitin proteasome pathway, suggesting essential differences in the genetic programs and pathways of the G2a- and G2b-type tumor cells. Interestingly, the GO comparisons G1 vs G2a and G2b vs G3 also demonstrate some significant biological differences; however, these differences are fewer and less pronounced than in G2a vs G2b.

3.7. Many genetic grade features are significantly associated with cell cycle, mitosis and patient survival time

Interestingly, among patients separated by the median DFS time in survival analysis, a large proportion of probesets (58 of the 264) can significantly discriminate (at p<.5) the patients into poor and good responders (this result is presented for the 25 top-level significant probesets in Table 1). GO analysis of the list of genes assigned to these 58 probesets strongly indicates that the associated genes are essentially involved in cell cycle and mitosis, including microtubule-based process, mitotic chromosome condensation, mitotic spindle organization and biogenesis. These biological processes are well known to be essential in cancer outcome.

4. Discussion

We initially investigated several distinct class prediction/pattern recognition algorithms, including the classical Fisher discriminant analysis, the k-nearest neighbors method, and the Support Vector Machines (SVM) method. Empirically, these machine learning algorithms provide approximately similar discrimination ability on the training sets. However, we found that SWV and PAM [13] had the greatest discriminative ability and were the most robust regarding individual predictions when the prediction rules were applied to independent cohorts. As we have shown, the SWV method allows the selection of a smaller number of genes (only 5 genes represented by 6 probesets) compared to PAM (8 probesets), while the classification accuracies remain identical. SWV and PAM were used side-by-side in this study to allow a performance comparison between two robust but mathematically distinct class prediction algorithms in terms of classification accuracies and the total number of genes required for maximum accuracy. PAM is a widely used statistical method for class prediction in large datasets. However, a limitation of PAM is that it is prone to over-parameterization (i.e., the selection of non-independent variables (genes) with redundant characteristics) because it does not take into account interactions between genes. The SWV method relies on a different statistical approach, which involves a voting class prediction function using only the most informative and robust variables. SWV is strongly oriented towards the selection of a relatively small number of genes (which is more amenable to PCR-based diagnostic applications than large gene sets), even if the number of patients is limited. This is because SWV takes into account interactions between variables, thus minimizing the number of predictors needed and reducing the risk of over-parameterization.

Ma et al. (2003) were the first to report a histologic grade genetic signature capable of distinguishing low and high grade breast cancer.
Using 12K cDNA microarrays to analyze G1, G2, and G3 microdissected tumor samples, they identified 200 genes differentially expressed between G1 and G3 tumors [3]. Using these genes for tumor clustering, they observed that the majority of G2 tumors possessed a hybrid signature intermediate to G1 and G3, with few exceptions (Figure 3 of their original report). Notably, this finding is in contrast with our discovery that the majority of G2 tumors do not display hybrid signatures, but rather possess clear G1-like or G3-like genetic features. According to our classifier, only a small percentage (6% or less) of the tumors in our study had intermediate genetic grade measurements (i.e., Pr scores <0.75 for G1-like and G3-like). To address this discrepancy, we cross-compared the 200 grade-associated genes in their list to our expanded set of 232 genes, and observed a statistically significant overlap of 35 genes (p<1.0x10^-7; Monte Carlo simulation). This overlap, however, represents only a small percentage of either gene list, indicating that the discrepant observations are most likely explained by fundamentally different signature compositions. It is also possible that differences in sample selection and preparation, sample size (we have much larger samples and used 3 independent cohorts), RNA purification, quality of microarray analysis, and data normalization could have contributed to the variable results. The results of Ma et al. are inconsistent with our data (see also [12]) and with the data of Sotiriou et al. [14].

Sotiriou et al. published their findings on the 97-gene expression grade index associated with histologic grade and correlated with relapse-free survival in ER-positive breast cancer [14]. Their grade index, like our grade signatures, could dichotomize the vast majority of G2 tumors into two groups with expression profiles and survival characteristics resembling those of G1 and G3 tumors. Comparison of our gene classifiers with the 97-gene classifier revealed that 3 of our 5-gene grade signature genes, and 68 of our larger 232-gene set, overlapped with their 97-gene index. This high degree of overlap suggests that the 232-gene set and the 97-gene set may utilize fragments of the same fundamental transcriptional programs/pathways for predicting patient outcomes. For instance, in both studies cell cycle genes were essentially enriched in the classifiers. Whether the two predictors are collinear with respect to patient survival will be an important question moving forward. Nevertheless, the fact that our studies and [14] converge on similar findings reinforces the view that gene expression-based measurements of histologic grade can substantially contribute to patient prognosis.

We could consider the 264 probe sets and their smaller reliable subsets discussed in this study as genetic predictors of the G1+G2a (G1-like) and G2b+G3 (G3-like) tumor types. This finding shows that extensive molecular heterogeneity exists within the G2 tumor population, and that this heterogeneity is robustly defined by the major determinants of G1 and G3 cancer.
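The significance of an overlap between two gene lists, as assessed above by Monte Carlo simulation, can be illustrated with a short Python sketch. It assumes that both lists are drawn uniformly at random from a common gene universe under the null hypothesis; the universe size, list sizes, trial count and function name are hypothetical choices for illustration and do not reproduce the authors' simulation settings.

```python
import random


def overlap_p_value(universe_size, size_a, size_b, observed_overlap,
                    n_trials=2000, seed=0):
    """Estimate P(overlap >= observed) for two random gene lists.

    Genes are represented by integer indices in a hypothetical universe.
    """
    rng = random.Random(seed)
    universe = range(universe_size)
    # List A is fixed once; only list B is redrawn in every trial.
    list_a = set(rng.sample(universe, size_a))
    hits = 0
    for _ in range(n_trials):
        list_b = rng.sample(universe, size_b)
        overlap = sum(1 for gene in list_b if gene in list_a)
        if overlap >= observed_overlap:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_trials + 1)


# Hypothetical example: two 50-gene lists from a 1000-gene universe.
p_large_overlap = overlap_p_value(1000, 50, 50, observed_overlap=20, n_trials=500)
p_zero_overlap = overlap_p_value(1000, 50, 50, observed_overlap=0, n_trials=500)
```

With the add-one correction the smallest reportable p-value is 1/(n_trials + 1), so resolving very small p-values this way requires a correspondingly large number of trials.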
It also demonstrates that a much larger and pervasive transcriptional program underlies the genetic grade predictions of the several SWV signatures, despite their composition of a mere 5-7 genes. Based on SWV, PAM and multivariate analyses, a minor fraction (6%) of grade II breast cancers is still unclassified and might be considered as mixture cancers [3] or as technical noise. Our findings show that genetic grade signatures could, after additional biological validation, improve prognosis for patients with histologic grade II tumors and, thus, be used in therapeutic planning for breast cancer patients. Our results support the view that low and high grade disease (G1+G2a and G2b+G3), as re-defined genetically, reflect genetically stable independent pathobiological entities rather than a continuum of progression, which could be associated with distinct breast epithelium stem cell types [see [12] for references].

This study demonstrates that a systems approach to microarray data analysis, combined with data mining, multivariate analysis, GO analysis, histopathological information from tissue samples and survival data of large patient cohorts, can provide insight into the molecular classification of cancers and other diseases. Such an approach allows for the identification of coordinately expressed genes with essential biological and clinical associations.

References

[1] van 't Veer, L.J., Dai, H., van de Vijver et al. 2002. Gene expression profiling predicts clinical outcome of breast cancer. Nature 415:530-536.
[2] Sorlie, T., Perou, C.M., Tibshirani, R. et al. 2001. Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proc Natl Acad Sci USA 98:10869-10874.
[3] Ma, X.J., Salunga, R., Tuggle, J.T., Gaudet, J. et al. 2003. Gene expression profiles of human breast cancer progression. Proc Natl Acad Sci USA 100:5974-5979.
[4] Miller, L.D., Smeds, J., George, J. et al. 2005. An expression signature for p53 status in human breast cancer predicts mutation status, transcriptional effects, and patient survival. Proc Natl Acad Sci USA 102:13550-13555.
[5] Loi, S., Sotiriou, C., Buyse, M., Rutgers, E., Van't Veer, L., Piccart, M., Cardoso, F. 2006. Molecular forecasting of breast cancer: Time to move forward with clinical testing. J Clin Oncol 24:721-722.
[6] Ransohoff, D.F. 2004. Rules of evidence for cancer molecular-marker discovery and validation. Nat Rev Cancer 4:309-314.
[7] Brenton, J.D., Carey, L.A., Ahmed, A.A., Caldas, C. 2005. Molecular classification and molecular forecasting of breast cancer: ready for clinical application? J Clin Oncol 23:7350-7360.
[8] Kuznetsov, V.A., Ivshina, A.V., Sen'ko, O.V., Kuznetsova, A.V. 1996. Syndrome approach for computer recognition of fuzzy systems and its application to immunological diagnostics and prognosis of human cancer. Math. Comput. Modelling 23:92-2.
[9] Kuznetsov, V.A., Knott, G.D., Ivshina, A.V. 1998. Artificial immune system based on syndromes-response approach: Theory and their application to recognition of the patterns of immune response and prognosis of therapy outcome. In Proc. of IEEE Intern. Conf. on Systems, Man, and Cybernetics. San Diego, CA, USA. 384-389.
[10] Jackson, A.M., Ivshina, A.V., Senko, O., Kuznetsova, A., Sundan, A., O'Donnell, M.A., Clinton, S., Alexandroff, A.B., Selby, P.J., James, K., Kuznetsov, V.A. 1998. Prognosis of intravesical bacillus Calmette-Guerin therapy for superficial bladder cancer by immunological urinary measurements: statistically weighted syndromes analysis. J Urol 159:1054-1063.

[11] Mueller, B.U., Zeichner, S.L., Kuznetsov, V.A., Heath-Chiozzi, M., Pizzo, P.A., and Dimitrov, D.S. 1998. Individual prognoses of long-term responses to antiretroviral treatment based on virological, immunological and pharmacological parameters measured during the first week under therapy. AIDS, 3: f9-f96.
[12] Ivshina, A.V., George, J., Senko, O.V., Mow, B., Putti, T.C., Smeds, J., Nordgren, H., Bergh, J., Liu, E. T-B., Kuznetsov, V.A., Miller, L.D. 2006. Genetic reclassification of histologic grade delineates new clinical subtypes of breast cancer. Cancer Res., 66:10292-10301.
[13] Tibshirani, R., Hastie, T., Narasimhan, B., and Chu, G. 2002. Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proc Natl Acad Sci USA, 99:6567-6572.
[14] Sotiriou, C., Wirapati, P., Loi, S. et al. 2006. Gene expression profiling in breast cancer: understanding the molecular basis of histologic grade to improve prognosis. J Natl Cancer Inst, 98:262-272.

Acknowledgments

Grant support: Singapore Agency for Science Technology and Research.

Oleg Senko received the PhD in mathematics in 99 from the Computer Center of the USSR Academy of Sciences. His interest is in the development of novel statistical and combinatorial methods of pattern recognition, forecasting and data mining, and their applications. He is a senior researcher at the Dorodnicyn Computing Center of the Russian Academy of Science, Moscow, Russia.

Lance D. Miller received the PhD in 2 in Genetics and Molecular Biology from the University of North Carolina at Chapel Hill, Chapel Hill, NC, USA. From 2 to the present he has been a senior group leader and head of the Laboratory of Microarrays and Expression Genomics at the Genome Institute of Singapore. His primary interest is the application of genomic technologies towards solving problems in human disease. His current research is focused primarily on the molecular characterization of human breast cancer and the microarray-based detection and characterization of human pathogens.

Vladimir A. Kuznetsov received the Ph.D in biophysics in 1984 from Moscow State University and the Dr. Sc. in math.-physics in 1992 from the Science & Technical Union of the Russian Academy of Sciences. During 1982-1998 he was a researcher, senior researcher and head of the Laboratory of Mathematical Immunobiophysics at the Institute of Chemical Physics of the Russian Academy of Sciences, Moscow, where he studied mathematical models of the immune system and of the tumor-immune system interaction network. He also developed computational data mining tools and pattern recognition methods and their applications in clinical trials. During 1996-2004 he worked at FDA USA, and at NCI/NIH and NICHD/NIH (Bethesda, MD, USA), as a senior researcher. He was involved in the NIH Cancer Anatomy Genome Project and other genomics projects focusing on the study of cancer and infectious diseases. Since 2004 he has been a senior group leader and head of the Laboratory of Computational Genomics and Systems Biology at the Genome Institute of Singapore.

Anna Ivshina received the MD in 1979 from the First Moscow Medical Institute, USSR, and the PhD in 1986 from the All Union Cancer Center of the Medical Academy of the USSR. During 1979-1996 she was a researcher at the Clinical Institute of the Russian Cancer Center of the Medical Academy. At the Laboratory of Clinical Immunology of this institute she studied the immune response against cancer and developed clinical immunodiagnostic methods. For several years she worked as a scientist at FDA USA, where she developed methods for the diagnostics and control of infectious diseases. She is now a scientist at the Laboratory of Microarrays and Expression Genomics at the Genome Institute of Singapore; her research interest focuses on microarray expression data analysis and clinical risk factors, aiming to reveal novel genetic markers for reliable classification of tumor sub-types and prediction of the clinical outcome of cancer diseases.