Simple Discriminant Functions Identify Small Sets of Genes that Distinguish Cancer Phenotype from Normal


Genome Informatics 16(1), 2005, p. 245

Gul S. Dalgin (1), sdalgin@bu.edu
Charles DeLisi (2,3), delisi@bu.edu

(1) Molecular Biology, Cell Biology and Biochemistry Program, Boston University, Boston, MA 02215, USA
(2) Department of Biomedical Engineering, Boston University, Boston, MA 02215, USA
(3) Bioinformatics Graduate Program, Boston University, Boston, MA 02215, USA

Abstract

High-throughput gene expression profiling can identify sets of genes that are differentially expressed between phenotypes. Discovering marker genes is particularly important in the diagnosis of a cancer phenotype. However, the gene sets produced to date are too large to be economically viable diagnostics. We use a hybrid decision tree-discriminant analysis to identify small sets of genes, i.e. single genes and gene pairs, which separate normal samples from tumor samples at different stages. Half the samples are selected for training to form the probability distribution of expression values of each gene. The distributions for the tumor and normal phenotypes are then used to classify the test samples. The algorithm also identifies gene pairs by combining the probability distributions to construct a decision tree, which is used to determine the class of test samples. After a series of training and testing sessions, genes and gene pairs that classify all samples correctly are recorded. The method was applied to a breast cancer data set, and classifier genes that distinguish normal breast from different stages of breast tumor were identified. The genes were ranked according to the minimum Euclidean distance between their expression values in tumor and normal samples. The algorithm picked known cancer-related genes and also found genes that were not identified as differentially expressed by a t-test with a 2-fold cut-off.
Overall, the method generates candidate diagnostic genes and gene pairs for a specific disease phenotype, which can then be pursued for biological interpretation in cancer biology.

Keywords: discriminant analysis, gene expression, cancer, diagnostic genes

1 Introduction

High-throughput gene expression profiling using microarray technology has emerged as a promising technology for correlating gene expression with environmental conditions. Methods are available for allocating samples into pre-specified phenotypic groups based on differences in gene expression profiles, or for segregating samples into groups without prior specification [2, 9]. When groups are pre-specified, the aim is typically to identify differentially expressed diagnostic gene sets. Sets of over- or underexpressed genes that stratify closely related diseases have been successfully identified in ALL-AML classification [4], in ovarian cancer versus normal tissue [3], in BRCA1, BRCA2, and sporadic breast tumor classification [5], and in poor-prognosis versus good-prognosis breast cancer samples [10]. The main problem is that the sets of differentially expressed genes produced so far are too large to be used as feasible diagnostics.

In this paper, we present a hybrid decision tree-discriminant analysis to identify small sets of genes whose joint expression distribution separates two pre-defined classes. The method generates probability distributions from the fraction of samples in the two classes, and exploits them to select genes that classify all samples accurately after a series of training and test sessions.

Herein, we applied the methodology to breast cancer data generated by Ma and colleagues [6]. Single genes and gene pairs whose joint expression distribution separates tissue samples in different stages and grades of malignancy from normal tissue were identified as candidate diagnostic genes. Overall, the results suggest that this new discriminant analysis efficiently identifies small gene sets that distinguish phenotypes.

2 Method and Results

Data

The method was applied to the breast cancer gene expression data produced by Ma and colleagues [6]. The data are described in more detail elsewhere (Dalgin et al., manuscript in preparation). The samples include normal breast tissues from breast cancer patients and three stages of breast tumor: premalignant stage (ADH), in situ cancer (DCIS) and invasive cancer (IDC), with different grades (Grade I: slow-growing tumor, Grade II: intermediate, Grade III: fast-growing tumor). The publicly available data used in the current analysis comprise 32 normal samples, 8 ADH, 9 DCIS Grade I, 11 DCIS Grade II, 10 DCIS Grade III, 5 IDC Grade I, 9 IDC Grade II and 9 IDC Grade III samples, together with 1940 genes that were found to be differentially expressed between normal and the three stages (ADH, DCIS and IDC) by linear discriminant analysis [6]. The expression level (E) of each gene was reported as the log ratio of the expression level in the experimental sample to the expression level in the reference sample, E = log2(sample/reference sample). The reference sample was a human universal reference RNA from Stratagene [6].

Method

The method consists of three steps:
(1) dividing the samples into training and test sets;
(2) generating probability distributions for identifying single genes, and for decision analysis when pairs are used;
(3) assigning test samples, and selecting the genes and gene pairs that perform well.
An overview of the method is given in Figure 1.

Figure 1: Overview of the method.

In the first step, the samples are divided into training and test samples in each partition. The method employs a cross-validation technique in which the samples are separated into training and test sets randomly (first partitioning) or semi-randomly (second and third partitionings); see the Supplementary Figure. This technique ensures that all samples are used at least once in training, but it still has usage bias. The first partitioning is always random irrespective of the other partitionings, whereas the second and third partitionings are semi-random to guarantee good coverage of the samples. Overall, 99 partitions are performed.

In the second step, the probability distributions of the expression values (E) of a gene in each of the two training classes are generated. These are used to classify the test samples. The distributions of the endothelin 3 gene expression levels for tumor (T) and normal (N) samples are shown in Figure 2 as an example.

Figure 2: Distribution of expression values (E) of the endothelin 3 gene in normal and tumor samples. Expression values are divided into intervals. The interval boundaries are shown near the y-axis. The order of the samples is arbitrary and the sample numbers have no special importance.

The expression values are divided into intervals to generate the probability distributions. The number of intervals is chosen such that the values are discretized into intervals that are neither very small nor very large. As an example, 32 normal and 8 ADH expression values were divided into 10 intervals. P(E|N), the probability of an expression value in normal samples, and P(E|T), the probability of an expression value in tumor samples, are calculated from the fraction of normal and tumor samples, respectively, in the interval (E, E + dE).
The probability distribution for the endothelin 3 gene is shown in Figure 3.
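
The interval-based estimate described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `interval_probabilities`, the use of equal-width intervals over the pooled range, and the sample values are all assumptions made for the example.

```python
import numpy as np

def interval_probabilities(normal, tumor, n_intervals=10):
    """Return shared interval edges and the per-interval fractions P(E|N), P(E|T)."""
    normal = np.asarray(normal, dtype=float)
    tumor = np.asarray(tumor, dtype=float)
    pooled = np.concatenate([normal, tumor])
    # Assumption: equal-width intervals spanning the pooled range of values.
    edges = np.linspace(pooled.min(), pooled.max(), n_intervals + 1)
    # Fraction of each class falling into each interval.
    p_e_given_n = np.histogram(normal, bins=edges)[0] / normal.size
    p_e_given_t = np.histogram(tumor, bins=edges)[0] / tumor.size
    return edges, p_e_given_n, p_e_given_t

# Made-up log2 ratios for a single gene (illustrative only).
normal = [1.5, 1.3, 0.9, 1.1, 1.4, 0.7, 1.25, 1.6]
tumor = [0.0, 0.2, 0.5, 0.9]
edges, p_n, p_t = interval_probabilities(normal, tumor, n_intervals=4)
```

Intervals where one of the two fractions is zero correspond to expression ranges that occur in only one class in the training set.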

Figure 3: Probability distribution for the endothelin 3 gene. The lower (at the bottom) and upper (at the top) boundary values of each interval are shown on the x-axis. P(E|N) and P(E|T) are calculated from the fractions of normal and tumor samples, respectively, in an interval.

It is evident that for this particular case, expression levels of endothelin 3 above 1.28 occur only in the normal group, and expression levels below 0.06 occur only in the tumor group. However, since the separation is incomplete, this gene by itself is not a good candidate for a signature. We therefore ask whether a second gene can be found which, in combination with endothelin 3, gives perfect separation of the training set. In order to limit the search, we use as the first gene in the pair (endothelin 3 in this example) only genes that misclassify less than 10% of the total training samples. The pairs (or singlets, when the first gene separates perfectly) thus obtained are then evaluated on the test set. Samples in the test set are assigned to the normal category if P(N|E) > P(T|E), where E = (E_1, E_2), and to tumor otherwise; the posteriors are given by Bayes' rule (Figure 1, step 3). The pairs that correctly classify all test samples are recorded as perfect pairs after each partition.

Table 1: Number of single genes and gene pairs identified for each normal and tumor stage comparison.

Comparison         Number of single classifier genes    Number of pairs (genes involved)
Normal-ADH                                               (336 genes)
Normal-DCIS I                                            (502 genes)
Normal-DCIS II                                           (455 genes)
Normal-DCIS III                                          (670 genes)
Normal-IDC I                                             (564 genes)
Normal-IDC II                                            (649 genes)
Normal-IDC III                                           (743 genes)

DCIS Grade I is abbreviated as DCIS I. Number of pairs: pairs that appear in at least 10 partitions.
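
The assignment rule P(N|E) > P(T|E) can be illustrated with a short Python sketch. It is not the authors' decision tree: the way the two genes' distributions are combined is assumed here to be a simple product (independence within each class), the priors are assumed to be the class fractions, and the names `bin_index` and `classify` as well as all numbers are hypothetical.

```python
import numpy as np

def bin_index(value, edges):
    """Index of the interval containing value, clipped to the training range."""
    idx = np.searchsorted(edges, value, side="right") - 1
    return int(np.clip(idx, 0, len(edges) - 2))

def classify(sample, genes, prior_n, prior_t):
    """Assign 'normal' if P(N|E) > P(T|E), else 'tumor'.

    genes: one dict per gene with keys 'edges', 'p_n', 'p_t' (interval
    boundaries and the fractions P(E|N), P(E|T) from the training set).
    """
    score_n, score_t = prior_n, prior_t
    for value, g in zip(sample, genes):
        k = bin_index(value, g["edges"])
        # Assumption: genes are treated as independent within a class,
        # so the joint likelihood is a product of per-gene fractions.
        score_n *= g["p_n"][k]
        score_t *= g["p_t"][k]
    # The shared denominator P(E) of Bayes' rule cancels in the comparison.
    return "normal" if score_n > score_t else "tumor"

# Hypothetical training distributions for a gene pair.
genes = [
    {"edges": [0.0, 0.5, 1.0, 1.5], "p_n": [0.0, 0.5, 0.5], "p_t": [0.75, 0.25, 0.0]},
    {"edges": [-1.0, 0.0, 1.0], "p_n": [0.25, 0.75], "p_t": [1.0, 0.0]},
]
prior_n, prior_t = 32 / 40, 8 / 40  # assumed priors: class fractions
label_a = classify((0.7, 0.4), genes, prior_n, prior_t)
label_b = classify((0.2, -0.5), genes, prior_n, prior_t)
```

With a single-element `genes` list the same function covers the single-gene case.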

The classifier genes that distinguish normal tissue from each of the 7 stages and grades of breast tumor were identified after performing 99 partitions for each case. Single genes and gene pairs that correctly separate the samples in at least 1 partition were recorded for each comparison. The results are summarized in Table 1.

In order to determine how well the genes distinguish the two groups, genes and gene pairs were ranked based on a distance measure that uses the overall distribution of expression values. For a single classifier gene, the Euclidean distance between the tumor and normal samples was calculated. The rank of gene i is determined by this distance ($d_i$):

$d_i = \sqrt{(E_{T,i} - E_{N,i})^2}$

where $E_{T,i}$ and $E_{N,i}$ are the expression values of gene i in the tumor and normal sample, respectively. The rank of the gene is inversely proportional to this distance; the larger the distance, the better the gene is as a classifier.

In order to assess whether the Euclidean distance is a distinguishing feature of the classifier genes with respect to other genes, the distributions of Euclidean distances for single classifier genes and for other genes were compared. An example histogram is shown in Figure 4 for single classifier genes that distinguish normal samples from DCIS Grade III samples. In this case, it is clear that the distances of single classifier genes are higher than those of non-classifier genes; hence their expression values in normal and tumor samples are better separated. This observation is valid for other classifier genes as well (data not shown). It also suggests that the Euclidean distance can be used to rank classifier genes. That is to say, when the genes are tested on an independent data set, the genes that are top ranked in terms of their distance are expected to perform better than the others.

Figure 4: Histogram of the Euclidean distance calculated for normal-DCIS Grade III single classifier genes and the rest of the genes. Euclidean distance is calculated between the expression values of genes in normal and tumor samples.
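
A small Python sketch of distance-based ranking for single genes follows. The formula for $d_i$ is written for one tumor and one normal value, so how it is aggregated over samples is an assumption here: the sketch takes the minimum over all tumor/normal sample pairs, in line with the "minimum Euclidean distance" mentioned in the abstract. The function names and expression values are hypothetical.

```python
import numpy as np

def min_distance(tumor, normal):
    """Smallest |E_T - E_N| over all tumor/normal sample pairs for one gene."""
    tumor = np.asarray(tumor, dtype=float)
    normal = np.asarray(normal, dtype=float)
    # For a single gene, sqrt((a - b)^2) reduces to |a - b|.
    return float(np.abs(tumor[:, None] - normal[None, :]).min())

def rank_genes(expr_tumor, expr_normal):
    """Return gene names ordered from largest to smallest minimum distance."""
    d = {g: min_distance(expr_tumor[g], expr_normal[g]) for g in expr_tumor}
    return sorted(d, key=d.get, reverse=True), d

# Made-up expression values for two hypothetical genes.
expr_tumor = {"gene_a": [0.1, 0.3], "gene_b": [0.5, 0.9]}
expr_normal = {"gene_a": [1.4, 1.6], "gene_b": [1.0, 1.2]}
order, distances = rank_genes(expr_tumor, expr_normal)
```

Here `gene_a` ranks first because its closest tumor and normal values are further apart than those of `gene_b`.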

Similarly, the gene pairs were ranked according to the minimum Euclidean distance between their expression values in tumor and normal samples. First, the Euclidean distance between the expression values of the pair (genes i and j) in a tumor sample and each normal sample was calculated:

$d\big((E_{T,i}, E_{T,j}), E_N\big) = \min_{N} \sqrt{(E_{T,i} - E_{N,i})^2 + (E_{T,j} - E_{N,j})^2 + (E_{T,i} - E_{N,j})^2 + (E_{T,j} - E_{N,i})^2}$

The minimum of this set was selected for that tumor sample. After carrying out the procedure for all tumor samples, the minimum of this set of minima was selected as the minimum distance for the gene pair:

$d_{i,j} = \min_{T} d\big((E_{T,i}, E_{T,j}), E_N\big), \quad T = 1, \ldots, N_T$

where $N_T$ is the total number of tumor samples. The rank of the gene pair is inversely proportional to this minimum distance.

In order to compare our results with a conventional method, we performed t-tests on the same set of genes and the same classes defined in breast cancer, i.e. normal and 7 stages of breast tumor. The average fold change and the significance values were calculated for each gene for each normal-breast cancer stage comparison. The aim was (1) to see whether the method selects the same genes as a t-test or different ones, and (2) to evaluate the classifier genes in terms of quantitative measures such as average fold change. The percentages of single classifier genes that show a differential expression change with a p-value < 0.05 and an average fold ($E_{Tumor}/E_{Normal}$) > 2 are shown in Table 2.

Table 2: Average fold (tumor/normal) and p-values of single classifier genes obtained by t-test.
Comparison         Avg fold (T/N) > 2    p-value < 0.05
Normal-ADH         19.1 % (4/21)         71.4 % (15/21)
Normal-DCIS I      61.5 % (16/26)        84.6 % (22/26)
Normal-DCIS II     66.7 % (18/27)        100 % (27/27)
Normal-DCIS III    55.3 % (21/38)        94.7 % (36/38)
Normal-IDC I       75.0 % (42/56)        80.4 % (45/56)
Normal-IDC II      73.7 % (28/38)        89.5 % (34/38)
Normal-IDC III     55.6 % (25/45)        91.1 % (41/45)

The results show that a substantial portion of the genes (from 25.0% for IDC Grade I to 80.9% for ADH) changed less than 2-fold in tumor; hence they would not be identified as differentially expressed by a t-test combined with a 2-fold cut-off. However, these genes have been identified as possible classifier genes by the current algorithm. The majority of the classifier genes have statistically significant p-values, which indicates that their expression change between the two classes is significant.

3 Discussion

3.1 Comparison of the Method with Related Methods

Several statistical methods have been successfully applied to find discriminatory genes between groups of samples in gene expression data. The method introduced in this paper is compared methodologically with two of the most frequently applied methods, the t-test and linear discriminant analysis.

Linear discriminant analysis (LDA) finds a linear subspace that maximizes class separability among the feature vector projections, where each gene is represented in the space by a vector of its expression values across the samples. A popular separability criterion is the ratio of between-class scatter to within-class scatter. LDA seeks directions that are efficient for discrimination, and it assumes that the class mean conveys most of the class information. Therefore, it performs poorly on data sets that are only nonlinearly separable and on classes with the same mean. Additionally, with a limited number of samples and a fairly large number of genes, the between-class and within-class separabilities can be quite unstable.

The main difference between LDA and our hybrid discriminant analysis is that LDA finds the separation of the classes spatially, by representing the classes as vectors, whereas our algorithm separates the classes by a probabilistic approach. It takes into account the distribution of expression values in both classes and generates probability distributions from the fraction of the two classes in defined intervals. The probability distributions are then used to determine the class of an unknown sample in the case of single genes. The distributions of two genes are combined to construct a decision tree that assigns the class of an unknown sample by a gene pair. The algorithm selects the genes that correctly assign the class of all training and test samples over a series of training and test sessions; hence consistency across all samples is an emphasized criterion of the method.

The other method that has been applied to identify differentially expressed genes is the t-test, which tests the hypothesis that the means of two distributions of values are different. The main disadvantage of this approach in gene expression analysis is that it produces large gene sets, which are not viable as diagnostics. Moreover, in some cases, e.g. closely related diseases, changes in the expression of single genes are very modest or not significant at all [8].
Our method is advantageous in such cases since it takes into account the fraction of samples in the two classes no matter how similar or dissimilar the two class means or variances are. It selects not only single genes but also gene pairs that together partition the two classes even if the individual genes are not perfect classifiers alone.

The method is designed to select single genes and pairs to classify two groups; however, we also considered extending it to identify triplets or larger groups of genes. The downsides of this are that (1) the execution time increases substantially, since the search space, i.e. the number of triplets, is much larger than in the case of pairs, and (2) a large number of triplets is identified for each classification, which makes them hard to evaluate, rank and select for further testing on another data set.

The testing methodology used here differs from the standard jackknife technique, which constructs the training set by leaving out a normal and a tumor sample, and then tests the genes on that pair. The jackknife has the advantage of being unbiased, but it is computationally much more demanding than the procedure we have used. We are currently investigating the difference between the two methods.

3.2 Marker Genes for Breast Cancer

In particular, we identified single genes and gene pairs that partition normal breast samples from different breast tumor stages (Table 1). The overlaps between the single gene classifiers (0%-10.34%) and between the gene pairs (0.11%-3.28%) are low, showing that the majority of these classifiers are specific to a certain tumor stage. Some of these genes are previously characterized cancer-related genes, such as Angiopoietin-like 4, which is known to be important in sustained angiogenesis; Matrix metalloproteinase 7, which was found to be up-regulated in colorectal carcinomas [7]; and Glutamine synthase, which is also up-regulated in tumor and important in tumor progression [1].
Grade-specific genes also agree with previous findings. As an example, the BIRC5 (survivin) gene, which is known to be overexpressed in common human cancers and was found to be correlated with Grade III tumors [6], was also identified only in Grade III tumors in this study.

In summary, we were able to distinguish normal from different stages of breast tumor using no more than two genes in each instance. Each of these single genes and gene pairs is a possible candidate diagnostic for a specific type of breast tumor. The total sum of all such pairs includes a large number of genes (Table 1), and that provides an entrée into the search for correlated and/or co-regulated genes. Future work will focus on the identification of biological processes that are enriched with subsets of these genes and on further determining the regulatory mechanisms controlling these genes.

References

[1] Dang, C. V. and Semenza, G. L., Oncogenic alterations of metabolism, TIBS Reviews, 24(2):68-72.
[2] Eisen, M. B., Spellman, P. T., Brown, P. O., and Botstein, D., Cluster analysis and display of genome-wide expression patterns, Proc. Natl. Acad. Sci. USA, 95(25).
[3] Furey, T. S., Cristianini, N., Duffy, N., Bednarski, D. W., Schummer, M., and Haussler, D., Support vector machine classification and validation of cancer tissue samples using microarray expression data, Bioinformatics, 16(10).
[4] Golub, T. R., Slonim, D. K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J. P., Coller, H., Loh, M. L., Downing, J. R., Caligiuri, M. A., Bloomfield, C. D., and Lander, E. S., Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring, Science, 286(5439).
[5] Hedenfalk, I., Duggan, D., Chen, Y., Radmacher, M., Bittner, M., Simon, R., Meltzer, P., Gusterson, B., Esteller, M., Kallioniemi, O. P., Wilfond, B., Borg, A., and Trent, J., Gene-expression profiles in hereditary breast cancer, N. Engl. J. Med., 344(8).
[6] Ma, X. J., Salunga, R., Tuggle, J. T., Gaudet, J., Enright, E., McQuary, P., Payette, T., Pistone, M., Stecker, K., Zhang, B. M., Zhou, Y. X., Varnholt, H., Smith, B., Gadd, M., Chatfield, E., Kessler, J., Baer, T. M., Erlander, M. G., and Sgroi, D. C., Gene expression profiles of human breast cancer progression, Proc. Natl. Acad. Sci. USA, 100(10).
[7] Masaki, T., Matsuoka, H., Sugiyama, M., Abe, N., Goto, A., Sakamoto, A., and Atomi, T., Matrilysin (MMP-7) as a significant determinant of malignant potential of early invasive colorectal carcinomas, Br. J. Cancer, 84(10).
[8] Mootha, V. K., Lindgren, C. M., Eriksson, K. F., Subramanian, A., Sihag, S., Lehar, J., Puigserver, P., Carlsson, E., Ridderstrale, M., Laurila, E., Houstis, N., Daly, M. J., Patterson, N., Mesirov, J. P., Golub, T. R., Tamayo, P., Spiegelman, B., Lander, E. S., Hirschhorn, J. N., Altschuler, D., and Groop, L. C., PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes, Nat. Genet., 34(3).
[9] Tamayo, P., Slonim, D., Mesirov, J., Zhu, Q., Kitareevan, S., Dmitrovsky, E., Lander, E. S., and Golub, T. R., Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation, Proc. Natl. Acad. Sci. USA, 96(6).
[10] van 't Veer, L. J., Dai, H., van de Vijver, M. J., He, Y. D., Hart, A. A. M., Mao, M., Peterse, H. L., van der Kooy, K., Marton, M. J., Witteveen, A. T., Schreiber, G. J., Kerkhoven, R. M., Roberts, C., Linsley, P. S., Bernards, R., and Friend, S. H., Gene expression profiling predicts clinical outcome of breast cancer, Nature, 415(6871), 2002.

Supplementary Figure

[Schematic: 32 normal samples (N) and 8 ADH samples (T) are divided into a training set and a test set. In the 1st partitioning each set holds 16 N and 4 T; in the 2nd and 3rd partitionings the sets are assembled from blocks of 8 N and 2 T.]

Figure 5: A schematic overview of dividing the samples into training and test sets. In the first partitioning, half of the samples of each class are selected randomly for training, and the others remain for testing. In the second partitioning, half of the training samples are chosen randomly from the training set of the first partitioning and the other half from those that have not been used in training (the test set of the first partitioning). In the third partitioning, all samples not previously used in training are selected for training, and the remainder is chosen randomly from the training set of the second partitioning.
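
The three partitionings described in the caption can be sketched for one class as follows. This is an illustrative reading of the scheme, not the authors' code: the function name `three_partitionings`, the handling of class sizes that are not divisible by four, and the sample labels are assumptions. In practice the same procedure would be applied to the normal and tumor classes separately and the results combined.

```python
import random

def three_partitionings(samples, rng):
    """Return three (train, test) splits of one class, following Figure 5."""
    samples = list(samples)
    half, quarter = len(samples) // 2, len(samples) // 4
    # 1st partitioning: a random half trains, the rest tests.
    train1 = rng.sample(samples, half)
    test1 = [s for s in samples if s not in train1]
    # 2nd partitioning: half of the training set comes from train1,
    # the other half from samples not yet used in training (test1).
    train2 = rng.sample(train1, quarter) + rng.sample(test1, half - quarter)
    # 3rd partitioning: every sample not yet used in training,
    # topped up with random samples from train2.
    used = set(train1) | set(train2)
    unused = [s for s in samples if s not in used]
    train3 = unused + rng.sample(train2, half - len(unused))
    splits = []
    for train in (train1, train2, train3):
        splits.append((train, [s for s in samples if s not in train]))
    return splits

rng = random.Random(0)  # fixed seed so the example is repeatable
normal_ids = [f"N{i}" for i in range(32)]  # hypothetical sample labels
splits = three_partitionings(normal_ids, rng)
```

After the third partitioning every sample has appeared in a training set at least once, which is the coverage property the method relies on.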


A Biclustering Based Classification Framework for Cancer Diagnosis and Prognosis A Biclustering Based Classification Framework for Cancer Diagnosis and Prognosis Baljeet Malhotra and Guohui Lin Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada T6G 2E8

More information

Package golubesets. August 16, 2014

Package golubesets. August 16, 2014 Package golubesets August 16, 2014 Version 1.6.0 Title exprsets for golub leukemia data Author Todd Golub Maintainer Vince Carey Description representation

More information

Commentary The promise of microarrays in the management and treatment of breast cancer Jenny C Chang, Susan G Hilsenbeck and Suzanne AW Fuqua

Commentary The promise of microarrays in the management and treatment of breast cancer Jenny C Chang, Susan G Hilsenbeck and Suzanne AW Fuqua Commentary The promise of microarrays in the management and treatment of breast cancer Jenny C Chang, Susan G Hilsenbeck and Suzanne AW Fuqua Breast Center, Baylor College of Medicine, Houston, Texas,

More information

Analyzing Gene Expression Data: Fuzzy Decision Tree Algorithm applied to the Classification of Cancer Data

Analyzing Gene Expression Data: Fuzzy Decision Tree Algorithm applied to the Classification of Cancer Data Analyzing Gene Expression Data: Fuzzy Decision Tree Algorithm applied to the Classification of Cancer Data Simone A. Ludwig Department of Computer Science North Dakota State University Fargo, ND, USA simone.ludwig@ndsu.edu

More information

Machine Learning! Robert Stengel! Robotics and Intelligent Systems MAE 345,! Princeton University, 2017

Machine Learning! Robert Stengel! Robotics and Intelligent Systems MAE 345,! Princeton University, 2017 Machine Learning! Robert Stengel! Robotics and Intelligent Systems MAE 345,! Princeton University, 2017 A.K.A. Artificial Intelligence Unsupervised learning! Cluster analysis Patterns, Clumps, and Joining

More information

FUZZY C-MEANS AND ENTROPY BASED GENE SELECTION BY PRINCIPAL COMPONENT ANALYSIS IN CANCER CLASSIFICATION

FUZZY C-MEANS AND ENTROPY BASED GENE SELECTION BY PRINCIPAL COMPONENT ANALYSIS IN CANCER CLASSIFICATION FUZZY C-MEANS AND ENTROPY BASED GENE SELECTION BY PRINCIPAL COMPONENT ANALYSIS IN CANCER CLASSIFICATION SOMAYEH ABBASI, HAMID MAHMOODIAN Department of Electrical Engineering, Najafabad branch, Islamic

More information

A hierarchical two-phase framework for selecting genes in cancer datasets with a neuro-fuzzy system

A hierarchical two-phase framework for selecting genes in cancer datasets with a neuro-fuzzy system Technology and Health Care 24 (2016) S601 S605 DOI 10.3233/THC-161187 IOS Press S601 A hierarchical two-phase framework for selecting genes in cancer datasets with a neuro-fuzzy system Jongwoo Lim, Bohyun

More information

Gene expression analysis. Roadmap. Microarray technology: how it work Applications: what can we do with it Preprocessing: Classification Clustering

Gene expression analysis. Roadmap. Microarray technology: how it work Applications: what can we do with it Preprocessing: Classification Clustering Gene expression analysis Roadmap Microarray technology: how it work Applications: what can we do with it Preprocessing: Image processing Data normalization Classification Clustering Biclustering 1 Gene

More information

Identifying Thyroid Carcinoma Subtypes and Outcomes through Gene Expression Data Kun-Hsing Yu, Wei Wang, Chung-Yu Wang

Identifying Thyroid Carcinoma Subtypes and Outcomes through Gene Expression Data Kun-Hsing Yu, Wei Wang, Chung-Yu Wang Identifying Thyroid Carcinoma Subtypes and Outcomes through Gene Expression Data Kun-Hsing Yu, Wei Wang, Chung-Yu Wang Abstract: Unlike most cancers, thyroid cancer has an everincreasing incidence rate

More information

Nature Immunology: doi: /ni Supplementary Figure 1. Transcriptional program of the TE and MP CD8 + T cell subsets.

Nature Immunology: doi: /ni Supplementary Figure 1. Transcriptional program of the TE and MP CD8 + T cell subsets. Supplementary Figure 1 Transcriptional program of the TE and MP CD8 + T cell subsets. (a) Comparison of gene expression of TE and MP CD8 + T cell subsets by microarray. Genes that are 1.5-fold upregulated

More information

Hybridized KNN and SVM for gene expression data classification

Hybridized KNN and SVM for gene expression data classification Mei, et al, Hybridized KNN and SVM for gene expression data classification Hybridized KNN and SVM for gene expression data classification Zhen Mei, Qi Shen *, Baoxian Ye Chemistry Department, Zhengzhou

More information

Predictive Biomarkers

Predictive Biomarkers Uğur Sezerman Evolutionary Selection of Near Optimal Number of Features for Classification of Gene Expression Data Using Genetic Algorithms Predictive Biomarkers Biomarker: A gene, protein, or other change

More information

A quick review. The clustering problem: Hierarchical clustering algorithm: Many possible distance metrics K-mean clustering algorithm:

A quick review. The clustering problem: Hierarchical clustering algorithm: Many possible distance metrics K-mean clustering algorithm: The clustering problem: partition genes into distinct sets with high homogeneity and high separation Hierarchical clustering algorithm: 1. Assign each object to a separate cluster. 2. Regroup the pair

More information

Multiclass microarray data classification based on confidence evaluation

Multiclass microarray data classification based on confidence evaluation Methodology Multiclass microarray data classification based on confidence evaluation H.L. Yu 1, S. Gao 1, B. Qin 1 and J. Zhao 2 1 School of Computer Science and Engineering, Jiangsu University of Science

More information

Inter-session reproducibility measures for high-throughput data sources

Inter-session reproducibility measures for high-throughput data sources Inter-session reproducibility measures for high-throughput data sources Milos Hauskrecht, PhD, Richard Pelikan, MSc Computer Science Department, Intelligent Systems Program, Department of Biomedical Informatics,

More information

Investigating the performance of a CAD x scheme for mammography in specific BIRADS categories

Investigating the performance of a CAD x scheme for mammography in specific BIRADS categories Investigating the performance of a CAD x scheme for mammography in specific BIRADS categories Andreadis I., Nikita K. Department of Electrical and Computer Engineering National Technical University of

More information

Re-evaluating Early Breast Neoplasia

Re-evaluating Early Breast Neoplasia Re-evaluating Early Breast Neoplasia The Harvard community has made this article openly available. Please share how this access benefits you. Your story matters Citation Moulis, Sharon and Dennis C. Sgroi.

More information

Predicting Kidney Cancer Survival from Genomic Data

Predicting Kidney Cancer Survival from Genomic Data Predicting Kidney Cancer Survival from Genomic Data Christopher Sauer, Rishi Bedi, Duc Nguyen, Benedikt Bünz Abstract Cancers are on par with heart disease as the leading cause for mortality in the United

More information

RASA: Robust Alternative Splicing Analysis for Human Transcriptome Arrays

RASA: Robust Alternative Splicing Analysis for Human Transcriptome Arrays Supplementary Materials RASA: Robust Alternative Splicing Analysis for Human Transcriptome Arrays Junhee Seok 1*, Weihong Xu 2, Ronald W. Davis 2, Wenzhong Xiao 2,3* 1 School of Electrical Engineering,

More information

Selection and Combination of Markers for Prediction

Selection and Combination of Markers for Prediction Selection and Combination of Markers for Prediction NACC Data and Methods Meeting September, 2010 Baojiang Chen, PhD Sarah Monsell, MS Xiao-Hua Andrew Zhou, PhD Overview 1. Research motivation 2. Describe

More information

Review: Logistic regression, Gaussian naïve Bayes, linear regression, and their connections

Review: Logistic regression, Gaussian naïve Bayes, linear regression, and their connections Review: Logistic regression, Gaussian naïve Bayes, linear regression, and their connections New: Bias-variance decomposition, biasvariance tradeoff, overfitting, regularization, and feature selection Yi

More information

MUTUAL INFORMATION ANALYSIS AS A TOOL TO ASSESS THE ROLE OF ANEUPLOIDY IN THE GENERATION OF CANCER-ASSOCIATED DIFFERENTIAL GENE EXPRESSION PATTERNS

MUTUAL INFORMATION ANALYSIS AS A TOOL TO ASSESS THE ROLE OF ANEUPLOIDY IN THE GENERATION OF CANCER-ASSOCIATED DIFFERENTIAL GENE EXPRESSION PATTERNS MUTUAL INFORMATION ANALYSIS AS A TOOL TO ASSESS THE ROLE OF ANEUPLOIDY IN THE GENERATION OF CANCER-ASSOCIATED DIFFERENTIAL GENE EXPRESSION PATTERNS GREGORY T. KLUS @, ANDREW SONG @, ARI SCHICK @, MATTIAS

More information

AUTHOR PROOF COPY ONLY

AUTHOR PROOF COPY ONLY REVIEW Ensemble machine learning on gene expression data for cancer classification Aik Choon Tan and David Gilbert Bioinformatics Research Centre, Department of Computing Science, University of Glasgow,

More information

Gene expression profiling predicts clinical outcome of prostate cancer. Gennadi V. Glinsky, Anna B. Glinskii, Andrew J. Stephenson, Robert M.

Gene expression profiling predicts clinical outcome of prostate cancer. Gennadi V. Glinsky, Anna B. Glinskii, Andrew J. Stephenson, Robert M. SUPPLEMENTARY DATA Gene expression profiling predicts clinical outcome of prostate cancer Gennadi V. Glinsky, Anna B. Glinskii, Andrew J. Stephenson, Robert M. Hoffman, William L. Gerald Table of Contents

More information

Biomarker adaptive designs in clinical trials

Biomarker adaptive designs in clinical trials Review Article Biomarker adaptive designs in clinical trials James J. Chen 1, Tzu-Pin Lu 1,2, Dung-Tsa Chen 3, Sue-Jane Wang 4 1 Division of Bioinformatics and Biostatistics, National Center for Toxicological

More information

BIOINFORMATICS ORIGINAL PAPER

BIOINFORMATICS ORIGINAL PAPER BIOINFORMATICS ORIGINAL PAPER Vol. 2 no. 4 25, pages 34 32 doi:.93/bioinformatics/bti483 Gene expression Ensemble dependence model for classification and prediction of cancer and normal gene expression

More information

SUPPLEMENTARY INFORMATION. Table 1 Patient characteristics Preoperative. language testing

SUPPLEMENTARY INFORMATION. Table 1 Patient characteristics Preoperative. language testing Categorical Speech Representation in the Human Superior Temporal Gyrus Edward F. Chang, Jochem W. Rieger, Keith D. Johnson, Mitchel S. Berger, Nicholas M. Barbaro, Robert T. Knight SUPPLEMENTARY INFORMATION

More information

38 Int'l Conf. Bioinformatics and Computational Biology BIOCOMP'16

38 Int'l Conf. Bioinformatics and Computational Biology BIOCOMP'16 38 Int'l Conf. Bioinformatics and Computational Biology BIOCOMP'16 PGAR: ASD Candidate Gene Prioritization System Using Expression Patterns Steven Cogill and Liangjiang Wang Department of Genetics and

More information

Efficacy of the Extended Principal Orthogonal Decomposition Method on DNA Microarray Data in Cancer Detection

Efficacy of the Extended Principal Orthogonal Decomposition Method on DNA Microarray Data in Cancer Detection 202 4th International onference on Bioinformatics and Biomedical Technology IPBEE vol.29 (202) (202) IASIT Press, Singapore Efficacy of the Extended Principal Orthogonal Decomposition on DA Microarray

More information

Mammogram Analysis: Tumor Classification

Mammogram Analysis: Tumor Classification Mammogram Analysis: Tumor Classification Term Project Report Geethapriya Raghavan geeragh@mail.utexas.edu EE 381K - Multidimensional Digital Signal Processing Spring 2005 Abstract Breast cancer is the

More information

Single SNP/Gene Analysis. Typical Results of GWAS Analysis (Single SNP Approach) Typical Results of GWAS Analysis (Single SNP Approach)

Single SNP/Gene Analysis. Typical Results of GWAS Analysis (Single SNP Approach) Typical Results of GWAS Analysis (Single SNP Approach) High-Throughput Sequencing Course Gene-Set Analysis Biostatistics and Bioinformatics Summer 28 Section Introduction What is Gene Set Analysis? Many names for gene set analysis: Pathway analysis Gene set

More information

Bootstrapped Integrative Hypothesis Test, COPD-Lung Cancer Differentiation, and Joint mirnas Biomarkers

Bootstrapped Integrative Hypothesis Test, COPD-Lung Cancer Differentiation, and Joint mirnas Biomarkers Bootstrapped Integrative Hypothesis Test, COPD-Lung Cancer Differentiation, and Joint mirnas Biomarkers Kai-Ming Jiang 1,2, Bao-Liang Lu 1,2, and Lei Xu 1,2,3(&) 1 Department of Computer Science and Engineering,

More information

Comparison of Triple Negative Breast Cancer between Asian and Western Data Sets

Comparison of Triple Negative Breast Cancer between Asian and Western Data Sets 2010 IEEE International Conference on Bioinformatics and Biomedicine Workshops Comparison of Triple Negative Breast Cancer between Asian and Western Data Sets Lee H. Chen Bioinformatics and Biostatistics

More information

Visualizing Cancer Heterogeneity with Dynamic Flow

Visualizing Cancer Heterogeneity with Dynamic Flow Visualizing Cancer Heterogeneity with Dynamic Flow Teppei Nakano and Kazuki Ikeda Keio University School of Medicine, Tokyo 160-8582, Japan keiohigh2nd@gmail.com Department of Physics, Osaka University,

More information

International Journal of Computer Science Trends and Technology (IJCST) Volume 5 Issue 1, Jan Feb 2017

International Journal of Computer Science Trends and Technology (IJCST) Volume 5 Issue 1, Jan Feb 2017 RESEARCH ARTICLE Classification of Cancer Dataset in Data Mining Algorithms Using R Tool P.Dhivyapriya [1], Dr.S.Sivakumar [2] Research Scholar [1], Assistant professor [2] Department of Computer Science

More information

Survey on Breast Cancer Analysis using Machine Learning Techniques

Survey on Breast Cancer Analysis using Machine Learning Techniques Survey on Breast Cancer Analysis using Machine Learning Techniques Prof Tejal Upadhyay 1, Arpita Shah 2 1 Assistant Professor, Information Technology Department, 2 M.Tech, Computer Science and Engineering,

More information

Using CART to Mine SELDI ProteinChip Data for Biomarkers and Disease Stratification

Using CART to Mine SELDI ProteinChip Data for Biomarkers and Disease Stratification Using CART to Mine SELDI ProteinChip Data for Biomarkers and Disease Stratification Kenna Mawk, D.V.M. Informatics Product Manager Ciphergen Biosystems, Inc. Outline Introduction to ProteinChip Technology

More information

Original Article Identification of meningioma recurrence gene expression signature by DNA microarray experiments

Original Article Identification of meningioma recurrence gene expression signature by DNA microarray experiments Int J Clin Exp Med 2016;9(2):4168-4172 www.ijcem.com /ISSN:1940-5901/IJCEM0017450 Original Article Identification of meningioma recurrence gene expression signature by DNA microarray experiments Feng Chen

More information

Classification with microarray data

Classification with microarray data Classification with microarray data Aron Charles Eklund eklund@cbs.dtu.dk DNA Microarray Analysis - #27612 January 8, 2010 The rest of today Now: What is classification, and why do we do it? How to develop

More information

Nature Neuroscience: doi: /nn Supplementary Figure 1. Behavioral training.

Nature Neuroscience: doi: /nn Supplementary Figure 1. Behavioral training. Supplementary Figure 1 Behavioral training. a, Mazes used for behavioral training. Asterisks indicate reward location. Only some example mazes are shown (for example, right choice and not left choice maze

More information

Cancer is the fourth most common disease and the. Genomic Processing for Cancer Classification and Prediction

Cancer is the fourth most common disease and the. Genomic Processing for Cancer Classification and Prediction [ Peng Qiu, Z. Jane Wang, and K.J. Ray Liu ] Genomic Processing for Cancer Classification and Prediction [A broad review of the recent advances in model-based genomic and proteomic signal processing for

More information

Abstract. Optimization strategy of Copy Number Variant calling using Multiplicom solutions APPLICATION NOTE. Introduction

Abstract. Optimization strategy of Copy Number Variant calling using Multiplicom solutions APPLICATION NOTE. Introduction Optimization strategy of Copy Number Variant calling using Multiplicom solutions Michael Vyverman, PhD; Laura Standaert, PhD and Wouter Bossuyt, PhD Abstract Copy number variations (CNVs) represent a significant

More information

FISH mcgh Karyotyping ISH RT-PCR. Expression arrays RNA. Tissue microarrays Protein arrays MS. Protein IHC

FISH mcgh Karyotyping ISH RT-PCR. Expression arrays RNA. Tissue microarrays Protein arrays MS. Protein IHC Classification of Breast Cancer in the Molecular Era Susan J. Done University Health Network, Toronto Why classify? Prognosis Prediction of response to therapy Pathogenesis Invasive breast cancer can have

More information

DOES THE BRCAX GENE EXIST? FUTURE OUTLOOK

DOES THE BRCAX GENE EXIST? FUTURE OUTLOOK CHAPTER 6 DOES THE BRCAX GENE EXIST? FUTURE OUTLOOK Genetic research aimed at the identification of new breast cancer susceptibility genes is at an interesting crossroad. On the one hand, the existence

More information

Supplementary Materials Extracting a Cellular Hierarchy from High-dimensional Cytometry Data with SPADE

Supplementary Materials Extracting a Cellular Hierarchy from High-dimensional Cytometry Data with SPADE Supplementary Materials Extracting a Cellular Hierarchy from High-dimensional Cytometry Data with SPADE Peng Qiu1,4, Erin F. Simonds2, Sean C. Bendall2, Kenneth D. Gibbs Jr.2, Robert V. Bruggner2, Michael

More information

Applications. DSC 410/510 Multivariate Statistical Methods. Discriminating Two Groups. What is Discriminant Analysis

Applications. DSC 410/510 Multivariate Statistical Methods. Discriminating Two Groups. What is Discriminant Analysis DSC 4/5 Multivariate Statistical Methods Applications DSC 4/5 Multivariate Statistical Methods Discriminant Analysis Identify the group to which an object or case (e.g. person, firm, product) belongs:

More information

Evaluating Classifiers for Disease Gene Discovery

Evaluating Classifiers for Disease Gene Discovery Evaluating Classifiers for Disease Gene Discovery Kino Coursey Lon Turnbull khc0021@unt.edu lt0013@unt.edu Abstract Identification of genes involved in human hereditary disease is an important bioinfomatics

More information

Diagnosis of multiple cancer types by shrunken centroids of gene expression

Diagnosis of multiple cancer types by shrunken centroids of gene expression Diagnosis of multiple cancer types by shrunken centroids of gene expression Robert Tibshirani, Trevor Hastie, Balasubramanian Narasimhan, and Gilbert Chu PNAS 99:10:6567-6572, 14 May 2002 Nearest Centroid

More information

Nature Methods: doi: /nmeth.3115

Nature Methods: doi: /nmeth.3115 Supplementary Figure 1 Analysis of DNA methylation in a cancer cohort based on Infinium 450K data. RnBeads was used to rediscover a clinically distinct subgroup of glioblastoma patients characterized by

More information

SUPPLEMENTARY APPENDIX

SUPPLEMENTARY APPENDIX SUPPLEMENTARY APPENDIX 1) Supplemental Figure 1. Histopathologic Characteristics of the Tumors in the Discovery Cohort 2) Supplemental Figure 2. Incorporation of Normal Epidermal Melanocytic Signature

More information

BIOINFORMATICS. A randomized steiner tree approach for biomarker discovery and classification of breast cancer metastasis

BIOINFORMATICS. A randomized steiner tree approach for biomarker discovery and classification of breast cancer metastasis BIOINFORMATICS Vol. 00 no. 00 2005 Pages 1 8 A randomized steiner tree approach for biomarker discovery and classification of breast cancer metastasis Md. Jamiul Jahid 1 and Jianhua Ruan 1 1 Department

More information

Sawtooth Software. MaxDiff Analysis: Simple Counting, Individual-Level Logit, and HB RESEARCH PAPER SERIES. Bryan Orme, Sawtooth Software, Inc.

Sawtooth Software. MaxDiff Analysis: Simple Counting, Individual-Level Logit, and HB RESEARCH PAPER SERIES. Bryan Orme, Sawtooth Software, Inc. Sawtooth Software RESEARCH PAPER SERIES MaxDiff Analysis: Simple Counting, Individual-Level Logit, and HB Bryan Orme, Sawtooth Software, Inc. Copyright 009, Sawtooth Software, Inc. 530 W. Fir St. Sequim,

More information

Seeing is Behaving : Using Revealed-Strategy Approach to Understand Cooperation in Social Dilemma. February 04, Tao Chen. Sining Wang.

Seeing is Behaving : Using Revealed-Strategy Approach to Understand Cooperation in Social Dilemma. February 04, Tao Chen. Sining Wang. Seeing is Behaving : Using Revealed-Strategy Approach to Understand Cooperation in Social Dilemma February 04, 2019 Tao Chen Wan Wang Sining Wang Lei Chen University of Waterloo University of Waterloo

More information

Identifying the Zygosity Status of Twins Using Bayes Network and Estimation- Maximization Methodology

Identifying the Zygosity Status of Twins Using Bayes Network and Estimation- Maximization Methodology Identifying the Zygosity Status of Twins Using Bayes Network and Estimation- Maximization Methodology Yicun Ni (ID#: 9064804041), Jin Ruan (ID#: 9070059457), Ying Zhang (ID#: 9070063723) Abstract As the

More information

Network-based biomarkers enhance classical approaches to prognostic gene expression signatures

Network-based biomarkers enhance classical approaches to prognostic gene expression signatures RESEARCH Open Access Network-based biomarkers enhance classical approaches to prognostic gene expression signatures Rebecca L Barter 1, Sarah-Jane Schramm 2,3, Graham J Mann 2,3, Yee Hwa Yang 1,3* From

More information

L. Ziaei MS*, A. R. Mehri PhD**, M. Salehi PhD***

L. Ziaei MS*, A. R. Mehri PhD**, M. Salehi PhD*** Received: 1/16/2004 Accepted: 8/1/2005 Original Article Application of Artificial Neural Networks in Cancer Classification and Diagnosis Prediction of a Subtype of Lymphoma Based on Gene Expression Profile

More information

Classifica4on. CSCI1950 Z Computa4onal Methods for Biology Lecture 18. Ben Raphael April 8, hip://cs.brown.edu/courses/csci1950 z/

Classifica4on. CSCI1950 Z Computa4onal Methods for Biology Lecture 18. Ben Raphael April 8, hip://cs.brown.edu/courses/csci1950 z/ CSCI1950 Z Computa4onal Methods for Biology Lecture 18 Ben Raphael April 8, 2009 hip://cs.brown.edu/courses/csci1950 z/ Binary classifica,on Given a set of examples (x i, y i ), where y i = + 1, from unknown

More information

The Loss of Heterozygosity (LOH) Algorithm in Genotyping Console 2.0

The Loss of Heterozygosity (LOH) Algorithm in Genotyping Console 2.0 The Loss of Heterozygosity (LOH) Algorithm in Genotyping Console 2.0 Introduction Loss of erozygosity (LOH) represents the loss of allelic differences. The SNP markers on the SNP Array 6.0 can be used

More information

A novel approach to feature extraction from classification models based on information gene pairs

A novel approach to feature extraction from classification models based on information gene pairs Pattern Recognition 41 (2008) 1975 1984 www.elsevier.com/locate/pr A novel approach to feature extraction from classification models based on information gene pairs J. Li, X. Tang, J. Liu, J. Huang, Y.

More information

AQCHANALYTICAL TUTORIAL ARTICLE. Classification in Karyometry HISTOPATHOLOGY. Performance Testing and Prediction Error

AQCHANALYTICAL TUTORIAL ARTICLE. Classification in Karyometry HISTOPATHOLOGY. Performance Testing and Prediction Error AND QUANTITATIVE CYTOPATHOLOGY AND AQCHANALYTICAL HISTOPATHOLOGY An Official Periodical of The International Academy of Cytology and the Italian Group of Uropathology Classification in Karyometry Performance

More information

Microarray Analysis and Liver Diseases

Microarray Analysis and Liver Diseases Microarray Analysis and Liver Diseases Snorri S. Thorgeirsson M.D., Ph.D. Laboratory of Experimental Carcinogenesis Center for Cancer Research, NCI, NIH Application of Microarrays to Cancer Research Identifying

More information

Breast cancer. Risk factors you cannot change include: Treatment Plan Selection. Inferring Transcriptional Module from Breast Cancer Profile Data

Breast cancer. Risk factors you cannot change include: Treatment Plan Selection. Inferring Transcriptional Module from Breast Cancer Profile Data Breast cancer Inferring Transcriptional Module from Breast Cancer Profile Data Breast Cancer and Targeted Therapy Microarray Profile Data Inferring Transcriptional Module Methods CSC 177 Data Warehousing

More information

Facial Expression Biometrics Using Tracker Displacement Features

Facial Expression Biometrics Using Tracker Displacement Features Facial Expression Biometrics Using Tracker Displacement Features Sergey Tulyakov 1, Thomas Slowe 2,ZhiZhang 1, and Venu Govindaraju 1 1 Center for Unified Biometrics and Sensors University at Buffalo,

More information

Probability-Based Protein Identification for Post-Translational Modifications and Amino Acid Variants Using Peptide Mass Fingerprint Data

Probability-Based Protein Identification for Post-Translational Modifications and Amino Acid Variants Using Peptide Mass Fingerprint Data Probability-Based Protein Identification for Post-Translational Modifications and Amino Acid Variants Using Peptide Mass Fingerprint Data Tong WW, McComb ME, Perlman DH, Huang H, O Connor PB, Costello

More information

Analysis of breast cancer progression using principal component analysis and clustering

Analysis of breast cancer progression using principal component analysis and clustering Analysis of breast cancer progression using principal component analysis and clustering 1027 Analysis of breast cancer progression using principal component analysis and clustering G ALEXE 1,2,*, G S DALGIN

More information

Supplementary Figures

Supplementary Figures Supplementary Figures Supplementary Figure 1. Pan-cancer analysis of global and local DNA methylation variation a) Variations in global DNA methylation are shown as measured by averaging the genome-wide

More information

Roadmap for Developing and Validating Therapeutically Relevant Genomic Classifiers. Richard Simon, J Clin Oncol 23:

Roadmap for Developing and Validating Therapeutically Relevant Genomic Classifiers. Richard Simon, J Clin Oncol 23: Roadmap for Developing and Validating Therapeutically Relevant Genomic Classifiers. Richard Simon, J Clin Oncol 23:7332-7341 Presented by Deming Mi 7/25/2006 Major reasons for few prognostic factors to

More information

Cancer outlier differential gene expression detection

Cancer outlier differential gene expression detection Biostatistics (2007), 8, 3, pp. 566 575 doi:10.1093/biostatistics/kxl029 Advance Access publication on October 4, 2006 Cancer outlier differential gene expression detection BAOLIN WU Division of Biostatistics,

More information

Title: Pathway-Based Classification of Cancer Subtypes

Title: Pathway-Based Classification of Cancer Subtypes Title: Pathway-Based Classification of Cancer Subtypes Running title: Pathway-based classification of cancer subtypes Shinuk Kim 1, Mark Kon 1,2*, Charles DeLisi 1 1 Bioinformatics program, Boston University,

More information

Clustering analysis of cancerous microarray data

Clustering analysis of cancerous microarray data Available online www.jocpr.com Journal of Chemical and Pharmaceutical Research, 2014, 6(9): 488-493 Research Article ISSN : 0975-7384 CODEN(USA) : JCPRC5 Clustering analysis of cancerous microarray data

More information

The LiquidAssociation Package

The LiquidAssociation Package The LiquidAssociation Package Yen-Yi Ho October 30, 2018 1 Introduction The LiquidAssociation package provides analytical methods to study three-way interactions. It incorporates methods to examine a particular

More information