Cancer Gene Extraction Based on Stepwise Regression

Mathematical Computation, Volume 5, 2016, PP. 6-10

Jie Ni 1, Fan Wu 1, Meixiang Jin 1, Yixing Bai 1, Yunfei Guo 1
1. Mathematics Department, Yanbian University, China
guoyunfei0413@sina.com

Abstract

As gene expression profile databases expand, extracting genes while losing as little information as possible, or while retaining the most critical information, has become a main research direction. This paper first excludes 1561 irrelevant genes by means of a weighted distance, and then removes 252 redundant genes using Pearson's correlation coefficient. Finally, by comparing two methods, stepwise regression after clustering and direct stepwise regression, we obtain the best combination of 8 genes.

Keywords: stepwise regression, cluster analysis, gene extraction

1 INTRODUCTION

Golub studied two subtypes of leukemia, using the "signal to noise ratio" to evaluate the classification power of genes. 50 genes were selected as features and classified by weighted voting, with good results. Ramaswamy used SVM with RFE feature selection to classify 14 types of tumor samples. Li Yingxin used the sequential floating search algorithm to search for a feature subset, and obtained 29 feature genes. Wang Shoujue, an academician at the Institute of Semiconductors, Chinese Academy of Sciences, observed that the first two feature gene extraction methods did not consider the correlation between the selected genes, which affects the classification results to a certain extent, and proposed the ideal gene model to correct this deficiency. This paper combines the score of the ideal gene model with the variance, which reflects the amount of information a gene carries, to define a weighted distance, then applies the Pearson correlation coefficient, and finally obtains 97 informative genes.
We apply the stepwise regression method for variable selection to these 97 informative genes in different situations, and finally obtain the ideal combination of genes.

2 ELIMINATION OF IRRELEVANT GENES AND REDUNDANT GENES

2.1 Elimination of Irrelevant Genes

First, after deleting the repeated items in the gene data, we obtained 1910 mutually different genes. The remaining 1910 gene expression profiles can be expressed by a matrix $X = [m_{ij}]$, where $m_{ij}$ is the expression value of the i-th gene in the j-th sample. When the number of experimental samples is n and the number of genes that affect the cancer is p, the gene expression profile matrix X is an $n \times p$ matrix. (The data are the Problem A data of the 2010 National Graduate Mathematical Modeling Contest.)

To make the comparison easier, we first normalize the gene data as follows:

\[ m'_{ij} = \frac{2 m_{ij} - m_{\max} - m_{\min}}{m_{\max} - m_{\min}}, \qquad i \in [0, 1910],\ j \in [0, 62] \]

where $m_{\max}$ and $m_{\min}$ are the largest and the smallest elements in the expression profile matrix X, and $m'_{ij}$ is the normalized value of $m_{ij}$.

The data we used contain a total of 62 samples with 2000 genes in each sample. The first 22 samples are from normal people and the following 40 samples are from cancer patients. We put the two groups of samples together and denote them by $y_1, y_2, \dots, y_{62}$. The expression level of gene v across the samples is regarded as a $1 \times 62$ vector, denoted by $M_v = \{v_1, v_2, \dots, v_{62}\}$.

1) Defining an ideal gene:

\[ s_1 = s_2 = \dots = s_{22} = 1, \qquad s_{23} = s_{24} = \dots = s_{62} = -1 \]

If a gene plays a role in cancer, it can either promote or inhibit it. So the more cancer information a gene carries, the smaller the (acute) angle between it and the ideal gene, and the closer the absolute value of the cosine is to 1.

2) Defining the angle between the gene v and the ideal gene s:

\[ \cos\theta = \frac{M_v \cdot M_s}{\lVert M_v \rVert \, \lVert M_s \rVert}, \qquad \lVert M_v \rVert = \sqrt{\sum_{i=1}^{62} v_i^2}, \quad \lVert M_s \rVert = \sqrt{\sum_{i=1}^{62} s_i^2} \]

3) Defining the distance between the gene v and the ideal gene s:

\[ d = \begin{cases} \sqrt{\sum_{i=1}^{62} (v_i - e_i)^2}, & \cos\theta \ge 0 \\ \sqrt{\sum_{i=1}^{62} (v_i + e_i)^2}, & \cos\theta < 0 \end{cases} \]

4) Defining the similarity between the gene v and the ideal gene s: S, obtained from $\cos\theta$ and $d$.

5) Calculating the standard deviation of each gene: $STD_v$.

6) Defining the weighted distance:

\[ D = 0.5 \, S + 0.5 \, STD_v \]

We can calculate the weighted distance between each of the 1910 genes and the ideal gene according to this formula. Based on the scores S and $STD_v$, we can define a threshold that distinguishes informative genes from irrelevant genes: comparing the weighted distance D of a gene v with the threshold assigns the gene either to the informative set $D_I$ or to the irrelevant set $D_N$.

According to this definition of irrelevant and informative genes, when the threshold is 0.11 we have $\mathrm{card}(D_I) = 349$; that is, of the 1910 genes, 349 are informative and 1561 are irrelevant. These 349 genes are the basis of the further analysis. The genes in $D_I$ have different degrees of correlation with the ideal gene, as listed in Table 1.

TABLE 1 DISTRIBUTION OF GENE AND IDEAL GENE (columns: distance, count, proportion)

2.2 Elimination of Redundant Genes

In the previous section, we removed the irrelevant genes based on the relationship between genes and cancer. However, the expression levels of different genes are correlated to a certain degree, so we next exclude redundant genes.

Defining the Pearson correlation coefficient:

\[ \mathrm{corrcoef}(v_i, v_j) = \frac{\sum_{k=1}^{62} (m_{v_i k} - \bar{m}_{v_i})(m_{v_j k} - \bar{m}_{v_j})}{\sqrt{\sum_{k=1}^{62} (m_{v_i k} - \bar{m}_{v_i})^2 \, \sum_{k=1}^{62} (m_{v_j k} - \bar{m}_{v_j})^2}} \]

where $m_{v_i k}$ and $m_{v_j k}$ are the expression values of genes $v_i$ and $v_j$ in the k-th sample of matrix X, and $\bar{m}_{v_i}$ and $\bar{m}_{v_j}$ are their mean expression values over the samples.

This paper selects a threshold of 0.7, which further reduces the number of informative genes and finally leaves 97 genes.

TABLE 2 THE NUMBER OF CLASSIFICATION FEATURE GENES UNDER DIFFERENT THRESHOLD (columns: Threshold, Num of genes)

3 VARIABLE SELECTION BASED ON STEPWISE REGRESSION AFTER CLUSTERING

After the two steps of removing irrelevant genes and redundant genes, we are left with 97 informative genes, which is still a large number. However, most of these genes contain some cancer information and are not strongly correlated with each other, so we next use a stepwise regression method to select the variables.

TABLE 3 GENES SELECTED FROM EACH CLUSTER
Cluster, Gene: 1 M26383 M31516 R54097 X02492 X52228 T T47377 X02761 X M63391 U R none

FIG.1 GROUP1: 12 GENES (LEFT), GROUP2: 5 GENES (RIGHT)
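The filtering steps of Section 2 can be sketched roughly as below. This is a minimal illustration and not the authors' code: the function names (`normalize`, `cosine_to_ideal`, `drop_redundant`) are hypothetical, the data in the demo are synthetic, and two choices are assumptions the paper does not state, namely that the correlation threshold is applied to the absolute value of Pearson's coefficient and that correlated genes are pruned greedily in a given visiting order. The similarity S and the weighted distance D are left out.

```python
import numpy as np


def normalize(X):
    """Rescale every entry of X to [-1, 1] using the global min and max."""
    m_min, m_max = X.min(), X.max()
    return (2.0 * X - m_max - m_min) / (m_max - m_min)


def cosine_to_ideal(X, ideal):
    """Cosine of the angle between each gene (row of X) and the ideal gene.

    Rows with zero norm would divide by zero; the demo data avoid that case.
    """
    norms = np.linalg.norm(X, axis=1) * np.linalg.norm(ideal)
    return (X @ ideal) / norms


def drop_redundant(X, order, threshold=0.7):
    """Greedy redundancy filter (an assumed strategy, see the lead-in).

    Genes are visited in `order`; a gene is kept only if its absolute
    Pearson correlation with every gene kept so far is below `threshold`.
    """
    corr = np.corrcoef(X)  # rows of X are the variables (genes)
    kept = []
    for g in order:
        if all(abs(corr[g, k]) < threshold for k in kept):
            kept.append(g)
    return kept


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Ideal gene: +1 for the 22 normal samples, -1 for the 40 cancer samples.
    ideal = np.r_[np.ones(22), -np.ones(40)]
    X = rng.normal(size=(50, 62))   # 50 synthetic genes, 62 samples
    X[:5] += ideal                  # five genes that track the class labels
    Z = normalize(X)
    cos = cosine_to_ideal(Z, ideal)
    order = np.argsort(-np.abs(cos))  # visit the most label-like genes first
    kept = drop_redundant(Z, order, threshold=0.7)
    print(len(kept), "genes kept out of", len(Z))
```

Visiting genes in decreasing order of |cos θ| means that, within a group of mutually correlated genes, the one closest to the ideal gene is the one retained; a different visiting order would keep a different representative.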

First of all, we use cluster analysis (with the number of clusters set to 5) and assign function values: the function value is 1 for the first 22 normal samples and 0 for the following 40 cancer samples. After clustering, we run stepwise regression within each cluster separately (the significance levels for a variable to enter and to stay in the model are set to sle 0.05 and sls, respectively) and thus screen out the variables that meet the requirements. The 12 selected genes form one group of genes; the gene with the lowest significance level in each cluster forms a second group. The samples are divided into a training set (12+10) and a test set (10+20), and the two groups of genes are evaluated by discriminant analysis; the posterior probabilities are shown in Figure 1.

From Figure 1, we can see that the first group has seven misjudged samples and the second group has six. So after clustering, choosing one gene from each cluster is adequate. Increasing the number of genes selected from a cluster also increases the misjudgment rate.

4 VARIABLE SELECTION BASED ON DIRECT STEPWISE REGRESSION

FIG.2 VARIABLE SELECTION RESULTS

TABLE 4 X CORRESPONDING GENE
x     x2      x22     x27     x36     x47     x51     x56
Gene  T47377  M63391  X12671  R99907  X02492  H20503  M59807

FIG.3 POSTERIOR PROBABILITY
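The stepwise procedure used in Sections 3 and 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the response is the 1/0 function value, each candidate gene is scored by the two-sided t-test p-value of its coefficient in an ordinary least squares fit, `sle = 0.05` is taken from the text, and the default `sls = 0.05` is a placeholder because the stay level is not legible in the source. The helper names `ols_pvalues` and `stepwise` and the synthetic data are hypothetical.

```python
import numpy as np
from scipy import stats


def ols_pvalues(X, y):
    """Two-sided t-test p-values of the slopes in an OLS fit with intercept."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    dof = n - A.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(A.T @ A)
    t = beta / np.sqrt(np.diag(cov))
    return 2.0 * stats.t.sf(np.abs(t), dof)[1:]  # drop the intercept


def stepwise(X, y, sle=0.05, sls=0.05, max_steps=100):
    """Forward selection with backward removal.

    Each pass adds the candidate with the smallest p-value if it is below
    `sle`, then removes the selected variable with the largest p-value if
    it exceeds `sls`. `max_steps` guards against add/remove cycles.
    """
    selected = []
    for _ in range(max_steps):
        changed = False
        best_p, best_j = 1.0, None
        for j in range(X.shape[1]):
            if j in selected:
                continue
            p = ols_pvalues(X[:, selected + [j]], y)[-1]
            if p < best_p:
                best_p, best_j = p, j
        if best_j is not None and best_p < sle:
            selected.append(best_j)
            changed = True
        if selected:
            p = ols_pvalues(X[:, selected], y)
            worst = int(np.argmax(p))
            if p[worst] > sls:
                selected.pop(worst)
                changed = True
        if not changed:
            break
    return selected


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Function values as in the text: 1 for 22 normal, 0 for 40 cancer samples.
    y = np.r_[np.ones(22), np.zeros(40)]
    X = rng.normal(size=(62, 10))   # 10 synthetic candidate genes
    X[:, 3] += 2.0 * y              # gene 3 differs between the two classes
    print("selected columns:", stepwise(X, y))
```

For the clustered variant of Section 3, the same function would be called once per cluster on the columns belonging to that cluster; for the direct variant of Section 4, it would be called once on all 97 columns.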

We directly assign function values, namely 1 for the first 22 normal samples and 0 for the following 40 cancer samples; then we run stepwise regression on the 97 variables (sle 0.05, sls). The samples are divided into a training set (12+10) and a test set (10+20), the above six variables are used for discriminant analysis, and the following results are obtained. As shown in Fig.3, among the 30 test samples there are 4 misjudgments, so the misjudgment rate is 13.3%, which is acceptable. We therefore obtain the final gene combination: M63391 X12671 R99907 X02492 H20503 T47377 M59807

5 CONCLUSIONS

By applying the two methods of weighted distance and correlation coefficient, we delete 1813 irrelevant and redundant genes, greatly reducing the number of genes and the dimension of the analysis. The remaining 97 genes have low correlation with each other, and we apply two different stepwise regression procedures to them, which give different results. From the analysis in Sections 3 and 4, we find that in Section 3 selecting one gene from each cluster gives a better misjudgment rate, and that direct stepwise regression gives a lower misjudgment rate than stepwise regression after clustering. So we can draw a conclusion: applying stepwise regression within separate clusters may seem more reasonable than applying it directly, but it ignores that clustering destroys the structure among the original variables, which leads to unsatisfactory results.

REFERENCES

[1] Wang Shou-Jue, Zhou Ling-Fei. Gene Selection for Gene Expression Data Analysis. Control & Automation, Vol. 24, 2008.
[2] Li Ying-Xin, Ruan Xiao-Gang. Cancer Subtype Recognition and Feature Selection with Gene Expression Profiles. Acta Electronica Sinica, Vol. 33, 2005.
[3] Golub, T.R., Slonim, D.K., Tamayo, P., et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring [J]. Science, 1999, 286.
[4] Ramaswamy, S., Tamayo, P., et al. Multiclass cancer diagnosis using tumour gene expression signatures [J]. PNAS, 2001, 98.
[5] Wang Shou-Jue. Direction-Basis-Function neural networks. IJCNN 99, 1999.
[6] Kan Haijun, Tang Jun, Su Liangliang. A method for informative gene selection using neighborhood uncertainty and scoring criteria. Journal of Anhui University (Natural Science Edition), Vol. 38.
[7] Wang Jingqi, Xu Linli. Semi supervised feature selection and clustering for multi view data [J]. Journal of Data Acquisition and Processing, 2015, 30(1).
[8] Xu Jiucheng, Xu Tianhe, Sun Lin, et al. Feature selection for cancer classification based on neighborhood rough set and particle swarm optimization [J]. Journal of Chinese Computer Systems, 2014, 35(11).
[9] Xu Jiuchen, Li Tao, Sun Lin, Li Yuhui. Feature Gene Selection Based on SNR and Neighborhood Rough Set. Journal of Data Acquisition and Processing, Vol. 30, No. 5, Sep. 2015.
[10] Ramaswamy S, Golub T R. DNA Microarrays in Clinical Oncology [J]. Journal of Clinical Oncology, 2002, 20(7).
[11] Guyon I, Weston J, Barnhill S, et al. Gene selection for cancer classification using support vector machines. Machine Learning, 46(1-3), 2002.

AUTHORS

1 Jie Ni was born on October 21st, 1995 in Anhui Province. She is a junior student at Yanbian University majoring in statistics.

2 Yunfei Guo was born on April 13th, 1983 in Jilin Province, and received his M.S. degree from Yanbian University, China. He is a lecturer at Yanbian University. His research interests are reliability and statistical analysis.

A COMBINATORY ALGORITHM OF UNIVARIATE AND MULTIVARIATE GENE SELECTION

A COMBINATORY ALGORITHM OF UNIVARIATE AND MULTIVARIATE GENE SELECTION 5-9 JATIT. All rights reserved. A COMBINATORY ALGORITHM OF UNIVARIATE AND MULTIVARIATE GENE SELECTION 1 H. Mahmoodian, M. Hamiruce Marhaban, 3 R. A. Rahim, R. Rosli, 5 M. Iqbal Saripan 1 PhD student, Department

More information

Nearest Shrunken Centroid as Feature Selection of Microarray Data

Nearest Shrunken Centroid as Feature Selection of Microarray Data Nearest Shrunken Centroid as Feature Selection of Microarray Data Myungsook Klassen Computer Science Department, California Lutheran University 60 West Olsen Rd, Thousand Oaks, CA 91360 mklassen@clunet.edu

More information

T. R. Golub, D. K. Slonim & Others 1999

T. R. Golub, D. K. Slonim & Others 1999 T. R. Golub, D. K. Slonim & Others 1999 Big Picture in 1999 The Need for Cancer Classification Cancer classification very important for advances in cancer treatment. Cancers of Identical grade can have

More information

Class discovery in Gene Expression Data: Characterizing Splits by Support Vector Machines

Class discovery in Gene Expression Data: Characterizing Splits by Support Vector Machines Class discovery in Gene Expression Data: Characterizing Splits by Support Vector Machines Florian Markowetz and Anja von Heydebreck Max-Planck-Institute for Molecular Genetics Computational Molecular Biology

More information

Comparison of discrimination methods for the classification of tumors using gene expression data

Comparison of discrimination methods for the classification of tumors using gene expression data Comparison of discrimination methods for the classification of tumors using gene expression data Sandrine Dudoit, Jane Fridlyand 2 and Terry Speed 2,. Mathematical Sciences Research Institute, Berkeley

More information

Bootstrapped Integrative Hypothesis Test, COPD-Lung Cancer Differentiation, and Joint mirnas Biomarkers

Bootstrapped Integrative Hypothesis Test, COPD-Lung Cancer Differentiation, and Joint mirnas Biomarkers Bootstrapped Integrative Hypothesis Test, COPD-Lung Cancer Differentiation, and Joint mirnas Biomarkers Kai-Ming Jiang 1,2, Bao-Liang Lu 1,2, and Lei Xu 1,2,3(&) 1 Department of Computer Science and Engineering,

More information

Introduction to Computational Neuroscience

Introduction to Computational Neuroscience Introduction to Computational Neuroscience Lecture 5: Data analysis II Lesson Title 1 Introduction 2 Structure and Function of the NS 3 Windows to the Brain 4 Data analysis 5 Data analysis II 6 Single

More information

Identification of Neuroimaging Biomarkers

Identification of Neuroimaging Biomarkers Identification of Neuroimaging Biomarkers Dan Goodwin, Tom Bleymaier, Shipra Bhal Advisor: Dr. Amit Etkin M.D./PhD, Stanford Psychiatry Department Abstract We present a supervised learning approach to

More information

Classification of cancer profiles. ABDBM Ron Shamir

Classification of cancer profiles. ABDBM Ron Shamir Classification of cancer profiles 1 Background: Cancer Classification Cancer classification is central to cancer treatment; Traditional cancer classification methods: location; morphology, cytogenesis;

More information

A hierarchical two-phase framework for selecting genes in cancer datasets with a neuro-fuzzy system

A hierarchical two-phase framework for selecting genes in cancer datasets with a neuro-fuzzy system Technology and Health Care 24 (2016) S601 S605 DOI 10.3233/THC-161187 IOS Press S601 A hierarchical two-phase framework for selecting genes in cancer datasets with a neuro-fuzzy system Jongwoo Lim, Bohyun

More information

Diagnosis of multiple cancer types by shrunken centroids of gene expression

Diagnosis of multiple cancer types by shrunken centroids of gene expression Diagnosis of multiple cancer types by shrunken centroids of gene expression Robert Tibshirani, Trevor Hastie, Balasubramanian Narasimhan, and Gilbert Chu PNAS 99:10:6567-6572, 14 May 2002 Nearest Centroid

More information

Assigning B cell Maturity in Pediatric Leukemia Gabi Fragiadakis 1, Jamie Irvine 2 1 Microbiology and Immunology, 2 Computer Science

Assigning B cell Maturity in Pediatric Leukemia Gabi Fragiadakis 1, Jamie Irvine 2 1 Microbiology and Immunology, 2 Computer Science Assigning B cell Maturity in Pediatric Leukemia Gabi Fragiadakis 1, Jamie Irvine 2 1 Microbiology and Immunology, 2 Computer Science Abstract One method for analyzing pediatric B cell leukemia is to categorize

More information

Machine Learning! Robert Stengel! Robotics and Intelligent Systems MAE 345,! Princeton University, 2017

Machine Learning! Robert Stengel! Robotics and Intelligent Systems MAE 345,! Princeton University, 2017 Machine Learning! Robert Stengel! Robotics and Intelligent Systems MAE 345,! Princeton University, 2017 A.K.A. Artificial Intelligence Unsupervised learning! Cluster analysis Patterns, Clumps, and Joining

More information

Statistics 202: Data Mining. c Jonathan Taylor. Final review Based in part on slides from textbook, slides of Susan Holmes.

Statistics 202: Data Mining. c Jonathan Taylor. Final review Based in part on slides from textbook, slides of Susan Holmes. Final review Based in part on slides from textbook, slides of Susan Holmes December 5, 2012 1 / 1 Final review Overview Before Midterm General goals of data mining. Datatypes. Preprocessing & dimension

More information

Nature Neuroscience: doi: /nn Supplementary Figure 1. Behavioral training.

Nature Neuroscience: doi: /nn Supplementary Figure 1. Behavioral training. Supplementary Figure 1 Behavioral training. a, Mazes used for behavioral training. Asterisks indicate reward location. Only some example mazes are shown (for example, right choice and not left choice maze

More information

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014 UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014 Exam policy: This exam allows two one-page, two-sided cheat sheets (i.e. 4 sides); No other materials. Time: 2 hours. Be sure to write

More information

SCIENCE & TECHNOLOGY

SCIENCE & TECHNOLOGY Pertanika J. Sci. & Technol. 25 (S): 241-254 (2017) SCIENCE & TECHNOLOGY Journal homepage: http://www.pertanika.upm.edu.my/ Fuzzy Lambda-Max Criteria Weight Determination for Feature Selection in Clustering

More information

Gene Selection for Tumor Classification Using Microarray Gene Expression Data

Gene Selection for Tumor Classification Using Microarray Gene Expression Data Gene Selection for Tumor Classification Using Microarray Gene Expression Data K. Yendrapalli, R. Basnet, S. Mukkamala, A. H. Sung Department of Computer Science New Mexico Institute of Mining and Technology

More information

Mammogram Analysis: Tumor Classification

Mammogram Analysis: Tumor Classification Mammogram Analysis: Tumor Classification Term Project Report Geethapriya Raghavan geeragh@mail.utexas.edu EE 381K - Multidimensional Digital Signal Processing Spring 2005 Abstract Breast cancer is the

More information

Study on rowing athlete selection potential based on stepwise regression analysis

Study on rowing athlete selection potential based on stepwise regression analysis Available online www.jocpr.com Journal of Chemical and Pharmaceutical Research, 2014, 6(5):1896-1903 Research Article ISSN : 0975-7384 CODEN(USA) : JCPRC5 Study on rowing athlete selection potential based

More information

Colon cancer subtypes from gene expression data

Colon cancer subtypes from gene expression data Colon cancer subtypes from gene expression data Nathan Cunningham Giuseppe Di Benedetto Sherman Ip Leon Law Module 6: Applied Statistics 26th February 2016 Aim Replicate findings of Felipe De Sousa et

More information

A Semi-supervised Approach to Perceived Age Prediction from Face Images

A Semi-supervised Approach to Perceived Age Prediction from Face Images IEICE Transactions on Information and Systems, vol.e93-d, no.10, pp.2875 2878, 2010. 1 A Semi-supervised Approach to Perceived Age Prediction from Face Images Kazuya Ueki NEC Soft, Ltd., Japan Masashi

More information

Heterogeneous Data Mining for Brain Disorder Identification. Bokai Cao 04/07/2015

Heterogeneous Data Mining for Brain Disorder Identification. Bokai Cao 04/07/2015 Heterogeneous Data Mining for Brain Disorder Identification Bokai Cao 04/07/2015 Outline Introduction Tensor Imaging Analysis Brain Network Analysis Davidson et al. Network discovery via constrained tensor

More information

Review: Logistic regression, Gaussian naïve Bayes, linear regression, and their connections

Review: Logistic regression, Gaussian naïve Bayes, linear regression, and their connections Review: Logistic regression, Gaussian naïve Bayes, linear regression, and their connections New: Bias-variance decomposition, biasvariance tradeoff, overfitting, regularization, and feature selection Yi

More information

Predicting Kidney Cancer Survival from Genomic Data

Predicting Kidney Cancer Survival from Genomic Data Predicting Kidney Cancer Survival from Genomic Data Christopher Sauer, Rishi Bedi, Duc Nguyen, Benedikt Bünz Abstract Cancers are on par with heart disease as the leading cause for mortality in the United

More information

Diagnosis Of Ovarian Cancer Using Artificial Neural Network

Diagnosis Of Ovarian Cancer Using Artificial Neural Network Diagnosis Of Ovarian Cancer Using Artificial Neural Network B.Rosiline Jeetha #1, M.Malathi *2 1 Research Supervisor, 2 Research Scholar, Assistant Professor RVS College of Arts And Science Department

More information

Efficacy of the Extended Principal Orthogonal Decomposition Method on DNA Microarray Data in Cancer Detection

Efficacy of the Extended Principal Orthogonal Decomposition Method on DNA Microarray Data in Cancer Detection 202 4th International onference on Bioinformatics and Biomedical Technology IPBEE vol.29 (202) (202) IASIT Press, Singapore Efficacy of the Extended Principal Orthogonal Decomposition on DA Microarray

More information

EECS 433 Statistical Pattern Recognition

EECS 433 Statistical Pattern Recognition EECS 433 Statistical Pattern Recognition Ying Wu Electrical Engineering and Computer Science Northwestern University Evanston, IL 60208 http://www.eecs.northwestern.edu/~yingwu 1 / 19 Outline What is Pattern

More information

A STATISTICAL PATTERN RECOGNITION PARADIGM FOR VIBRATION-BASED STRUCTURAL HEALTH MONITORING

A STATISTICAL PATTERN RECOGNITION PARADIGM FOR VIBRATION-BASED STRUCTURAL HEALTH MONITORING A STATISTICAL PATTERN RECOGNITION PARADIGM FOR VIBRATION-BASED STRUCTURAL HEALTH MONITORING HOON SOHN Postdoctoral Research Fellow ESA-EA, MS C96 Los Alamos National Laboratory Los Alamos, NM 87545 CHARLES

More information

Brain Tumour Detection of MR Image Using Naïve Beyer classifier and Support Vector Machine

Brain Tumour Detection of MR Image Using Naïve Beyer classifier and Support Vector Machine International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2018 IJSRCSEIT Volume 3 Issue 3 ISSN : 2456-3307 Brain Tumour Detection of MR Image Using Naïve

More information

WELCOME! Lecture 11 Thommy Perlinger

WELCOME! Lecture 11 Thommy Perlinger Quantitative Methods II WELCOME! Lecture 11 Thommy Perlinger Regression based on violated assumptions If any of the assumptions are violated, potential inaccuracies may be present in the estimated regression

More information

Small Group Presentations

Small Group Presentations Admin Assignment 1 due next Tuesday at 3pm in the Psychology course centre. Matrix Quiz during the first hour of next lecture. Assignment 2 due 13 May at 10am. I will upload and distribute these at the

More information

Gene-microRNA network module analysis for ovarian cancer

Gene-microRNA network module analysis for ovarian cancer Gene-microRNA network module analysis for ovarian cancer Shuqin Zhang School of Mathematical Sciences Fudan University Oct. 4, 2016 Outline Introduction Materials and Methods Results Conclusions Introduction

More information

Research Article Detection of Gastric Cancer with Fourier Transform Infrared Spectroscopy and Support Vector Machine Classification

Research Article Detection of Gastric Cancer with Fourier Transform Infrared Spectroscopy and Support Vector Machine Classification BioMed Research International Volume 213, Article ID 942427, 4 pages http://dx.doi.org/1.1155/213/942427 Research Article Detection of Gastric Cancer with Fourier Transform Infrared Spectroscopy and Support

More information

6. Unusual and Influential Data

6. Unusual and Influential Data Sociology 740 John ox Lecture Notes 6. Unusual and Influential Data Copyright 2014 by John ox Unusual and Influential Data 1 1. Introduction I Linear statistical models make strong assumptions about the

More information

Feature selection methods for early predictive biomarker discovery using untargeted metabolomic data

Feature selection methods for early predictive biomarker discovery using untargeted metabolomic data Feature selection methods for early predictive biomarker discovery using untargeted metabolomic data Dhouha Grissa, Mélanie Pétéra, Marion Brandolini, Amedeo Napoli, Blandine Comte and Estelle Pujos-Guillot

More information

How to Build the Management Mode for the Gymnasiums in Ordinary Universities in China

How to Build the Management Mode for the Gymnasiums in Ordinary Universities in China Journal of Sports Science 4 (2016) 226-231 doi: 10.17265/2332-7839/2016.04.006 D DAVID PUBLISHING How to Build the Management Mode for the Gymnasiums in in China Fengquan Yu Sports Sociology and Humanities,

More information

International Journal of Pure and Applied Mathematics

International Journal of Pure and Applied Mathematics Volume 119 No. 12 2018, 12505-12513 ISSN: 1314-3395 (on-line version) url: http://www.ijpam.eu ijpam.eu Analysis of Cancer Classification of Gene Expression Data A Scientometric Review 1 Joseph M. De Guia,

More information

Enterovirus 71 Outbreak in P. R. China, 2008

Enterovirus 71 Outbreak in P. R. China, 2008 JCM Accepts, published online ahead of print on 13 May 2009 J. Clin. Microbiol. doi:10.1128/jcm.00563-09 Copyright 2009, American Society for Microbiology and/or the Listed Authors/Institutions. All Rights

More information

A Hybrid Approach for Mining Metabolomic Data

A Hybrid Approach for Mining Metabolomic Data A Hybrid Approach for Mining Metabolomic Data Dhouha Grissa 1,3, Blandine Comte 1, Estelle Pujos-Guillot 2, and Amedeo Napoli 3 1 INRA, UMR1019, UNH-MAPPING, F-63000 Clermont-Ferrand, France, 2 INRA, UMR1019,

More information

Data analysis in microarray experiment

Data analysis in microarray experiment 16 1 004 Chinese Bulletin of Life Sciences Vol. 16, No. 1 Feb., 004 1004-0374 (004) 01-0041-08 100005 Q33 A Data analysis in microarray experiment YANG Chang, FANG Fu-De * (National Laboratory of Medical

More information

Practical Bayesian Optimization of Machine Learning Algorithms. Jasper Snoek, Ryan Adams, Hugo LaRochelle NIPS 2012

Practical Bayesian Optimization of Machine Learning Algorithms. Jasper Snoek, Ryan Adams, Hugo LaRochelle NIPS 2012 Practical Bayesian Optimization of Machine Learning Algorithms Jasper Snoek, Ryan Adams, Hugo LaRochelle NIPS 2012 ... (Gaussian Processes) are inadequate for doing speech and vision. I still think they're

More information

Knowledge Discovery and Data Mining I

Knowledge Discovery and Data Mining I Ludwig-Maximilians-Universität München Lehrstuhl für Datenbanksysteme und Data Mining Prof. Dr. Thomas Seidl Knowledge Discovery and Data Mining I Winter Semester 2018/19 Introduction What is an outlier?

More information

An Efficient Diseases Classifier based on Microarray Datasets using Clustering ANOVA Extreme Learning Machine (CAELM)

An Efficient Diseases Classifier based on Microarray Datasets using Clustering ANOVA Extreme Learning Machine (CAELM) www.ijcsi.org 8 An Efficient Diseases Classifier based on Microarray Datasets using Clustering ANOVA Extreme Learning Machine (CAELM) Shamsan Aljamali 1, Zhang Zuping 2 and Long Jun 3 1 School of Information

More information

Study Guide #2: MULTIPLE REGRESSION in education

Study Guide #2: MULTIPLE REGRESSION in education Study Guide #2: MULTIPLE REGRESSION in education What is Multiple Regression? When using Multiple Regression in education, researchers use the term independent variables to identify those variables that

More information

Quick detection of QRS complexes and R-waves using a wavelet transform and K-means clustering

Quick detection of QRS complexes and R-waves using a wavelet transform and K-means clustering Bio-Medical Materials and Engineering 26 (2015) S1059 S1065 DOI 10.3233/BME-151402 IOS Press S1059 Quick detection of QRS complexes and R-waves using a wavelet transform and K-means clustering Yong Xia

More information

arxiv: v1 [cs.lg] 4 Feb 2019

arxiv: v1 [cs.lg] 4 Feb 2019 Machine Learning for Seizure Type Classification: Setting the benchmark Subhrajit Roy [000 0002 6072 5500], Umar Asif [0000 0001 5209 7084], Jianbin Tang [0000 0001 5440 0796], and Stefan Harrer [0000

More information

LUNG LESION PARENCHYMA SEGMENTATION ALGORITHM FOR CT IMAGES

LUNG LESION PARENCHYMA SEGMENTATION ALGORITHM FOR CT IMAGES Int. J. Chem. Sci.: 14(S3), 2016, 928-932 ISSN 0972-768X www.sadgurupublications.com LUNG LESION PARENCHYMA SEGMENTATION ALGORITHM FOR CT IMAGES R. INDUMATHI *, R. HARIPRIYA and S. AKSHAYA ECE Department,

More information

FUZZY C-MEANS AND ENTROPY BASED GENE SELECTION BY PRINCIPAL COMPONENT ANALYSIS IN CANCER CLASSIFICATION

FUZZY C-MEANS AND ENTROPY BASED GENE SELECTION BY PRINCIPAL COMPONENT ANALYSIS IN CANCER CLASSIFICATION FUZZY C-MEANS AND ENTROPY BASED GENE SELECTION BY PRINCIPAL COMPONENT ANALYSIS IN CANCER CLASSIFICATION SOMAYEH ABBASI, HAMID MAHMOODIAN Department of Electrical Engineering, Najafabad branch, Islamic

More information

Multiclass microarray data classification based on confidence evaluation

Multiclass microarray data classification based on confidence evaluation Methodology Multiclass microarray data classification based on confidence evaluation H.L. Yu 1, S. Gao 1, B. Qin 1 and J. Zhao 2 1 School of Computer Science and Engineering, Jiangsu University of Science

More information

Identification of Tissue Independent Cancer Driver Genes

Identification of Tissue Independent Cancer Driver Genes Identification of Tissue Independent Cancer Driver Genes Alexandros Manolakos, Idoia Ochoa, Kartik Venkat Supervisor: Olivier Gevaert Abstract Identification of genomic patterns in tumors is an important

More information

A Comparison of Methods of Estimating Subscale Scores for Mixed-Format Tests

A Comparison of Methods of Estimating Subscale Scores for Mixed-Format Tests A Comparison of Methods of Estimating Subscale Scores for Mixed-Format Tests David Shin Pearson Educational Measurement May 007 rr0701 Using assessment and research to promote learning Pearson Educational

More information
