Evaluation of Gene Selection Using Support Vector Machine Recursive Feature Elimination

1 Evaluation of Gene Selection Using Support Vector Machine Recursive Feature Elimination
John Huynh
Committee: Dr. Rosemary Renaut (Advisor), Dr. Adrienne C. Scheck, Dr. Kenneth Hoober, Dr. Bradford Kirkman-Liff

2 Contents
Introduction; Data; Support Vector Machine; Feature Selection; Hypothesis & Experimental Design; Results; Conclusion; Future Work; Experience; References

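3 Terminology
A sample is a data set described with the following terms:
Gene = feature = attribute = column
Example = data point = slide = array = row
With m examples r_1, ..., r_m and n genes x_1, ..., x_n, entry x_ij is the expression value of gene j in example i, and c_i is the class of example i:
$$
\begin{array}{c|cccccc|c}
 & x_1 & x_2 & \cdots & x_j & \cdots & x_n & \text{Class} \\ \hline
r_1 & x_{11} & x_{12} & \cdots & x_{1j} & \cdots & x_{1n} & c_1 \\
r_2 & x_{21} & x_{22} & \cdots & x_{2j} & \cdots & x_{2n} & c_2 \\
\vdots & & & & & & & \vdots \\
r_i & x_{i1} & x_{i2} & \cdots & x_{ij} & \cdots & x_{in} & c_i \\
\vdots & & & & & & & \vdots \\
r_m & x_{m1} & x_{m2} & \cdots & x_{mj} & \cdots & x_{mn} & c_m
\end{array}
$$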

4 Meningioma
Dr. Adrienne C. Scheck's Lab, BNI (Barrow Neurological Institute). Meningiomas account for 20% of primary intracranial tumors. Mortality/morbidity: in one series by Coke et al., the overall survival rates for all patients at 5 and 10 years were 87% and 58%, respectively. (Image: medial sphenoid wing meningioma.)

5 Meningioma
Correlating the clinical course, microarray, NMR, and FISH data with WHO classification grades I, II, and III. (Image: tuberculum sellae meningioma.)

6 Anatomy
Meningioma is a tumor of the arachnoid.

7 Histology
Neurons and Purkinje cells (cerebellum). Neuroglial cells: astrocytes (nurture, support), comprising protoplasmic astrocytes (gray matter) and fibrous astrocytes (white matter); oligodendrocytes (myelin, support); microglia (immune system of the brain); ependymal cells (epithelium). Blood vessels.

8 Meningioma: Histopathology
Meningioma: whorl-like structures and psammoma bodies.
WHO grade I: benign.
WHO grade II (atypical): a meningioma with increased mitotic activity or three or more of the following features: increased cellularity, small cells with a high nucleus-to-cytoplasm ratio, prominent nucleoli, uninterrupted patternless or sheet-like growth, and foci of spontaneous or geographic necrosis.
WHO grade III (anaplastic): a meningioma exhibiting histological features of frank malignancy far in excess of the abnormalities present in atypical meningioma.

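9 BNI Meningioma Data
Affymetrix HG-U133 Plus 2.0 array with 54,675 probe sets (genes): a small data set with many genes.
(Table: number of samples by WHO grade (I, II, III, Total) and by Primary, Recurrence, and Total; the counts are not preserved in this transcript.)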

10 BNI Meningioma Data
Plan A: treat the data as a large data set. Plan B: treat the data as a small data set.

Grade   Train   Test   Total
I       11      4      15
II      5       2      7
Total   16      6      22

11 BNI Meningioma Data High quality

12 Microarray
Gene expression microarray: the pattern of gene expression for each tissue. Oligonucleotide microarray vs. cDNA microarray. The oligonucleotide array is high density, uses a fixed probe length (25 bases), and is made by in situ synthesis.

13 Microarray
A microarray explores gene expression on a global scale. PM & MM (perfect match and mismatch probes).

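14 Lymphoma Data
Amersham cDNA microarray with 7,129 genes. Tissue: bone marrow, blood. ALL: acute lymphocytic leukemia; AML: acute myelogenous leukemia. Incidence: peak at 2-3 years of age (80 per 1,000,000); 2,400 new cases per year in the USA; 31% of all cancers.
(Table: number of ALL, AML, and Total samples in the Train, Test, and Total sets; the counts are not preserved in this transcript.)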

15 Lymphoma Data
Good quality: large sample size, smaller feature dimension.

16 Inducer Problem
The purpose of a learning machine is to find the most accurate classifier by learning on the training set and testing on the test set. Mathematically, this is the problem of minimizing an error function E. Let f be the learning algorithm, with data points $X = \{x_1, x_2, \dots, x_i, \dots, x_m\} \subset \mathbb{R}^n$ and targets $\{y_1, y_2, \dots, y_i, \dots, y_m\}$ in $Y = \{-1, +1\}$:
$$f: X \to Y, \quad x_i \mapsto f(x_i), \qquad E = \sum_{i=1}^{m} \bigl(y_i - f(x_i)\bigr)^2.$$
Test set requirement: the test set must never be seen during training; otherwise the measured test accuracy is unrealistically high.
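Because both the labels and the predictions take values in {-1, +1}, each misclassified example contributes (±2)^2 = 4 to E and each correct one contributes 0, so E = 4 × (number of errors) and minimizing E is the same as maximizing accuracy. A minimal Python sketch of both quantities (the function names are illustrative, not taken from the thesis code):

```python
def squared_error(y_true, y_pred):
    """E = sum_i (y_i - f(x_i))^2 for labels and predictions in {-1, +1}."""
    return sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))


def accuracy(y_true, y_pred):
    """Fraction of examples where the prediction f(x_i) equals the label y_i."""
    return sum(yt == yp for yt, yp in zip(y_true, y_pred)) / len(y_true)


y_true = [1, -1, 1, 1]
y_pred = [1, 1, 1, -1]
print(squared_error(y_true, y_pred))  # 8: two errors, each contributing 4
print(accuracy(y_true, y_pred))       # 0.5
```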

17 Support Vector Machine
Map the data into the feature space, learn in the feature space, and return the result to the output space. Learning function: $f(x_i) = x_i \cdot w + b$, with $f(x_i) > 0$ for $y_i = +1$, $f(x_i) < 0$ for $y_i = -1$, and $f(x) = 0$ on the decision boundary.
(Figure: input space, feature space, output space.)

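18 SVM Characteristics
Maximum margin. Low computational cost: a kernel evaluation costs O(n). Training cost: $O(n_{sv}^3 + n_{sv}^2 m + n_{sv} m n)$ in the worst case and $O(n m^2)$ in the best case, where $n_{sv}$ is the number of support vectors, m the number of examples, and n the number of features. Testing cost: O(n) per kernel evaluation, that is, $O(n_{sv}\, n)$ per test example.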

19 Linear SVM: Separable Case
No kernel: the kernel is the scalar dot product. The margin is $2/\|w\|$, so maximizing it means minimizing $\|w\|^2$ subject to the constraints $y_i(x_i \cdot w + b) \ge 1$ for all i.
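The following sketch checks the canonical constraints for a given hyperplane and computes the margin 2/||w||. The toy points, w, and b are made up for illustration; halving w would widen the nominal margin but break the constraints, which is why the minimization of ||w||^2 is bounded by them.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def satisfies_constraints(X, y, w, b, tol=1e-12):
    """Check the canonical hard-margin constraints y_i (x_i . w + b) >= 1."""
    return all(yi * (dot(xi, w) + b) >= 1 - tol for xi, yi in zip(X, y))


def margin(w):
    """Distance between the hyperplanes x . w + b = +1 and x . w + b = -1."""
    return 2.0 / math.sqrt(dot(w, w))


X = [[1.0, 1.0], [2.0, 2.0], [-1.0, -1.0]]
y = [1, 1, -1]
w, b = [0.5, 0.5], 0.0
print(satisfies_constraints(X, y, w, b))  # True
print(margin(w))                          # 2 * sqrt(2), about 2.828
```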

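20 Linear SVM: Non-Separable Case
Introduce slack variables $\xi_i \ge 0$ so that some examples may violate the margin: the constraints become $y_i(x_i \cdot w + b) \ge 1 - \xi_i$, and the objective becomes $\frac{1}{2}\|w\|^2 + C \sum_i \xi_i$. In the dual problem, this amounts to an upper bound C on the Lagrange multipliers, $0 \le \alpha_i \le C$. We used C = 100 in our experiments.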
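The slides do not show the solver used in MATLAB. As an assumption, the sketch below trains the soft-margin linear SVM (hinge loss) with dual coordinate descent, a standard method for this problem: each step updates one multiplier α_i and clips it to the box [0, C], which is exactly where C enters. Folding the bias into w through a constant feature (so the bias is mildly regularized) is a simplification of this sketch, not part of the slides.

```python
def train_linear_svm(X, y, C=100.0, epochs=500):
    """Soft-margin linear SVM trained by dual coordinate descent.

    Solves min_w 0.5*||w||^2 + C * sum_i max(0, 1 - y_i * w.x_i) via its dual,
    where every multiplier is boxed: 0 <= alpha_i <= C. The bias is handled by
    appending a constant feature 1 to each example (an assumption of the sketch).
    """
    rows = [list(x) + [1.0] for x in X]          # augmented examples
    w = [0.0] * len(rows[0])
    alpha = [0.0] * len(rows)
    q = [sum(v * v for v in r) for r in rows]    # Q_ii = x_i . x_i (> 0 thanks to the 1)
    for _ in range(epochs):
        for i, (r, yi) in enumerate(zip(rows, y)):
            g = yi * sum(wj * rj for wj, rj in zip(w, r)) - 1.0  # dual gradient
            new = min(max(alpha[i] - g / q[i], 0.0), C)          # clip to [0, C]
            delta = (new - alpha[i]) * yi
            alpha[i] = new
            w = [wj + delta * rj for wj, rj in zip(w, r)]        # keep w = sum alpha_i y_i x_i
    return w[:-1], w[-1], alpha


def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else -1


X = [[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]]
y = [1, 1, -1, -1]
w, b, alpha = train_linear_svm(X, y, C=100.0)
print([predict(w, b, x) for x in X])  # [1, 1, -1, -1]
```

Examples with α_i > 0 are the support vectors; with C = 100 and separable toy data, none of them should reach the upper bound.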

21 Non-Linear SVM
There is no linear decision boundary in the input space.

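22 Non-Linear Support Vector Machine
Introduce a kernel function $K(x_i, x_j) = \Phi(x_i) \cdot \Phi(x_j)$ that maps the data into a high-dimensional Euclidean space and returns the dot product there, so the mapping $\Phi$ never has to be computed explicitly.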
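The slide does not name the kernel used. For illustration only, the sketch below uses the degree-2 polynomial kernel K(x, z) = (x · z)^2 on 2-D inputs, whose feature map Φ is small enough to write out, and checks that the kernel value equals the dot product of the explicitly mapped vectors.

```python
import math


def poly2_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2: K(x, z) = (x . z)^2."""
    return sum(a * b for a, b in zip(x, z)) ** 2


def phi(x):
    """Explicit feature map for 2-D inputs whose dot product equals poly2_kernel."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2]


x, z = [1.0, 2.0], [3.0, -1.0]
explicit = sum(a * b for a, b in zip(phi(x), phi(z)))
print(poly2_kernel(x, z), explicit)  # both (1*3 + 2*(-1))^2 = 1, up to rounding
```

The kernel needs one 2-D dot product, while the explicit route builds 3-D vectors first; for higher degrees and dimensions the gap grows quickly, which is the point of the kernel trick.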

23 Non-Linear Support Vector Machine
The data and the weight vector now lie in the high-dimensional feature space, and both training and testing take place in that space.
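At test time the weight vector is never formed explicitly; the decision function is evaluated through the kernel over the support vectors, f(x) = Σ α_i y_i K(x_i, x) + b. Each term is one kernel evaluation, O(n) for the kernels below, so one test example costs O(n_sv · n). A minimal sketch; the support vectors, multipliers, and γ value in the example are hypothetical:

```python
import math


def linear_kernel(x, z):
    return sum(a * b for a, b in zip(x, z))


def rbf_kernel(x, z, gamma=0.5):
    """Gaussian RBF kernel exp(-gamma * ||x - z||^2); gamma is an illustrative value."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def decision(x, support_vectors, alphas, labels, b, kernel):
    """f(x) = sum_i alpha_i y_i K(x_i, x) + b, one kernel call per support vector."""
    return sum(a * yi * kernel(sv, x)
               for sv, a, yi in zip(support_vectors, alphas, labels)) + b


svs, alphas, labels = [[1.0, 1.0], [-1.0, -1.0]], [0.25, 0.25], [1, -1]
# With the linear kernel this equals w . x + b for w = sum_i alpha_i y_i x_i = (0.5, 0.5).
print(decision([2.0, 2.0], svs, alphas, labels, 0.0, linear_kernel))  # 2.0
print(decision([1.0, 1.0], svs, alphas, labels, 0.0, rbf_kernel))     # positive, so class +1
```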

24 Problem of Microarray Data
The instance space is $F_1 \times F_2 \times \cdots \times F_i \times \cdots \times F_n$, and the training set must be a large enough subset of it. Over-fitting on a small data set: the inducer performs well on the training set but poorly on the test set. The computational cost of high-dimensional data is high (n = 54,675). Multiple testing correction (FDR, SAM) is required; classical analysis methods are not suitable.
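The slide names FDR without giving the procedure. One standard way to control the false discovery rate is the Benjamini-Hochberg step-up procedure, sketched below for illustration; the slides do not say whether it matches the analysis used in the thesis. Sort the m p-values, find the largest rank k with p_(k) ≤ kα/m, and reject the k smallest.

```python
def benjamini_hochberg(pvals, alpha=0.05):
    """Return indices of hypotheses rejected by the Benjamini-Hochberg step-up
    procedure, which controls the false discovery rate at level alpha."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p-value
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:              # p_(k) <= k * alpha / m
            k_max = rank                              # keep the largest such k
    return sorted(order[:k_max])


print(benjamini_hochberg([0.2, 0.001, 0.5, 0.029, 0.012]))  # [1, 3, 4]
print(benjamini_hochberg([0.03, 0.04]))                     # [0, 1]: step-up rejects both
```

The second call shows the step-up behavior: 0.03 fails its own threshold (0.025), but it is still rejected because a larger p-value (0.04) passes at rank 2.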

25 Feature Selection
Feature selection reduces computational cost and over-fitting. Feature selection is a search in the feature space for the optimal feature subset. Given an inducer I and a data set D with features $X_1, \dots, X_i, \dots, X_n$ drawn from a distribution $\mathcal{D}$ over the labeled instance space, an optimal feature subset $X_{opt}$ is a subset of the features such that the accuracy of the induced classifier $C = I(D)$ is maximal (Kohavi97).

26 Feature Selection: How?
Filter methods vs. wrapper methods. Feature ranking criteria: correlation coefficient; weight.
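As a filter-method illustration, Guyon et al. (Guyon02) describe the "correlation coefficient" of Golub et al. as the per-gene score (μ+ - μ-)/(σ+ + σ-), computed from the class means and standard deviations without training any classifier. The sketch below ranks genes by the absolute value of that score; using population standard deviations and the zero-spread guard are assumptions of this sketch, and the toy data are made up.

```python
from statistics import mean, pstdev


def golub_scores(X, y):
    """Per-gene score (mu+ - mu-) / (sigma+ + sigma-) for labels in {-1, +1}.
    Population standard deviations are an assumption of this sketch."""
    scores = []
    for j in range(len(X[0])):
        pos = [x[j] for x, c in zip(X, y) if c == 1]
        neg = [x[j] for x, c in zip(X, y) if c == -1]
        spread = pstdev(pos) + pstdev(neg) or 1e-12  # guard against zero spread
        scores.append((mean(pos) - mean(neg)) / spread)
    return scores


def rank_filter(X, y):
    """Filter method: rank genes by |score|, largest first, without training a classifier."""
    s = golub_scores(X, y)
    return sorted(range(len(s)), key=lambda j: abs(s[j]), reverse=True)


X = [[5.0, 1.0, 0.3], [6.0, 2.0, 0.1], [1.0, 1.5, 0.2], [0.0, 2.5, 0.6]]
y = [1, 1, -1, -1]
print(rank_filter(X, y))  # [0, 2, 1]: gene 0 separates the classes best
```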

27 Recursive Feature Elimination
RFE is a top-down (backward) wrapper that uses the weight as its feature ranking criterion. Elimination can remove one feature per loop (slow) or a subset of features per loop (fast). Do both give the same optimal subset? Are the feature rankings the same?
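The recursion can be sketched as follows, following the SVM-RFE scheme of Guyon02: train a linear SVM on the surviving genes, rank them by w_j^2, and drop the weakest. The parameter `step` (a hypothetical name) is the number of genes removed per loop, so step=1 is the slow one-feature-per-loop variant and a larger step is the fast subset variant; the thesis's exact definition of the rate of elimination may differ. The embedded trainer is a small dual coordinate descent solver with the bias folded into a constant feature; it is an assumption for self-containment, not the MATLAB code used in the experiments.

```python
def train_linear_svm(X, y, C=100.0, epochs=300):
    """Linear soft-margin SVM by dual coordinate descent; bias via a constant feature."""
    rows = [list(x) + [1.0] for x in X]
    w = [0.0] * len(rows[0])
    alpha = [0.0] * len(rows)
    for _ in range(epochs):
        for i, (r, yi) in enumerate(zip(rows, y)):
            g = yi * sum(a * b for a, b in zip(w, r)) - 1.0
            new = min(max(alpha[i] - g / sum(v * v for v in r), 0.0), C)
            delta = (new - alpha[i]) * yi
            alpha[i] = new
            w = [wj + delta * rj for wj, rj in zip(w, r)]
    return w[:-1]  # drop the bias term; only gene weights are ranked


def svm_rfe(X, y, step=1):
    """Repeatedly train on the surviving genes and eliminate the `step` genes
    with the smallest w_j^2.

    Returns (ranking, subsets): ranking lists gene indices from most to least
    important; subsets holds the nested surviving subsets, largest first.
    """
    surviving = list(range(len(X[0])))
    eliminated = []
    subsets = [list(surviving)]
    while surviving:
        w = train_linear_svm([[x[j] for j in surviving] for x in X], y)
        weakest = sorted(range(len(surviving)), key=lambda k: w[k] ** 2)[:step]
        dropped = [surviving[k] for k in weakest]
        eliminated.extend(dropped)
        surviving = [j for j in surviving if j not in dropped]
        if surviving:
            subsets.append(list(surviving))
    return eliminated[::-1], subsets  # last eliminated = most important


X = [[3.0, 0.1, -0.2], [2.0, -0.1, 0.1], [-2.0, 0.2, 0.1], [-3.0, -0.2, -0.1]]
y = [1, 1, -1, -1]
ranking, subsets = svm_rfe(X, y, step=1)
print(ranking[0], subsets[-1])  # gene 0 carries the class signal and survives longest
```

Comparing `svm_rfe(X, y, step=1)` with a larger `step` on the same data is one direct way to ask the slide's question of whether the two elimination rates reach the same subsets.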

28 Feature Selection
Feature selection creates nested subsets. We define two parameters: the rate of elimination and the surviving subset. Note that the feature selection module includes an inducer, so the test set must never be seen by either the feature selection module or the evaluation module (Kohavi97).
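The never-seen rule can be enforced by rerunning feature selection inside every cross-validation fold. The sketch below uses leave-one-out CV; `select`, `fit`, and `predict` are caller-supplied placeholders. For the example, a top-1 mean-difference selector and a nearest-centroid classifier stand in for the SVM-based modules purely to keep the sketch short; they are not the methods of the thesis.

```python
def loocv_accuracy(X, y, select, fit, predict):
    """Leave-one-out CV in which feature selection is rerun inside every fold.

    The held-out example is never passed to `select` or `fit`, so it stays
    unseen by both the feature selection and the evaluation inducer.
    """
    correct = 0
    for i in range(len(X)):
        X_tr = [x for k, x in enumerate(X) if k != i]  # held-out example removed
        y_tr = [c for k, c in enumerate(y) if k != i]
        genes = select(X_tr, y_tr)                     # selection sees training fold only
        model = fit([[x[j] for j in genes] for x in X_tr], y_tr)
        correct += predict(model, [X[i][j] for j in genes]) == y[i]
    return correct / len(X)


def top1_mean_difference(X, y):
    """Illustrative selector: the single gene with the largest |class-mean difference|."""
    def gap(j):
        pos = [x[j] for x, c in zip(X, y) if c == 1]
        neg = [x[j] for x, c in zip(X, y) if c == -1]
        return abs(sum(pos) / len(pos) - sum(neg) / len(neg))
    return [max(range(len(X[0])), key=gap)]


def fit_centroids(X, y):
    """Illustrative inducer: one centroid per class."""
    model = {}
    for label in (1, -1):
        pts = [x for x, c in zip(X, y) if c == label]
        model[label] = [sum(col) / len(pts) for col in zip(*pts)]
    return model


def predict_centroid(model, x):
    """Assign x to the class whose centroid is nearest (squared Euclidean distance)."""
    def dist(label):
        return sum((a - b) ** 2 for a, b in zip(x, model[label]))
    return min(model, key=dist)


X = [[5.0, 0.0], [6.0, 1.0], [5.5, 0.5], [0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
y = [1, 1, 1, -1, -1, -1]
print(loocv_accuracy(X, y, top1_mean_difference, fit_centroids, predict_centroid))  # 1.0
```

Selecting genes once on all examples and only then cross-validating the classifier would let every held-out example influence the selection, which is the leak this structure avoids.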

29 Full Two-Factor Factorial Experimental Design
The evaluation measure is accuracy, estimated with two methods: an independent test set and cross-validation. The inducer is an SVM for both feature selection and evaluation (Guyon02). Factor A (rows) is the rate of elimination; factor B (columns) is the surviving subset.
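In a full factorial design, the effects of the two factors and their interaction can be tested with a balanced two-way ANOVA. The sketch below computes the sums of squares, degrees of freedom, and F statistics from replicate accuracies in each cell; it assumes at least two replicates per cell and a nonzero error term. The p-values would come from the F distribution, for example with `scipy.stats.f.sf`; that step is omitted to keep the sketch dependency-free. The example data are synthetic and purely additive, so the interaction sum of squares is zero.

```python
def two_way_anova(data):
    """Balanced two-factor ANOVA with interaction.

    data[i][j] is the list of r replicate accuracies for level i of factor A
    (rate of elimination) and level j of factor B (surviving subset).
    Returns sums of squares, degrees of freedom, and F statistics.
    """
    a, b, r = len(data), len(data[0]), len(data[0][0])
    cell = [[sum(v) / r for v in row] for row in data]
    grand = sum(map(sum, cell)) / (a * b)
    mean_a = [sum(row) / b for row in cell]
    mean_b = [sum(cell[i][j] for i in range(a)) / a for j in range(b)]
    ss = {
        "A": b * r * sum((m - grand) ** 2 for m in mean_a),
        "B": a * r * sum((m - grand) ** 2 for m in mean_b),
        "AB": r * sum((cell[i][j] - mean_a[i] - mean_b[j] + grand) ** 2
                      for i in range(a) for j in range(b)),
        "E": sum((v - cell[i][j]) ** 2
                 for i in range(a) for j in range(b) for v in data[i][j]),
    }
    df = {"A": a - 1, "B": b - 1, "AB": (a - 1) * (b - 1), "E": a * b * (r - 1)}
    mse = ss["E"] / df["E"]
    f = {k: (ss[k] / df[k]) / mse for k in ("A", "B", "AB")}
    return ss, df, f


# Synthetic, purely additive cells: value = A effect + B effect +/- 1.
data = [[[1, -1], [2, 0], [4, 2]],
        [[3, 1], [4, 2], [6, 4]]]
ss, df, f = two_way_anova(data)
print(ss, df, f)  # SS_AB is 0 here, so F_AB is 0: no interaction
```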

30 Software Design
Data preprocessing: linear normalization + log2 transformation (prep.java). SVM, feature selection, and evaluation: MATLAB 6.5 (R13).
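The slides do not define "linear normalization". One common reading, assumed here and possibly different from what prep.java does, is global scaling: multiply each array by a constant so that all arrays share the same mean intensity, then take log2. Values are floored at 1 before the log so the transform is defined; the floor and the default target are also assumptions, and the intensities must have a nonzero mean per array.

```python
import math


def normalize_log2(arrays, target=None, floor=1.0):
    """Hypothetical sketch of 'linear normalization + log2 transformation'.

    Each array (one list of raw intensities per chip) is multiplied by a
    constant so that its mean equals `target` (default: the mean of the
    array means). Values are then floored at `floor` so log2 is defined.
    """
    means = [sum(a) / len(a) for a in arrays]
    if target is None:
        target = sum(means) / len(means)
    return [[math.log2(max(v * target / m, floor)) for v in a]
            for a, m in zip(arrays, means)]


print(normalize_log2([[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]], target=4.0))
# both rows become [1.0, 2.0, log2(6)]: the second chip was simply twice as bright
```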

31 Result: Lymphoma
The optimal subset has 32 genes.

32 Result: Lymphoma
(Figure: box plots.)

33 Result: Lymphoma
ANOVA (tables: Tsuc, Vsuc).

34 Result: Meningioma
The optimal subset has 4 genes.

35 Result: Meningioma
Box plots: the optimal subset is 4 under the small-data-set plan and 2 under the large-data-set plan.
(Figure panels: Large, Small.)

36 Result: Meningioma
ANOVA: the correct choice is 4.
(Table of selected probes with columns Index and Probe; only fragments of the probe IDs survive: _at, _s_at, _a_at, 21552_x_at.)

37 Conclusion
There is no interaction between the rate of elimination and the optimal feature subset. For a small data set, rely on cross-validation.

38 Future Work
More published data sets: large and small, difficult and easy. How small is small? An evaluation method for small data sets: master gene lists + LOOCV. Over-fitting and cross-validation.

39 Experience
Not every data mining task will succeed. Business focus: communication, learning, negotiation, teamwork, leadership. Understand and live with the data: a high-dimensional, small data set. Never alter the data during preprocessing (time cost). Experimental design: plan well. Observation + thinking + reaction = strategy loop; deal with the facts, not with the person. Repeatability: document experiment results and analysis. Welcome new ideas, good and bad; read, read, and read. The never-seen rule for test data; evaluation algorithms; over-fitting; feature selection; SVM; software.

40 References
(Blum) Avrim L. Blum and Pat Langley, Selection of Relevant Features and Examples in Machine Learning.
(Burges98) Christopher J. C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, 1998.
(Golub99) Golub et al., Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring, Science 286 (1999), 531-537.
(Guyon02) Isabelle Guyon et al., Gene Selection for Cancer Classification Using Support Vector Machines, Machine Learning 46 (2002), no. 1-3.
(Gunn98) Steve R. Gunn, Support Vector Machines for Classification and Regression, 1998.
(Kohavi97) Ron Kohavi and George H. John, Wrappers for Feature Subset Selection, Artificial Intelligence 97 (1997).
(Soroin03) Sorin Drăghici, Data Analysis Tools for DNA Microarrays, Chapman and Hall/CRC, 2003.
WHO Classification.

41 Question?


More information

Tumors of the Central Nervous System

Tumors of the Central Nervous System Tumors of the Central Nervous System 1 Financial Disclosures I have NO SIGNIFICANT FINANCIAL, GENERAL, OR OBLIGATION INTERESTS TO REPORT Introduction General: Brain tumors are lesions that have mass effect

More information

3. Model evaluation & selection

3. Model evaluation & selection Foundations of Machine Learning CentraleSupélec Fall 2016 3. Model evaluation & selection Chloé-Agathe Azencot Centre for Computational Biology, Mines ParisTech chloe-agathe.azencott@mines-paristech.fr

More information

EECS 433 Statistical Pattern Recognition

EECS 433 Statistical Pattern Recognition EECS 433 Statistical Pattern Recognition Ying Wu Electrical Engineering and Computer Science Northwestern University Evanston, IL 60208 http://www.eecs.northwestern.edu/~yingwu 1 / 19 Outline What is Pattern

More information

Data Mining in Bioinformatics Day 7: Clustering in Bioinformatics

Data Mining in Bioinformatics Day 7: Clustering in Bioinformatics Data Mining in Bioinformatics Day 7: Clustering in Bioinformatics Karsten Borgwardt February 21 to March 4, 2011 Machine Learning & Computational Biology Research Group MPIs Tübingen Karsten Borgwardt:

More information

Part [2.1]: Evaluation of Markers for Treatment Selection Linking Clinical and Statistical Goals

Part [2.1]: Evaluation of Markers for Treatment Selection Linking Clinical and Statistical Goals Part [2.1]: Evaluation of Markers for Treatment Selection Linking Clinical and Statistical Goals Patrick J. Heagerty Department of Biostatistics University of Washington 174 Biomarkers Session Outline

More information

Brain Tumour Detection of MR Image Using Naïve Beyer classifier and Support Vector Machine

Brain Tumour Detection of MR Image Using Naïve Beyer classifier and Support Vector Machine International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2018 IJSRCSEIT Volume 3 Issue 3 ISSN : 2456-3307 Brain Tumour Detection of MR Image Using Naïve

More information

Classifying Substance Abuse among Young Teens

Classifying Substance Abuse among Young Teens Classifying Substance Abuse among Young Teens Dylan Rhodes, Sunet: dylanr December 14, 2012 Abstract This project attempts to use machine learning to classify substance abuse among young teens. It makes

More information

Primary Level Classification of Brain Tumor using PCA and PNN

Primary Level Classification of Brain Tumor using PCA and PNN Primary Level Classification of Brain Tumor using PCA and PNN Dr. Mrs. K.V.Kulhalli Department of Information Technology, D.Y.Patil Coll. of Engg. And Tech. Kolhapur,Maharashtra,India kvkulhalli@gmail.com

More information

Biomedical Research 2016; Special Issue: S148-S152 ISSN X

Biomedical Research 2016; Special Issue: S148-S152 ISSN X Biomedical Research 2016; Special Issue: S148-S152 ISSN 0970-938X www.biomedres.info Prognostic classification tumor cells using an unsupervised model. R Sathya Bama Krishna 1*, M Aramudhan 2 1 Department

More information

Classifica4on. CSCI1950 Z Computa4onal Methods for Biology Lecture 18. Ben Raphael April 8, hip://cs.brown.edu/courses/csci1950 z/

Classifica4on. CSCI1950 Z Computa4onal Methods for Biology Lecture 18. Ben Raphael April 8, hip://cs.brown.edu/courses/csci1950 z/ CSCI1950 Z Computa4onal Methods for Biology Lecture 18 Ben Raphael April 8, 2009 hip://cs.brown.edu/courses/csci1950 z/ Binary classifica,on Given a set of examples (x i, y i ), where y i = + 1, from unknown

More information

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK CHARACTER REORGANISATION BASED ON SEGMENTATION METHOD FOR PREDICTION OF HUMAN PERSONALITY

More information

Learning Classifier Systems (LCS/XCSF)

Learning Classifier Systems (LCS/XCSF) Context-Dependent Predictions and Cognitive Arm Control with XCSF Learning Classifier Systems (LCS/XCSF) Laurentius Florentin Gruber Seminar aus Künstlicher Intelligenz WS 2015/16 Professor Johannes Fürnkranz

More information

Collin County Community College BIOL Week 5. Nervous System. Nervous System

Collin County Community College BIOL Week 5. Nervous System. Nervous System Collin County Community College BIOL 2401 Week 5 Nervous System 1 Nervous System The process of homeostasis makes sure that the activities that occur in the body are maintained within normal physiological

More information

Large-Scale Statistical Modelling via Machine Learning Classifiers

Large-Scale Statistical Modelling via Machine Learning Classifiers J. Stat. Appl. Pro. 2, No. 3, 203-222 (2013) 203 Journal of Statistics Applications & Probability An International Journal http://dx.doi.org/10.12785/jsap/020303 Large-Scale Statistical Modelling via Machine

More information

Finding the Augmented Neural Pathways to Math Processing: fmri Pattern Classification of Turner Syndrome Brains

Finding the Augmented Neural Pathways to Math Processing: fmri Pattern Classification of Turner Syndrome Brains Finding the Augmented Neural Pathways to Math Processing: fmri Pattern Classification of Turner Syndrome Brains Gary Tang {garytang} @ {stanford} {edu} Abstract The use of statistical and machine learning

More information

BACKPROPOGATION NEURAL NETWORK FOR PREDICTION OF HEART DISEASE

BACKPROPOGATION NEURAL NETWORK FOR PREDICTION OF HEART DISEASE BACKPROPOGATION NEURAL NETWORK FOR PREDICTION OF HEART DISEASE NABEEL AL-MILLI Financial and Business Administration and Computer Science Department Zarqa University College Al-Balqa' Applied University

More information

Gene Expression Based Leukemia Sub Classification Using Committee Neural Networks

Gene Expression Based Leukemia Sub Classification Using Committee Neural Networks Bioinformatics and Biology Insights M e t h o d o l o g y Open Access Full open access to this and thousands of other papers at http://www.la-press.com. Gene Expression Based Leukemia Sub Classification

More information

Machine Learning to Inform Breast Cancer Post-Recovery Surveillance

Machine Learning to Inform Breast Cancer Post-Recovery Surveillance Machine Learning to Inform Breast Cancer Post-Recovery Surveillance Final Project Report CS 229 Autumn 2017 Category: Life Sciences Maxwell Allman (mallman) Lin Fan (linfan) Jamie Kang (kangjh) 1 Introduction

More information

Introduction to Machine Learning. Katherine Heller Deep Learning Summer School 2018

Introduction to Machine Learning. Katherine Heller Deep Learning Summer School 2018 Introduction to Machine Learning Katherine Heller Deep Learning Summer School 2018 Outline Kinds of machine learning Linear regression Regularization Bayesian methods Logistic Regression Why we do this

More information

MBios 478: Systems Biology and Bayesian Networks, 27 [Dr. Wyrick] Slide #1. Lecture 27: Systems Biology and Bayesian Networks

MBios 478: Systems Biology and Bayesian Networks, 27 [Dr. Wyrick] Slide #1. Lecture 27: Systems Biology and Bayesian Networks MBios 478: Systems Biology and Bayesian Networks, 27 [Dr. Wyrick] Slide #1 Lecture 27: Systems Biology and Bayesian Networks Systems Biology and Regulatory Networks o Definitions o Network motifs o Examples

More information

Enhanced Detection of Lung Cancer using Hybrid Method of Image Segmentation

Enhanced Detection of Lung Cancer using Hybrid Method of Image Segmentation Enhanced Detection of Lung Cancer using Hybrid Method of Image Segmentation L Uma Maheshwari Department of ECE, Stanley College of Engineering and Technology for Women, Hyderabad - 500001, India. Udayini

More information

7.1 Grading Diabetic Retinopathy

7.1 Grading Diabetic Retinopathy Chapter 7 DIABETIC RETINOPATHYGRADING -------------------------------------------------------------------------------------------------------------------------------------- A consistent approach to the

More information

k-nn Based Classification of Brain MRI Images using DWT and PCA to Detect Different Types of Brain Tumour

k-nn Based Classification of Brain MRI Images using DWT and PCA to Detect Different Types of Brain Tumour International Journal of Medical Research & Health Sciences Available online at www.ijmrhs.com ISSN No: 2319-5886 International Journal of Medical Research & Health Sciences, 2017, 6(9): 15-20 I J M R

More information

Chapter 4 DESIGN OF EXPERIMENTS

Chapter 4 DESIGN OF EXPERIMENTS Chapter 4 DESIGN OF EXPERIMENTS 4.1 STRATEGY OF EXPERIMENTATION Experimentation is an integral part of any human investigation, be it engineering, agriculture, medicine or industry. An experiment can be

More information

NMF-Density: NMF-Based Breast Density Classifier

NMF-Density: NMF-Based Breast Density Classifier NMF-Density: NMF-Based Breast Density Classifier Lahouari Ghouti and Abdullah H. Owaidh King Fahd University of Petroleum and Minerals - Department of Information and Computer Science. KFUPM Box 1128.

More information

A study on Feature Selection Methods in Medical Decision Support Systems

A study on Feature Selection Methods in Medical Decision Support Systems A study on Feature Selection Methods in Medical Decision Support Systems Rahul Samant, SVKM S NMIMS, Shirpur Campus, India; Srikantha Rao, TIMSCDR, Mumbai University, Kandivali, Mumbai, India, Abstract

More information

Lung Cancer Diagnosis from CT Images Using Fuzzy Inference System

Lung Cancer Diagnosis from CT Images Using Fuzzy Inference System Lung Cancer Diagnosis from CT Images Using Fuzzy Inference System T.Manikandan 1, Dr. N. Bharathi 2 1 Associate Professor, Rajalakshmi Engineering College, Chennai-602 105 2 Professor, Velammal Engineering

More information

A novel approach to feature extraction from classification models based on information gene pairs

A novel approach to feature extraction from classification models based on information gene pairs Pattern Recognition 41 (2008) 1975 1984 www.elsevier.com/locate/pr A novel approach to feature extraction from classification models based on information gene pairs J. Li, X. Tang, J. Liu, J. Huang, Y.

More information

Brain Tumor segmentation and classification using Fcm and support vector machine

Brain Tumor segmentation and classification using Fcm and support vector machine Brain Tumor segmentation and classification using Fcm and support vector machine Gaurav Gupta 1, Vinay singh 2 1 PG student,m.tech Electronics and Communication,Department of Electronics, Galgotia College

More information

Predicting Breast Cancer Survivability Rates

Predicting Breast Cancer Survivability Rates Predicting Breast Cancer Survivability Rates For data collected from Saudi Arabia Registries Ghofran Othoum 1 and Wadee Al-Halabi 2 1 Computer Science, Effat University, Jeddah, Saudi Arabia 2 Computer

More information

Computer Age Statistical Inference. Algorithms, Evidence, and Data Science. BRADLEY EFRON Stanford University, California

Computer Age Statistical Inference. Algorithms, Evidence, and Data Science. BRADLEY EFRON Stanford University, California Computer Age Statistical Inference Algorithms, Evidence, and Data Science BRADLEY EFRON Stanford University, California TREVOR HASTIE Stanford University, California ggf CAMBRIDGE UNIVERSITY PRESS Preface

More information

Mammogram Analysis: Tumor Classification

Mammogram Analysis: Tumor Classification Mammogram Analysis: Tumor Classification Literature Survey Report Geethapriya Raghavan geeragh@mail.utexas.edu EE 381K - Multidimensional Digital Signal Processing Spring 2005 Abstract Breast cancer is

More information

number Done by Corrected by Doctor Maha Shomaf

number Done by Corrected by Doctor Maha Shomaf number 16 Done by Waseem Abo-Obeida Corrected by Zeina Assaf Doctor Maha Shomaf MALIGNANT NEOPLASMS The four fundamental features by which benign and malignant tumors can be distinguished are: 1- differentiation

More information