International Journal of Pure and Applied Mathematics


Volume 119 No , ISSN: (on-line version) url: ijpam.eu

Analysis of Cancer Classification of Gene Expression Data: A Scientometric Review

1 Joseph M. De Guia, 2 Madhavi Devaraj, PhD
1, 2 School of IT, Mapua University, Muralla St., Intramuros, Manila, Philippines
1 jmdeguia@mapua.edu.ph, 2 mdevaraj@mapua.edu.ph

Abstract

The discovery of diseases at the molecular level is the main interest of this study and a major challenge for researchers in bioinformatics and cancer classification. Understanding which genes contribute to cancer remains difficult. This paper analyzes the published papers on cancer classification using a scientometric approach. Scopus was searched for published papers with "cancer classification" as the target keyword of the query. The search produced a Scopus dataset, which was analyzed together with a CRExplorer visualization to identify the important attributes of cancer classification research, its impact, and its global contribution. The cancer classifier models are presented and their evaluation is discussed. A conceptual model and a system model are proposed based on the discussion and results of the papers extracted by the scientometric analysis.

Keywords: scientometric, cancer classification, machine learning, microarray, gene expression, genomics.

1. Introduction

Cancer is considered one of the deadliest genetic maladies of the human genome and has been a research interest in medicine for decades. The World Health Organization (WHO) reported that cancer, with 14 million new cases in 2012, is a major cause of morbidity and mortality and the second leading cause of death worldwide, responsible for 8.8 million deaths in 2015 [15]. The World Cancer Report described cancer as a global problem because it affects the population at large.
The same report projected that cancer incidence will increase to 20 million new cases by 2025 [16].

Several published studies on cancer classification use new approaches at the molecular level. An important goal in cancer research is to identify the specific genes that contribute to the disease. New technology makes molecular-level classification possible: thousands of genes can be analyzed at a time on a single chip called a microarray. Microarrays are microscopic slides that contain an ordered series of samples of DNA (deoxyribonucleic acid), RNA (ribonucleic acid), protein, tissue, and others [6]. A single microarray chip can measure the expression of about 30,000 genes, which represent most of the human genome [4].

The literature surveyed in this paper used gene expression data to classify cancer genes with statistical and machine learning techniques. The discussions and experiments describe computationally intensive data analysis in which pattern recognition, supervised learning, and unsupervised learning are the most commonly implemented approaches. Several approaches successfully applied these models to classify a wide variety of cancer types, and others provide a biological interpretation based on the prediction outcomes.

To analyze the literature on cancer classification using gene expression data in depth, a scientometric method was used. Scientometrics is a branch of science that examines the input and output of an organized structure [9]. It is related to bibliometrics and informatics, which deal with the study and flow of the production of literature. Scientometric analysis provides both synchronous and diachronous analyses.
The result of the analysis was based on the statistical evolution of the published papers from different sources and journals, using several parameters to gain insight into its progression.

The challenge of cancer classification using microarrays is the application of model-based selection and prediction algorithms that classify cancer genes at the molecular level using gene expression data. The computation time, classification accuracy, and biological relevance of cancer classification are still in question. This paper addresses these questions through a survey and analysis of papers published on cancer classification, gene expression data, and machine learning.

The main goal of this study is to explore and analyze the published papers on cancer classification using gene expression data. The method used was descriptive, following the scientometric process. The Elsevier Scopus literature was collected using the search keywords "cancer classification gene expression data". The scope of this paper is cancer classification using gene expression data. The search queries on cancer classification, gene expression data, and algorithms used in this paper were based on selected cancer classification models. The coverage of the publications used in the search and analysis was from

2. Related Works

There is a large volume of published research and articles about cancer and the cancer genome on the Internet. Google returned 1.2 million results (as of March 8, 2018) for the keyword "cancer genome". Google Scholar returned 3.2 million, where most of the cited items concern the cancer genome, proteomics, microarrays, machine learning algorithms, and others.

2.1 Cancer genome studies

In medical terms, cancer is an abnormal state in which a cell or a group of cells mutates and destroys other tissues in the human body. Cancer is not one disease but many. There are more than 100 different types of cancer. The main categories include: carcinoma, cancer of the skin and the lining of the internal organs; sarcoma, cancer of the bone, muscle, blood vessels, and connective tissue; leukemia, cancer of the blood; lymphoma and myeloma, cancers of the immune system; central nervous system cancers, which begin in the brain and spinal cord; and other tumors [17].
Genome-wide association studies (GWAS) have contributed to the rapid discovery of genetic diseases [18]. Cancer research accelerated the reporting of GWAS, which in turn led to further genetic analysis. In GWAS publications from 2007, about 40 distinct hereditary loci have been convincingly identified for more than two dozen different cancers [18,19]. The International Cancer Genome Consortium maintains a catalogue of cancer mutations from more than 25,000 tumors of 25 cancer types [20].

2.2 Microarray, Gene Expression Data, and Knowledge Discovery

Microarrays are microscopic slides that contain an ordered series of samples such as DNA, RNA, protein, and tissue [6]. The type of sample placed on the slide determines the type of microarray, for example a DNA microarray or an RNA microarray. The most commonly used is the DNA microarray. DNA is spotted on the slides as chemically synthesized long oligonucleotides or enzymatically generated products. It is either held in place by chemically reactive aldehydes or primary amines, or synthesized by a photolithographic process.

Cancer gene expression analysis is made possible by the cancer genomic data available on the Internet [21]. Most of the available data are breast and lung cancer data sets; others have fewer than 100 samples. Microarray profiling is the technology most widely used to study gene expression in cancer, and it is a critical advance in genomic marker research for clinical practice. For class comparison, class discovery, and class prediction on cancer gene expression, both supervised and unsupervised methods of analysis are used. Knowledge discovery with statistical and machine learning models predicts and classifies the gene expression data. Other implementations of cancer classification include clustering and visualization with annotation for biological interpretation.

3. Proposed work

This paper mainly explored and analyzed published papers on cancer classification using gene expression data.
The survey method used was scientometric. Through their citations, the published papers on cancer classification provide insight into their impact on the research domain. The h-index represents the achievement of an author whose work is cited by other researchers. It is a metric that measures the impact of a researcher or author based on how often the author's papers are cited.
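To make the h-index concrete, the following minimal sketch computes it under the standard definition: an author has index h when h of the author's papers have each been cited at least h times. The function name and the citation counts are illustrative assumptions and do not come from the Scopus dataset analyzed in this paper.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)  # most cited paper first
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank      # the top `rank` papers all have >= rank citations
        else:
            break         # later papers are cited even less, so stop here
    return h


# Hypothetical citation counts for one author's papers (illustrative only).
example_counts = [25, 8, 5, 3, 3, 1, 0]
print(h_index(example_counts))
```

Sorting first keeps the check simple: once a paper at position `rank` has fewer than `rank` citations, no larger h is possible.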

The conceptual framework consists of the following activities, which are discussed in the methodology. The first process is data gathering, which takes place in the Scopus database; the output of the search is based on the topic of interest. The second process is data pre-processing, where the dataset is extracted, organized, and cleaned. The third process is data interpretation, where the Scopus search results analyzer is used to compare the pre-processed data. The last process is post-processing, where the resulting data sets are examined with analysis and visualization tools to reveal insights about the citations and content relationships and to determine the cancer classifier outcome. A conceptual and system model using supervised learning and a gene selection method, based on the evaluation method, was proposed.

4. Results and Discussion

4.1 Scopus search results

The initial search query for related work in Scopus returned 31,028 documents published from

The first Scopus dataset was examined to determine the frequency of the keywords used in the search query TITLE-ABS-KEY, that is, title, abstract, author keywords, and index keywords. The next task was to analyze the sources. Sources were compared using the Scopus compare sources feature, which calculates and compares at least the top ten sources according to the set parameters: CiteScore, SJR, SNIP, citations, documents, percentage cited, and percentage review. CiteScore is the average number of citations received in a year by documents published in a journal in the preceding three years. Refer to figure 2 for the Scopus search analysis of documents per year by source. It shows that the topic continues to attract interest each year. Refer to figure 3 for the CiteScore publication by year. Bioinformatics is the leading source, with the highest number of papers related to the topic.

Figure 2. Documents per year by source (Scopus, 2018)

Figure 3. CiteScore publication by year (Scopus, 2018)

In table 3, the top ten sources and citations in the Scopus search analysis included Bioinformatics, BMC Bioinformatics, PLoS Computational Biology, Neurocomputing, Journal of Biomedical Informatics, Journal of Computational Biology, Annual International Conference of the IEEE Engineering in Medicine and Biology Proceedings, Artificial Intelligence in Medicine, and Computers in Biology and Medicine. The top source title, Bioinformatics, has 117 documents, a CiteScore of 6.73, and 96,864 citations.

Figure 4. Documents by affiliation (Scopus, 2018)

Figure 5. Documents by country/territory (Scopus, 2018)

Other search analyses available are by affiliation, country and territory, document type, and subject area. In figure 4, the analysis of documents by affiliation, the Chinese Academy of Sciences published 400 papers from year 2000 to

The United States (figure 5), however, produced the highest number of papers (7,894) on the topic, compared with China (4,225), over the same period. The document type with the highest output was articles at 51.8% (15,630), followed by conference papers at 42.3% (12,769).

4.2 CRExplorer

The Cited References Explorer (CRExplorer) [8] is another tool that can analyze the historical roots of fields, topics, or research using Reference Publication Year Spectroscopy (RPYS). RPYS analyzes the frequency with which references are cited in the research, in terms of their publication years. The reference data set used in this analysis was extracted from Scopus. Refer to figure 6 for the RPYS analysis of the topic. There were 53,658 cited references over a 51-year period and 18 different citing publication years, with a total of 1,447 documents (see figure 8). Research on cancer classification using gene expression started to gain traction in 1995, with the influential 1999 work of Golub et al. followed by Guyon, Alon, Nguyen, Dudoit, Statnikov, et al. The research peaked in 2005, with more microarray datasets in use and evaluation using classification methods. The work of Statnikov et al. remained a constantly cited reference in the citing years thereafter. These papers used comprehensive evaluations and classification methods for microarray gene expression cancer diagnosis. The evaluations and classification methods are discussed in section 5.

Figure 7. Citation of cancer classification gene expression CRExplorer (CRExplorer, 2018)

Figure 8. Cited References dataset cancer classification gene expression data (CRExplorer, 2018)

5. Survey of the cancer classification results

The following discussion summarizes the findings and results of the papers sourced from the scientometric analysis.

5.1 Cancer classification methods evaluation

The weighted voting gene selection works well for classifying binary data [10].
This method is a correlation-based classifier in which the class label is assigned by a weighted vote of the informative genes. It works well with some data, such as leukemia. Its disadvantage is that it is not effective on data sets with more than two classes.

The similarity-based classifiers, KNN and CAST, are not affected by noise and bias in the data. KNN tolerates noise because it uses several training samples to determine the class label of a test sample. CAST is a cluster-based method that forms separable groups containing normal and tumor samples. KNN uses less computing time than CAST because of the similarity score evaluation performed on every test and training sample. These methods are not scalable and not practical for cancer classification because they use too much computation time.

The max-margin classifiers, SVM and Boosting, are well suited to high-dimensional, noisy, and sparse data and avoid overfitting. SVM has the advantage of selecting a small number of support vectors from a large training set. However, SVM is limited to binary class problems. Boosting improves classification accuracy through repeated rounds of training, but the repeated classification of the weighted training set is time-consuming.

Bayesian networks (BN) and neural nets (NN) can be applied to multiclass problems. Their disadvantage is that they are black boxes and cannot reveal biological information in the data.

Decision trees (DT) are interpretable and do not require parameters. Trees can still be generated quickly as the data size increases. DT algorithms are good classifiers in terms of scalability.
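The weighted voting scheme discussed at the start of this subsection can be sketched as follows. This is an illustrative reading of the approach in [10], not the authors' implementation: each gene receives a signal-to-noise weight (difference of class means divided by the sum of class standard deviations), and each selected gene votes for a class according to which side of the midpoint between the class means a new sample falls on. The function names, the use of the population standard deviation, and the toy expression values are assumptions made for this example.

```python
from statistics import mean, pstdev


def fit_weighted_voting(samples, labels, n_genes):
    """Select the n_genes most informative genes for a two-class problem.

    samples: list of expression vectors (one list of floats per sample)
    labels:  list of 0/1 class labels, one per sample
    Returns a list of (gene_index, weight, boundary) tuples.
    """
    class0 = [s for s, y in zip(samples, labels) if y == 0]
    class1 = [s for s, y in zip(samples, labels) if y == 1]
    scored = []
    for g in range(len(samples[0])):
        x0 = [s[g] for s in class0]
        x1 = [s[g] for s in class1]
        mu0, mu1 = mean(x0), mean(x1)
        spread = pstdev(x0) + pstdev(x1)
        if spread == 0:
            continue  # skip the gene to avoid dividing by zero
        weight = (mu1 - mu0) / spread   # signal-to-noise weight; sign points to class 1
        boundary = (mu0 + mu1) / 2.0    # midpoint between the two class means
        scored.append((abs(weight), g, weight, boundary))
    scored.sort(reverse=True)           # strongest genes first
    return [(g, w, b) for _, g, w, b in scored[:n_genes]]


def predict_weighted_voting(model, sample):
    """Return (predicted_class, prediction_strength) for one sample."""
    votes0 = votes1 = 0.0
    for g, weight, boundary in model:
        vote = weight * (sample[g] - boundary)
        if vote > 0:
            votes1 += vote
        else:
            votes0 -= vote
    total = votes0 + votes1
    strength = abs(votes1 - votes0) / total if total > 0 else 0.0
    return (1 if votes1 > votes0 else 0), strength


# Hypothetical toy data: 6 samples x 3 genes; only gene 0 separates the classes.
toy_samples = [[1.0, 5.0, 3.0], [1.2, 5.1, 2.0], [0.8, 4.9, 4.0],
               [3.0, 5.0, 3.1], [3.2, 5.2, 2.1], [2.8, 4.8, 3.9]]
toy_labels = [0, 0, 0, 1, 1, 1]

toy_model = fit_weighted_voting(toy_samples, toy_labels, n_genes=1)
print(predict_weighted_voting(toy_model, [2.9, 5.0, 3.0]))
```

Because every vote is cast for one of exactly two classes, the sketch also shows why the method does not extend directly to data sets with more than two classes.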

In terms of classifier accuracy, in the experiments of [11] SVM has the highest accuracy on the leukemia and ovarian data sets, while the CAST method is better on the colon data set. Boosting outperforms NN on the leukemia, ovarian, and colon data sets. Similarly, NB outperforms the GS approach on the leukemia and ovarian data sets; the opposite holds for colon, where GS does better than NB [12]. Table 4 presents a summary of the cancer classifier survey results [3]. It shows that no cancer classifier is superior to all the other models. This points to a research topic: exploring classifier accuracy and biological meaning in order to develop a new classification algorithm, or modify an existing one, that gives biologically relevant answers. The number of cancer databases is limited, and the data sets vary by cancer type and by source.

Table 4. Summary of the cancer classification survey

Classification method | Multiclass | Strategy / Evaluation | Biological meaning | Scalability
SVM                   | No         | Max-Margin            | No                 | Good
Boosting              | Yes        | Max-Margin            | Yes                | Class dependent
DT                    | Yes        | Entropy function      | Yes                | Good
KNN                   | Yes        | Similarity            | No                 | Not scalable
CAST                  | Yes        | Similarity            | No                 | Not scalable
GS                    | No         | Weighted voting       | Yes                | Fair
FLDA                  | Yes        | Discriminant Analysis | No                 | Fair
NN                    | Yes        | Perceptron            | No                 | Fair
NB                    | Yes        | Distribution modeling | No                 | Fair

5.2 Gene selection

Feature selection is an important process in cancer classification because it helps eliminate noise in the data set and overfitting of the classifier. In addition, it can reveal biologically relevant information, for example by using DT to view the actual genes and their values. Gene selection reduces the large attribute space, which helps the classifier improve its accuracy [2] [10] [13] [14]. The gene feature ranking approach measures the correlation of class labels and attribute values.
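As a minimal illustration of ranking genes by their correlation with the class labels, the sketch below scores each gene by the absolute Pearson correlation between its expression values and 0/1 labels. Pearson correlation is one possible choice of criterion and is an assumption of this example, as are the function names and the toy data; the surveyed papers use several different ranking measures.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant sequence has no defined correlation; treat as 0
    return cov / sqrt(vx * vy)


def rank_genes(samples, labels):
    """Return gene indices ordered from most to least correlated with the labels."""
    scores = []
    for g in range(len(samples[0])):
        column = [s[g] for s in samples]          # expression of gene g in all samples
        scores.append((abs(pearson(column, labels)), g))
    scores.sort(key=lambda t: (-t[0], t[1]))      # high score first, ties by index
    return [g for _, g in scores]


# Hypothetical toy data: 6 samples x 3 genes, three normal (0) and three tumor (1).
ranking_samples = [[1.0, 5.0, 3.0], [1.2, 5.1, 2.0], [0.8, 4.9, 4.0],
                   [3.0, 5.0, 3.1], [3.2, 5.2, 2.1], [2.8, 4.8, 3.9]]
ranking_labels = [0, 0, 0, 1, 1, 1]

print(rank_genes(ranking_samples, ranking_labels))
```

Each gene is scored on its own, so this kind of ranking is cheap to compute but cannot detect genes that are informative only in combination.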
The GS selection method [10], which uses this correlation, is simple to implement but has the disadvantage that it may mistakenly select cancer gene values for normal and tumor types. When linear discriminant function weights are used for training classifiers, the weights of the features are directly proportional to their importance for the class labels. Comparing the genes selected by the NB and GS selection methods [15], the NB classifier is more accurate and selects a greater variety of genes than GS. Another approach is gene subset ranking (GSR), in which groups of genes are clustered together to obtain the best classifier. Lastly, recursive feature elimination (RFE) eliminates genes so that the best classification power is retained. It is also used in SVM classification as a cost function on the subset ranking. RFE and GSR work well in cancer classification compared to IGR.

6. Proposed model

This section presents the proposed conceptual model for the cancer classifier using gene feature selection, and the system model for the design and process flow.

A general model for supervised classification and prediction is shown in figure 8. It starts by loading a gene expression dataset that is known and normalized, from a microarray gene expression platform. The dataset is divided into training and test subsets if enough samples are available. If not, cross-validation is used: a sample is held out, the predictor is trained on the remaining samples, the held-out sample is classified by the predictor, and the procedure is repeated iteratively. Once a proper training set has been defined, a selection method is applied. This step facilitates the training of most classification algorithms, although some can deal with thousands of variables effectively. Subsequent validation confirms that the selected genes are informative for the classification, and classifiers are built on these datasets. If the vectors or parameters need fine-tuning when training the predictor, several models are built against the marked genes and the final model chosen is the one that minimizes the total error in cross-validation.

Figure 8. Logical design for supervised learning

Figure 9. Conceptual Model - System Flow

The graphical representation of the classifier shows the structure of the system design and its processes. It was used to design the system, to understand how the system will work, and to develop the pseudocode needed to program each module properly. Figure 9 shows the conceptual system model. Supervised learning on gene expression data starts from the microarray results from which the gene expression data came. Because the microarray holds high-dimensional gene expression data, it is important to select and extract the right genes for the classification system. The process involves selecting the discriminative genes related to classification from the gene expression data, training the classifier, and classifying new data with the learned classifier.

Feature or gene selection chooses an optimal subset of genes for further processing. It can be subdivided into algorithms that evaluate and compare the predictive power of individual genes, and algorithms that compare groups of genes. Gene ranking algorithms assume independence among gene expressions and differ in the ranking parameter. The second group of algorithms uses forward selection and backward elimination. This approach identifies complex relationships among genes at the cost of significantly higher computational complexity. Feature or gene extraction uses an algorithm that builds a linear mapping transformation to reduce the dimensionality of the input gene space. The diagnostic test used in the transformation of the gene expression measurements achieved an accurate subset of the mapped space.
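The workflow described in this section (hold out samples, select genes on the remaining training samples, train, then classify the held-out samples) can be sketched as a small cross-validation loop. This is a sketch under stated assumptions rather than the proposed system itself: the gene ranking criterion (absolute difference of class means) and the nearest-centroid classifier are simple stand-ins chosen for brevity, the fold assignment is a plain interleaving, every training fold is assumed to contain both classes, and all names and data are hypothetical.

```python
from statistics import mean


def top_genes(samples, labels, k):
    """Rank genes by absolute difference of class means; return the best k indices."""
    scores = []
    for g in range(len(samples[0])):
        m0 = mean(s[g] for s, y in zip(samples, labels) if y == 0)
        m1 = mean(s[g] for s, y in zip(samples, labels) if y == 1)
        scores.append((abs(m1 - m0), g))
    scores.sort(key=lambda t: (-t[0], t[1]))
    return [g for _, g in scores[:k]]


def centroid(samples, genes):
    """Mean expression of the selected genes over a group of samples."""
    return [mean(s[g] for s in samples) for g in genes]


def predict(c0, c1, sample, genes):
    """Assign the class whose centroid is closer (squared Euclidean distance)."""
    d0 = sum((sample[g] - c) ** 2 for g, c in zip(genes, c0))
    d1 = sum((sample[g] - c) ** 2 for g, c in zip(genes, c1))
    return 0 if d0 <= d1 else 1


def cross_validate(samples, labels, n_folds, k_genes):
    """Return the cross-validated error rate with gene selection inside each fold."""
    n = len(samples)
    errors = 0
    for fold in range(n_folds):
        test_idx = [i for i in range(n) if i % n_folds == fold]
        train_idx = [i for i in range(n) if i % n_folds != fold]
        train_x = [samples[i] for i in train_idx]
        train_y = [labels[i] for i in train_idx]
        # Gene selection uses only the training samples of this fold.
        genes = top_genes(train_x, train_y, k_genes)
        c0 = centroid([x for x, y in zip(train_x, train_y) if y == 0], genes)
        c1 = centroid([x for x, y in zip(train_x, train_y) if y == 1], genes)
        for i in test_idx:
            if predict(c0, c1, samples[i], genes) != labels[i]:
                errors += 1
    return errors / n


# Hypothetical toy data: 6 samples x 3 genes; only gene 0 separates the classes.
cv_samples = [[1.0, 5.0, 3.0], [1.2, 5.1, 2.0], [0.8, 4.9, 4.0],
              [3.0, 5.0, 3.1], [3.2, 5.2, 2.1], [2.8, 4.8, 3.9]]
cv_labels = [0, 0, 0, 1, 1, 1]

print(cross_validate(cv_samples, cv_labels, n_folds=3, k_genes=1))
```

The design point is the placement of `top_genes` inside the fold loop: the held-out samples take no part in choosing the genes, so the error estimate is not biased by selection on the test data.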
From the given list of genes, the classifier or predictor decides how to categorize the gene pattern at the prediction stage. If the data must be reduced further to the required sample space, the process returns to dimensionality reduction. Evaluation of the predictor cross-validates a classifier on the training dataset and tests the classifier to guard against overfitting.

The proposed system uses normalized, ready-to-load input data for the supervised classification system. A normalized data set means that no preprocessing or cleaning of the data is needed. The data files contain normalized expression values of genes from the microarray.

Figure 10. Proposed System Model

Figure 11. The learning machine that illustrates the reduction of input vectors and classifies the vectors in preparation for the prediction

The system model is made up of the dimensionality reduction module, which reduces the noise of the high-dimensional space of vectors given as input data, and the transformed vectors. Figure 9 shows dimensionality reduction as a process independent of the design of the classifier, which leads to overfitting. Independent feature subset selection is performed using the training data set. If the resulting features are then used to cross-validate a classifier, the same data will have been used for training and testing. This is overfitting and may produce heavily biased error estimates.

The learning machine builds the dimensionality reduction and the classifier together, estimating the error rate over both stages. The cross-validation learning data set is first used to compute the dimensionality reduction parameters, and the computed transformation of the vectors is then applied to the learning and validation datasets. The classifier is then built using the learned dataset and tested using the mapped dataset. Thus, the validation subset is never used for either dimensionality reduction or classifier learning, but exclusively for testing. This is a statistically rigorous approach to machine learning that avoids overfitting. Gene extraction refers to algorithms that build a linear mapping transformation to reduce the dimensionality of the input gene space. Note that a diagnostic test incorporating such a transformation uses all of the input gene expression measurements.

7. Conclusion

This paper presented a comprehensive scientometric analysis of cancer classification using gene expression data. The analysis was based on the search analysis features of Scopus and CRExplorer. The Scopus dataset provided a rich collection of documents related to cancer classification. The results show that most papers related to cancer classification are published as articles and conference papers, with Bioinformatics and BMC Bioinformatics as the leading source titles.
The document analysis revealed that the Chinese Academy of Sciences was the affiliation with the most published papers. In terms of global contribution, however, the United States was ahead of China by 50% on this account. Cancer classification is still a topic of interest to most researchers in this field. However, according to the CRExplorer analysis, there was a decline after its peak on

The survey of cancer classification model evaluation presented the advantages and disadvantages of each model. Gene selection is an important phase in preprocessing and cancer classification. The next step is to investigate cancer classification techniques, and related topics, that are specific to a cancer genome or a type of cancer. A review of the experimental results would provide insight into the machine learning techniques and into other factors such as evaluation methods concerning computation time, accuracy, and biological relevance. A comprehensive survey would also help to make a conclusive review of cancer classification. Finally, the proposed system model is a concept for cancer classification. It explored the issues presented in the cancer classifier models and proposed implementing a supervised classification model. It is recommended to verify the system model and to test the accuracy of the proposed logical and system flow designs.

References

[1] Brown, M., Grundy, W., Lin, D., Cristianini, N. (1999). Support vector machine classification of microarray gene expression data. Technical report. University of California, Santa Cruz.
[2] Dudoit, S., Fridlyand, J., & Speed, T. (2000). Comparison and discrimination methods for the classification of tumors using gene expression data. Technical report no. 56. Department of Statistics, Univ. California, Berkeley, 43.
[3] Han, J., Lu, Y. (2003). Cancer classification using gene expression data. Information Systems, Vol. 28 (4),
[4] Ramaswamy, S., Tamayo, P. & Rifkin, R. (2001). Multiclass cancer diagnosis using tumor gene expression signatures. PNAS Vol. 98(26),
[5] Piatetsky-Shapiro, G., Tamayo, P. (2005). Microarray data mining: Facing challenges, Vol 5 (2), 1-3.
[6] Wong, G. (2005). Introduction. In Minna Laine. DNA Microarray data analysis (15-24). Helsinki: CSC - Scientific Computing Inc.
[7] Scopus (2018). Scopus. Elsevier. Accessed on Mar 8, 2018 from
[8] CRExplorer (2018). Accessed on Mar 20, 2018 from
[9] Bala A, Gupta B M. Mapping of Indian neuroscience research: A scientometric analysis of research output during Neurol India 2010; 58:35-41
[10] Golub, T. R., Slonim, D., Tamayo, P. (1999). Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science, pages
[11] Ben-Dor, A., Bruhn, L., Friedman, N., Nachman, I., Schummer, M., and Yakhini, Z. (2000). Tissue classification with gene expression profiles. In Proc. of the Fourth Annual Int. Conf. on Computational Molecular Biology.
[12] Keller, A., Schummer, M., Hood, L., and Ruzzo, W. (2000). Bayesian classification of DNA array expression data. Technical report, University of Washington.
[13] Guyon, I., Weston, J., Barnhill, S., and Vapnik, V. (2000). Gene selection for cancer classification using support vector machines. Machine Learning.
[14] Campbell, C., Li, Y., and Tipping, M. (2001). An efficient feature selection algorithm for classification of gene expression data. Machine Learning.
[15] World Health Organization (WHO) (2018). Cancer Fact Sheet, Feb. 2018. Media Center. Accessed on March 8, 2018 from
[16] Stewart, B. and Wild, C. (2014). World Cancer Report. International Agency for Research on Cancer (IARC), World Health Organization (WHO). WHO Press.
[17] National Cancer Institute (2018). What is cancer. Accessed on March 20, 2018 from
[18] Chung CC, Magalhaes WC, Gonzalez-Bosquet J, Chanock SJ (2010). Genome-wide association studies in cancer: current and future directions. Carcinogenesis, 31: bgp273 PM
[19] Hindorff LA, Gillanders EM, Manolio TA (2011). Genetic architecture of cancer and other complex diseases: lessons learned and future directions. Carcinogenesis, 32: carcin/bgr056 PMID:
[20] Hudson TJ, Anderson W, Aretz A et al.; International Cancer Genome Consortium (2010). International network of cancer genome projects. Nature, 464: PMID:
[21] Chin L, Hahn WC, Getz G, Meyerson M (2011). Making sense of cancer genomic data. Genes Dev, 25: org/ /gad PMID:

Authors Biography

Joseph M. De Guia is a PhD student at Mapua University, Manila, Philippines. He is currently a faculty member of the School of Information Technology of the same university. Joseph finished his master's degree in Information Technology (MIST) at Carnegie Mellon University and his Master's in Computer Science (MSCS) at Mapua University. His research interests are data mining, artificial intelligence, information security, big data analytics, IT infrastructure, and digital innovation. His research papers on digital health and enterprise architecture have been presented at international and local conferences.

Dr. Madhavi Devaraj received her doctoral degree in computer science from Dr. A.P.J. Abdul Kalam Technical University, Lucknow, India. She also completed a Master's in Computer Applications and an MPhil in Computer Science at Madurai Kamaraj University, Madurai, India. Currently, she is a distinguished professor in the computer science department at Mapua University, Manila, Philippines. She was previously an assistant professor in the computer science departments of Invertis University, India, and Babu Banarasi Das University, India. Her research interests include text analytics, scientometric analysis, opinion mining, sentiment analysis, information extraction, neural networks, artificial intelligence, machine learning, and big data analysis.



More information

Deep Learning Analytics for Predicting Prognosis of Acute Myeloid Leukemia with Cytogenetics, Age, and Mutations

Deep Learning Analytics for Predicting Prognosis of Acute Myeloid Leukemia with Cytogenetics, Age, and Mutations Deep Learning Analytics for Predicting Prognosis of Acute Myeloid Leukemia with Cytogenetics, Age, and Mutations Andy Nguyen, M.D., M.S. Medical Director, Hematopathology, Hematology and Coagulation Laboratory,

More information

Application of Artificial Neural Networks in Classification of Autism Diagnosis Based on Gene Expression Signatures

Application of Artificial Neural Networks in Classification of Autism Diagnosis Based on Gene Expression Signatures Application of Artificial Neural Networks in Classification of Autism Diagnosis Based on Gene Expression Signatures 1 2 3 4 5 Kathleen T Quach Department of Neuroscience University of California, San Diego

More information

REINVENTING THE BIOMARKER PANEL DISCOVERY EXPERIENCE

REINVENTING THE BIOMARKER PANEL DISCOVERY EXPERIENCE REINVENTING THE BIOMARKER PANEL DISCOVERY EXPERIENCE REINVENTING THE BIOMARKER PANEL DISCOVERY EXPERIENCE 1 Biomarker discovery has opened new realms in the medical industry, from patient diagnosis and

More information

ECG Beat Recognition using Principal Components Analysis and Artificial Neural Network

ECG Beat Recognition using Principal Components Analysis and Artificial Neural Network International Journal of Electronics Engineering, 3 (1), 2011, pp. 55 58 ECG Beat Recognition using Principal Components Analysis and Artificial Neural Network Amitabh Sharma 1, and Tanushree Sharma 2

More information

Hybridized KNN and SVM for gene expression data classification

Hybridized KNN and SVM for gene expression data classification Mei, et al, Hybridized KNN and SVM for gene expression data classification Hybridized KNN and SVM for gene expression data classification Zhen Mei, Qi Shen *, Baoxian Ye Chemistry Department, Zhengzhou

More information

Efficient Classification of Cancer using Support Vector Machines and Modified Extreme Learning Machine based on Analysis of Variance Features

Efficient Classification of Cancer using Support Vector Machines and Modified Extreme Learning Machine based on Analysis of Variance Features American Journal of Applied Sciences 8 (12): 1295-1301, 2011 ISSN 1546-9239 2011 Science Publications Efficient Classification of Cancer using Support Vector Machines and Modified Extreme Learning Machine

More information

A REVIEW ON CLASSIFICATION OF BREAST CANCER DETECTION USING COMBINATION OF THE FEATURE EXTRACTION MODELS. Aeronautical Engineering. Hyderabad. India.

A REVIEW ON CLASSIFICATION OF BREAST CANCER DETECTION USING COMBINATION OF THE FEATURE EXTRACTION MODELS. Aeronautical Engineering. Hyderabad. India. Volume 116 No. 21 2017, 203-208 ISSN: 1311-8080 (printed version); ISSN: 1314-3395 (on-line version) url: http://www.ijpam.eu A REVIEW ON CLASSIFICATION OF BREAST CANCER DETECTION USING COMBINATION OF

More information

Augmented Medical Decisions

Augmented Medical Decisions Machine Learning Applied to Biomedical Challenges 2016 Rulex, Inc. Intelligible Rules for Reliable Diagnostics Rulex is a predictive analytics platform able to manage and to analyze big amounts of heterogeneous

More information

An Efficient Diseases Classifier based on Microarray Datasets using Clustering ANOVA Extreme Learning Machine (CAELM)

An Efficient Diseases Classifier based on Microarray Datasets using Clustering ANOVA Extreme Learning Machine (CAELM) www.ijcsi.org 8 An Efficient Diseases Classifier based on Microarray Datasets using Clustering ANOVA Extreme Learning Machine (CAELM) Shamsan Aljamali 1, Zhang Zuping 2 and Long Jun 3 1 School of Information

More information

Selection and Combination of Markers for Prediction

Selection and Combination of Markers for Prediction Selection and Combination of Markers for Prediction NACC Data and Methods Meeting September, 2010 Baojiang Chen, PhD Sarah Monsell, MS Xiao-Hua Andrew Zhou, PhD Overview 1. Research motivation 2. Describe

More information

A COMBINATORY ALGORITHM OF UNIVARIATE AND MULTIVARIATE GENE SELECTION

A COMBINATORY ALGORITHM OF UNIVARIATE AND MULTIVARIATE GENE SELECTION 5-9 JATIT. All rights reserved. A COMBINATORY ALGORITHM OF UNIVARIATE AND MULTIVARIATE GENE SELECTION 1 H. Mahmoodian, M. Hamiruce Marhaban, 3 R. A. Rahim, R. Rosli, 5 M. Iqbal Saripan 1 PhD student, Department

More information

Predicting Breast Cancer Recurrence Using Machine Learning Techniques

Predicting Breast Cancer Recurrence Using Machine Learning Techniques Predicting Breast Cancer Recurrence Using Machine Learning Techniques Umesh D R Department of Computer Science & Engineering PESCE, Mandya, Karnataka, India Dr. B Ramachandra Department of Electrical and

More information

Classification of cancer profiles. ABDBM Ron Shamir

Classification of cancer profiles. ABDBM Ron Shamir Classification of cancer profiles 1 Background: Cancer Classification Cancer classification is central to cancer treatment; Traditional cancer classification methods: location; morphology, cytogenesis;

More information

NMF-Density: NMF-Based Breast Density Classifier

NMF-Density: NMF-Based Breast Density Classifier NMF-Density: NMF-Based Breast Density Classifier Lahouari Ghouti and Abdullah H. Owaidh King Fahd University of Petroleum and Minerals - Department of Information and Computer Science. KFUPM Box 1128.

More information

ABSTRACT I. INTRODUCTION. Mohd Thousif Ahemad TSKC Faculty Nagarjuna Govt. College(A) Nalgonda, Telangana, India

ABSTRACT I. INTRODUCTION. Mohd Thousif Ahemad TSKC Faculty Nagarjuna Govt. College(A) Nalgonda, Telangana, India International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2018 IJSRCSEIT Volume 3 Issue 1 ISSN : 2456-3307 Data Mining Techniques to Predict Cancer Diseases

More information

Algorithms Implemented for Cancer Gene Searching and Classifications

Algorithms Implemented for Cancer Gene Searching and Classifications Algorithms Implemented for Cancer Gene Searching and Classifications Murad M. Al-Rajab and Joan Lu School of Computing and Engineering, University of Huddersfield Huddersfield, UK {U1174101,j.lu}@hud.ac.uk

More information

Classification of Smoking Status: The Case of Turkey

Classification of Smoking Status: The Case of Turkey Classification of Smoking Status: The Case of Turkey Zeynep D. U. Durmuşoğlu Department of Industrial Engineering Gaziantep University Gaziantep, Turkey unutmaz@gantep.edu.tr Pınar Kocabey Çiftçi Department

More information

Cancer Gene Extraction Based on Stepwise Regression

Cancer Gene Extraction Based on Stepwise Regression Mathematical Computation Volume 5, 2016, PP.6-10 Cancer Gene Extraction Based on Stepwise Regression Jie Ni 1, Fan Wu 1, Meixiang Jin 1, Yixing Bai 1, Yunfei Guo 1 1. Mathematics Department, Yanbian University,

More information

DEVELOPMENT OF AN EXPERT SYSTEM ALGORITHM FOR DIAGNOSING CARDIOVASCULAR DISEASE USING ROUGH SET THEORY IMPLEMENTED IN MATLAB

DEVELOPMENT OF AN EXPERT SYSTEM ALGORITHM FOR DIAGNOSING CARDIOVASCULAR DISEASE USING ROUGH SET THEORY IMPLEMENTED IN MATLAB DEVELOPMENT OF AN EXPERT SYSTEM ALGORITHM FOR DIAGNOSING CARDIOVASCULAR DISEASE USING ROUGH SET THEORY IMPLEMENTED IN MATLAB Aaron Don M. Africa Department of Electronics and Communications Engineering,

More information

Case Studies on High Throughput Gene Expression Data Kun Huang, PhD Raghu Machiraju, PhD

Case Studies on High Throughput Gene Expression Data Kun Huang, PhD Raghu Machiraju, PhD Case Studies on High Throughput Gene Expression Data Kun Huang, PhD Raghu Machiraju, PhD Department of Biomedical Informatics Department of Computer Science and Engineering The Ohio State University Review

More information

Keywords Missing values, Medoids, Partitioning Around Medoids, Auto Associative Neural Network classifier, Pima Indian Diabetes dataset.

Keywords Missing values, Medoids, Partitioning Around Medoids, Auto Associative Neural Network classifier, Pima Indian Diabetes dataset. Volume 7, Issue 3, March 2017 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Medoid Based Approach

More information

Breast cancer. Risk factors you cannot change include: Treatment Plan Selection. Inferring Transcriptional Module from Breast Cancer Profile Data

Breast cancer. Risk factors you cannot change include: Treatment Plan Selection. Inferring Transcriptional Module from Breast Cancer Profile Data Breast cancer Inferring Transcriptional Module from Breast Cancer Profile Data Breast Cancer and Targeted Therapy Microarray Profile Data Inferring Transcriptional Module Methods CSC 177 Data Warehousing

More information

Roadmap for Developing and Validating Therapeutically Relevant Genomic Classifiers. Richard Simon, J Clin Oncol 23:

Roadmap for Developing and Validating Therapeutically Relevant Genomic Classifiers. Richard Simon, J Clin Oncol 23: Roadmap for Developing and Validating Therapeutically Relevant Genomic Classifiers. Richard Simon, J Clin Oncol 23:7332-7341 Presented by Deming Mi 7/25/2006 Major reasons for few prognostic factors to

More information

Discovering Meaningful Cut-points to Predict High HbA1c Variation

Discovering Meaningful Cut-points to Predict High HbA1c Variation Proceedings of the 7th INFORMS Workshop on Data Mining and Health Informatics (DM-HI 202) H. Yang, D. Zeng, O. E. Kundakcioglu, eds. Discovering Meaningful Cut-points to Predict High HbAc Variation Si-Chi

More information

Diagnosis of Breast Cancer Using Ensemble of Data Mining Classification Methods

Diagnosis of Breast Cancer Using Ensemble of Data Mining Classification Methods International Journal of Bioinformatics and Biomedical Engineering Vol. 1, No. 3, 2015, pp. 318-322 http://www.aiscience.org/journal/ijbbe ISSN: 2381-7399 (Print); ISSN: 2381-7402 (Online) Diagnosis of

More information

Introduction to Discrimination in Microarray Data Analysis

Introduction to Discrimination in Microarray Data Analysis Introduction to Discrimination in Microarray Data Analysis Jane Fridlyand CBMB University of California, San Francisco Genentech Hall Auditorium, Mission Bay, UCSF October 23, 2004 1 Case Study: Van t

More information

Multi Parametric Approach Using Fuzzification On Heart Disease Analysis Upasana Juneja #1, Deepti #2 *

Multi Parametric Approach Using Fuzzification On Heart Disease Analysis Upasana Juneja #1, Deepti #2 * Multi Parametric Approach Using Fuzzification On Heart Disease Analysis Upasana Juneja #1, Deepti #2 * Department of CSE, Kurukshetra University, India 1 upasana_jdkps@yahoo.com Abstract : The aim of this

More information

A Deep Learning Approach to Identify Diabetes

A Deep Learning Approach to Identify Diabetes , pp.44-49 http://dx.doi.org/10.14257/astl.2017.145.09 A Deep Learning Approach to Identify Diabetes Sushant Ramesh, Ronnie D. Caytiles* and N.Ch.S.N Iyengar** School of Computer Science and Engineering

More information

SNPrints: Defining SNP signatures for prediction of onset in complex diseases

SNPrints: Defining SNP signatures for prediction of onset in complex diseases SNPrints: Defining SNP signatures for prediction of onset in complex diseases Linda Liu, Biomedical Informatics, Stanford University Daniel Newburger, Biomedical Informatics, Stanford University Grace

More information

Data complexity measures for analyzing the effect of SMOTE over microarrays

Data complexity measures for analyzing the effect of SMOTE over microarrays ESANN 216 proceedings, European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning. Bruges (Belgium), 27-29 April 216, i6doc.com publ., ISBN 978-2878727-8. Data complexity

More information

Classification. Methods Course: Gene Expression Data Analysis -Day Five. Rainer Spang

Classification. Methods Course: Gene Expression Data Analysis -Day Five. Rainer Spang Classification Methods Course: Gene Expression Data Analysis -Day Five Rainer Spang Ms. Smith DNA Chip of Ms. Smith Expression profile of Ms. Smith Ms. Smith 30.000 properties of Ms. Smith The expression

More information

Nearest Shrunken Centroid as Feature Selection of Microarray Data

Nearest Shrunken Centroid as Feature Selection of Microarray Data Nearest Shrunken Centroid as Feature Selection of Microarray Data Myungsook Klassen Computer Science Department, California Lutheran University 60 West Olsen Rd, Thousand Oaks, CA 91360 mklassen@clunet.edu

More information

International Journal of Pharma and Bio Sciences A NOVEL SUBSET SELECTION FOR CLASSIFICATION OF DIABETES DATASET BY ITERATIVE METHODS ABSTRACT

International Journal of Pharma and Bio Sciences A NOVEL SUBSET SELECTION FOR CLASSIFICATION OF DIABETES DATASET BY ITERATIVE METHODS ABSTRACT Research Article Bioinformatics International Journal of Pharma and Bio Sciences ISSN 0975-6299 A NOVEL SUBSET SELECTION FOR CLASSIFICATION OF DIABETES DATASET BY ITERATIVE METHODS D.UDHAYAKUMARAPANDIAN

More information

FUZZY C-MEANS AND ENTROPY BASED GENE SELECTION BY PRINCIPAL COMPONENT ANALYSIS IN CANCER CLASSIFICATION

FUZZY C-MEANS AND ENTROPY BASED GENE SELECTION BY PRINCIPAL COMPONENT ANALYSIS IN CANCER CLASSIFICATION FUZZY C-MEANS AND ENTROPY BASED GENE SELECTION BY PRINCIPAL COMPONENT ANALYSIS IN CANCER CLASSIFICATION SOMAYEH ABBASI, HAMID MAHMOODIAN Department of Electrical Engineering, Najafabad branch, Islamic

More information

Tissue Classification Based on Gene Expression Data

Tissue Classification Based on Gene Expression Data Chapter 6 Tissue Classification Based on Gene Expression Data Many diseases result from complex interactions involving numerous genes. Previously, these gene interactions have been commonly studied separately.

More information

Classifica4on. CSCI1950 Z Computa4onal Methods for Biology Lecture 18. Ben Raphael April 8, hip://cs.brown.edu/courses/csci1950 z/

Classifica4on. CSCI1950 Z Computa4onal Methods for Biology Lecture 18. Ben Raphael April 8, hip://cs.brown.edu/courses/csci1950 z/ CSCI1950 Z Computa4onal Methods for Biology Lecture 18 Ben Raphael April 8, 2009 hip://cs.brown.edu/courses/csci1950 z/ Binary classifica,on Given a set of examples (x i, y i ), where y i = + 1, from unknown

More information

MRI Image Processing Operations for Brain Tumor Detection

MRI Image Processing Operations for Brain Tumor Detection MRI Image Processing Operations for Brain Tumor Detection Prof. M.M. Bulhe 1, Shubhashini Pathak 2, Karan Parekh 3, Abhishek Jha 4 1Assistant Professor, Dept. of Electronics and Telecommunications Engineering,

More information

CLASSIFICATION OF BRAIN TUMOUR IN MRI USING PROBABILISTIC NEURAL NETWORK

CLASSIFICATION OF BRAIN TUMOUR IN MRI USING PROBABILISTIC NEURAL NETWORK CLASSIFICATION OF BRAIN TUMOUR IN MRI USING PROBABILISTIC NEURAL NETWORK PRIMI JOSEPH (PG Scholar) Dr.Pauls Engineering College Er.D.Jagadiswary Dr.Pauls Engineering College Abstract: Brain tumor is an

More information

Predicting Kidney Cancer Survival from Genomic Data

Predicting Kidney Cancer Survival from Genomic Data Predicting Kidney Cancer Survival from Genomic Data Christopher Sauer, Rishi Bedi, Duc Nguyen, Benedikt Bünz Abstract Cancers are on par with heart disease as the leading cause for mortality in the United

More information

Gene Expression Based Leukemia Sub Classification Using Committee Neural Networks

Gene Expression Based Leukemia Sub Classification Using Committee Neural Networks Bioinformatics and Biology Insights M e t h o d o l o g y Open Access Full open access to this and thousands of other papers at http://www.la-press.com. Gene Expression Based Leukemia Sub Classification

More information

Investigating the performance of a CAD x scheme for mammography in specific BIRADS categories

Investigating the performance of a CAD x scheme for mammography in specific BIRADS categories Investigating the performance of a CAD x scheme for mammography in specific BIRADS categories Andreadis I., Nikita K. Department of Electrical and Computer Engineering National Technical University of

More information

Certificate Courses in Biostatistics

Certificate Courses in Biostatistics Certificate Courses in Biostatistics Term I : September December 2015 Term II : Term III : January March 2016 April June 2016 Course Code Module Unit Term BIOS5001 Introduction to Biostatistics 3 I BIOS5005

More information

A Comparison of Collaborative Filtering Methods for Medication Reconciliation

A Comparison of Collaborative Filtering Methods for Medication Reconciliation A Comparison of Collaborative Filtering Methods for Medication Reconciliation Huanian Zheng, Rema Padman, Daniel B. Neill The H. John Heinz III College, Carnegie Mellon University, Pittsburgh, PA, 15213,

More information

Implementation of Inference Engine in Adaptive Neuro Fuzzy Inference System to Predict and Control the Sugar Level in Diabetic Patient

Implementation of Inference Engine in Adaptive Neuro Fuzzy Inference System to Predict and Control the Sugar Level in Diabetic Patient , ISSN (Print) : 319-8613 Implementation of Inference Engine in Adaptive Neuro Fuzzy Inference System to Predict and Control the Sugar Level in Diabetic Patient M. Mayilvaganan # 1 R. Deepa * # Associate

More information

Performance Analysis of Different Classification Methods in Data Mining for Diabetes Dataset Using WEKA Tool

Performance Analysis of Different Classification Methods in Data Mining for Diabetes Dataset Using WEKA Tool Performance Analysis of Different Classification Methods in Data Mining for Diabetes Dataset Using WEKA Tool Sujata Joshi Assistant Professor, Dept. of CSE Nitte Meenakshi Institute of Technology Bangalore,

More information

Evaluating Classifiers for Disease Gene Discovery

Evaluating Classifiers for Disease Gene Discovery Evaluating Classifiers for Disease Gene Discovery Kino Coursey Lon Turnbull khc0021@unt.edu lt0013@unt.edu Abstract Identification of genes involved in human hereditary disease is an important bioinfomatics

More information

38 Int'l Conf. Bioinformatics and Computational Biology BIOCOMP'16

38 Int'l Conf. Bioinformatics and Computational Biology BIOCOMP'16 38 Int'l Conf. Bioinformatics and Computational Biology BIOCOMP'16 PGAR: ASD Candidate Gene Prioritization System Using Expression Patterns Steven Cogill and Liangjiang Wang Department of Genetics and

More information

Classıfıcatıon of Dıabetes Dısease Usıng Backpropagatıon and Radıal Basıs Functıon Network

Classıfıcatıon of Dıabetes Dısease Usıng Backpropagatıon and Radıal Basıs Functıon Network UTM Computing Proceedings Innovations in Computing Technology and Applications Volume 2 Year: 2017 ISBN: 978-967-0194-95-0 1 Classıfıcatıon of Dıabetes Dısease Usıng Backpropagatıon and Radıal Basıs Functıon

More information

A Survey on Prediction of Diabetes Using Data Mining Technique

A Survey on Prediction of Diabetes Using Data Mining Technique A Survey on Prediction of Diabetes Using Data Mining Technique K.Priyadarshini 1, Dr.I.Lakshmi 2 PG.Scholar, Department of Computer Science, Stella Maris College, Teynampet, Chennai, Tamil Nadu, India

More information

A Strategy for Identifying Putative Causes of Gene Expression Variation in Human Cancer

A Strategy for Identifying Putative Causes of Gene Expression Variation in Human Cancer A Strategy for Identifying Putative Causes of Gene Expression Variation in Human Cancer Hautaniemi, Sampsa; Ringnér, Markus; Kauraniemi, Päivikki; Kallioniemi, Anne; Edgren, Henrik; Yli-Harja, Olli; Astola,

More information

Identifying Thyroid Carcinoma Subtypes and Outcomes through Gene Expression Data Kun-Hsing Yu, Wei Wang, Chung-Yu Wang

Identifying Thyroid Carcinoma Subtypes and Outcomes through Gene Expression Data Kun-Hsing Yu, Wei Wang, Chung-Yu Wang Identifying Thyroid Carcinoma Subtypes and Outcomes through Gene Expression Data Kun-Hsing Yu, Wei Wang, Chung-Yu Wang Abstract: Unlike most cancers, thyroid cancer has an everincreasing incidence rate

More information

A Novel Iterative Linear Regression Perceptron Classifier for Breast Cancer Prediction

A Novel Iterative Linear Regression Perceptron Classifier for Breast Cancer Prediction A Novel Iterative Linear Regression Perceptron Classifier for Breast Cancer Prediction Samuel Giftson Durai Research Scholar, Dept. of CS Bishop Heber College Trichy-17, India S. Hari Ganesh, PhD Assistant

More information

Statistical Analysis Using Machine Learning Approach for Multiple Imputation of Missing Data

Statistical Analysis Using Machine Learning Approach for Multiple Imputation of Missing Data Statistical Analysis Using Machine Learning Approach for Multiple Imputation of Missing Data S. Kanchana 1 1 Assistant Professor, Faculty of Science and Humanities SRM Institute of Science & Technology,

More information

Using Bayesian Networks to Analyze Expression Data. Xu Siwei, s Muhammad Ali Faisal, s Tejal Joshi, s

Using Bayesian Networks to Analyze Expression Data. Xu Siwei, s Muhammad Ali Faisal, s Tejal Joshi, s Using Bayesian Networks to Analyze Expression Data Xu Siwei, s0789023 Muhammad Ali Faisal, s0677834 Tejal Joshi, s0677858 Outline Introduction Bayesian Networks Equivalence Classes Applying to Expression

More information

CANCER DIAGNOSIS USING DATA MINING TECHNOLOGY

CANCER DIAGNOSIS USING DATA MINING TECHNOLOGY CANCER DIAGNOSIS USING DATA MINING TECHNOLOGY Muhammad Shahbaz 1, Shoaib Faruq 2, Muhammad Shaheen 1, Syed Ather Masood 2 1 Department of Computer Science and Engineering, UET, Lahore, Pakistan Muhammad.Shahbaz@gmail.com,

More information

University of Cambridge Engineering Part IB Information Engineering Elective

University of Cambridge Engineering Part IB Information Engineering Elective University of Cambridge Engineering Part IB Information Engineering Elective Paper 8: Image Searching and Modelling Using Machine Learning Handout 1: Introduction to Artificial Neural Networks Roberto

More information

Design of Multi-Class Classifier for Prediction of Diabetes using Linear Support Vector Machine

Design of Multi-Class Classifier for Prediction of Diabetes using Linear Support Vector Machine Design of Multi-Class Classifier for Prediction of Diabetes using Linear Support Vector Machine Akshay Joshi Anum Khan Omkar Kulkarni Department of Computer Engineering Department of Computer Engineering

More information

arxiv: v2 [cs.cv] 19 Dec 2017

arxiv: v2 [cs.cv] 19 Dec 2017 An Ensemble of Deep Convolutional Neural Networks for Alzheimer s Disease Detection and Classification arxiv:1712.01675v2 [cs.cv] 19 Dec 2017 Jyoti Islam Department of Computer Science Georgia State University

More information

EECS 433 Statistical Pattern Recognition

EECS 433 Statistical Pattern Recognition EECS 433 Statistical Pattern Recognition Ying Wu Electrical Engineering and Computer Science Northwestern University Evanston, IL 60208 http://www.eecs.northwestern.edu/~yingwu 1 / 19 Outline What is Pattern

More information

Predicting the Effect of Diabetes on Kidney using Classification in Tanagra

Predicting the Effect of Diabetes on Kidney using Classification in Tanagra Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 4, April 2014,

More information

Effective Diagnosis of Alzheimer s Disease by means of Association Rules

Effective Diagnosis of Alzheimer s Disease by means of Association Rules Effective Diagnosis of Alzheimer s Disease by means of Association Rules Rosa Chaves (rosach@ugr.es) J. Ramírez, J.M. Górriz, M. López, D. Salas-Gonzalez, I. Illán, F. Segovia, P. Padilla Dpt. Theory of

More information

A HMM-based Pre-training Approach for Sequential Data

A HMM-based Pre-training Approach for Sequential Data A HMM-based Pre-training Approach for Sequential Data Luca Pasa 1, Alberto Testolin 2, Alessandro Sperduti 1 1- Department of Mathematics 2- Department of Developmental Psychology and Socialisation University

More information

Classification of EEG signals in an Object Recognition task

Classification of EEG signals in an Object Recognition task Classification of EEG signals in an Object Recognition task Iacob D. Rus, Paul Marc, Mihaela Dinsoreanu, Rodica Potolea Technical University of Cluj-Napoca Cluj-Napoca, Romania 1 rus_iacob23@yahoo.com,

More information

Quick detection of QRS complexes and R-waves using a wavelet transform and K-means clustering

Quick detection of QRS complexes and R-waves using a wavelet transform and K-means clustering Bio-Medical Materials and Engineering 26 (2015) S1059 S1065 DOI 10.3233/BME-151402 IOS Press S1059 Quick detection of QRS complexes and R-waves using a wavelet transform and K-means clustering Yong Xia

More information

Mayuri Takore 1, Prof.R.R. Shelke 2 1 ME First Yr. (CSE), 2 Assistant Professor Computer Science & Engg, Department

Mayuri Takore 1, Prof.R.R. Shelke 2 1 ME First Yr. (CSE), 2 Assistant Professor Computer Science & Engg, Department Data Mining Techniques to Find Out Heart Diseases: An Overview Mayuri Takore 1, Prof.R.R. Shelke 2 1 ME First Yr. (CSE), 2 Assistant Professor Computer Science & Engg, Department H.V.P.M s COET, Amravati

More information

Classification and Predication of Breast Cancer Risk Factors Using Id3

Classification and Predication of Breast Cancer Risk Factors Using Id3 The International Journal Of Engineering And Science (IJES) Volume 5 Issue 11 Pages PP 29-33 2016 ISSN (e): 2319 1813 ISSN (p): 2319 1805 Classification and Predication of Breast Cancer Risk Factors Using

More information

Automated Medical Diagnosis using K-Nearest Neighbor Classification

Automated Medical Diagnosis using K-Nearest Neighbor Classification (IMPACT FACTOR 5.96) Automated Medical Diagnosis using K-Nearest Neighbor Classification Zaheerabbas Punjani 1, B.E Student, TCET Mumbai, Maharashtra, India Ankush Deora 2, B.E Student, TCET Mumbai, Maharashtra,

More information

Keywords: Leukaemia, Image Segmentation, Clustering algorithms, White Blood Cells (WBC), Microscopic images.

Keywords: Leukaemia, Image Segmentation, Clustering algorithms, White Blood Cells (WBC), Microscopic images. Volume 6, Issue 10, October 2016 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com A Study on

More information

Data analysis in microarray experiment

Data analysis in microarray experiment 16 1 004 Chinese Bulletin of Life Sciences Vol. 16, No. 1 Feb., 004 1004-0374 (004) 01-0041-08 100005 Q33 A Data analysis in microarray experiment YANG Chang, FANG Fu-De * (National Laboratory of Medical

More information

Classification of ECG Data for Predictive Analysis to Assist in Medical Decisions.

Classification of ECG Data for Predictive Analysis to Assist in Medical Decisions. 48 IJCSNS International Journal of Computer Science and Network Security, VOL.15 No.10, October 2015 Classification of ECG Data for Predictive Analysis to Assist in Medical Decisions. A. R. Chitupe S.

More information

Inter-session reproducibility measures for high-throughput data sources

Inter-session reproducibility measures for high-throughput data sources Inter-session reproducibility measures for high-throughput data sources Milos Hauskrecht, PhD, Richard Pelikan, MSc Computer Science Department, Intelligent Systems Program, Department of Biomedical Informatics,

More information

Class discovery in Gene Expression Data: Characterizing Splits by Support Vector Machines
