Nearest Shrunken Centroid as Feature Selection of Microarray Data
Myungsook Klassen, Computer Science Department, California Lutheran University, 60 West Olsen Rd, Thousand Oaks, CA
Nyunsu Kim, Computer Science Department, California Lutheran University, 60 West Olsen Rd, Thousand Oaks, CA

Abstract

The nearest shrunken centroid classifier uses shrunken centroids as prototypes for each class; a test sample is assigned to the class whose shrunken centroid is nearest to it. In our study, the nearest shrunken centroid classifier was used simply to select important genes prior to classification. Random Forest, a decision tree based classification algorithm, was chosen as the classifier for seven cancer microarray data sets. Classification was also performed with the nearest shrunken centroid classifier itself, and its results are compared to those from Random Forest. Our study demonstrates that the nearest shrunken centroid classifier is simple yet efficient at selecting important genes, but does not perform well as a classifier. We report that the performance of Random Forest as a classifier is far superior to that of the shrunken centroid classifier.

1. Introduction

The development of microarray technology made the analysis of gene expression possible. Analyzing gene expression data from microarray devices has many important applications in medicine and biology: the diagnosis of disease, accurate prognosis for particular patients, and understanding the response of a disease to drugs, to name a few. However, the amount of data in each microarray presents significant challenges to data mining. Microarray data typically have many attributes and few samples because of the difficulty of collecting and processing samples, especially for human data. The typical number of samples is small, less than 100. In contrast, the number of attributes, which represent genes, is very large, typically tens of thousands. This causes several problems.
With irrelevant and redundant attributes, the data set is unnecessarily large and classification takes too long. Having so many attributes and so few samples also creates a high likelihood of false positives through overfitting. Reducing the number of attributes is therefore an important step before applying classification methods, and it should be done while preserving as much discriminant information as possible to improve learning accuracy. Good attributes must have the same expression pattern for all samples of the same class, but different expression patterns for samples belonging to different classes. Tibshirani et al [11] proposed the nearest shrunken centroid method for class prediction in DNA microarray studies. It uses shrunken centroids as prototypes for each class and identifies subsets of genes that best characterize each class. It "shrinks" each class centroid toward the overall centroid for all classes by a threshold. This shrinkage makes the classifier more accurate by eliminating the effect of noisy genes, and as a result it performs automatic gene selection. The gene expression profile of a new sample is compared to each of the class centroids, and the class whose centroid is closest in squared distance is the predicted class for that sample. This part is the same as the usual nearest centroid rule. This type of classifier is sensitive to small disturbances, and its performance is inferior to contemporary classifiers such as neural networks, support vector machines, and decision trees, where more complex criteria are used for classification. Random Forest is a statistical method for classification first introduced by Leo Breiman in 2001 [2]. It is a decision tree based supervised learning algorithm: an ensemble classifier whose accuracy is among the best of current data mining algorithms.
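The nearest centroid rule just described can be sketched in a few lines. This is a minimal NumPy sketch on toy data, without the shrinkage step; the array values and function names are illustrative, not from the paper:

```python
import numpy as np

def nearest_centroid_fit(X, y):
    """Compute one centroid (mean expression profile) per class."""
    classes = np.unique(y)
    centroids = np.array([X[y == k].mean(axis=0) for k in classes])
    return classes, centroids

def nearest_centroid_predict(X, classes, centroids):
    """Assign each sample to the class whose centroid is nearest in squared distance."""
    # dists[i, k] = squared Euclidean distance from sample i to centroid k
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(dists, axis=1)]

# Toy data: 4 samples, 3 "genes", 2 classes
X = np.array([[1.0, 0.0, 0.0],
              [0.9, 0.1, 0.0],
              [0.0, 1.0, 1.0],
              [0.1, 0.9, 1.1]])
y = np.array([0, 0, 1, 1])
classes, centroids = nearest_centroid_fit(X, y)
print(nearest_centroid_predict(X, classes, centroids))  # [0 0 1 1]
```

Because the decision depends only on raw distances to class means, a few noisy genes with large variance can dominate the distance, which is the sensitivity the shrinkage step is meant to reduce.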
It has been used for intrusion detection [14], probability estimation [15], and classification [13], but was not widely used for microarray data classification until recently. Reported results [12][13] of a Random Forest classifier with microarray data are very promising. In this paper, we attempt to improve microarray data classification rates: the shrunken centroid method is used to extract good attributes, and Random Forest is used as the classifier. Classification results from this new architecture are compared to those obtained by using the shrunken centroid method alone as a classifier. This paper is organized as follows. We present previous relevant work in Section 2. We then discuss the concepts of the Random Forest classifier and the shrunken centroid in Section 3. The seven microarray data sets used in this paper are described in Section 4. Numerical results are presented in Section 5, followed by discussion and conclusions in Section 6.

2. Related Works

Random Forest has been used in several problems. Zhang and Zulkernine used it for network intrusion detection [14]. Prediction of a chemical compound's quantitative or categorical biological activity was performed by Svetnik et al [13]. Random Forest performance was investigated by Klassen [9] with different parameter values: the number of trees and the number of attributes used to split a node in a tree. Several research works have focused on reducing attributes: genetic algorithms [10], wrapper approaches [6], support vector machines [3], maximum likelihood [5], statistical methods [1][11], neural networks [7], and probability estimates [15]. Hoffmann [4] used pediatric acute lymphoblastic leukemia (ALL) data to determine a small set of genes for optimal subgroup distinction and achieved very high overall ALL subgroup prediction accuracies of about 98%. Díaz-Uriarte and Alvarez de Andrés [23] investigated the use of Random Forest for classification of nine microarray data sets including leukemia, colon, lymphoma, prostate, and SRBCT, and reported classification rates of 94.9%, 87.3%, 99.1%, 99.23%, and 99.79% for leukemia, colon, lymphoma, prostate, and SRBCT, respectively. Prostate cancer gene expression data was tested with a Multilayer Perceptron to distinguish grade levels by Moradi et al [16], and an average 85% classification rate was reported. In Khan's work [7], principal component analysis (PCA) of SRBCT tumor data reduced the dimensionality to 10 dominant PCA components; using neural networks on these components, all 20 test samples were correctly classified. Klassen [8] reported a 100% classification rate with SRBCT data using Levenberg-Marquardt (LM) back propagation neural networks after attribute selection with the nearest shrunken centroid method.
Tibshirani et al [11] reported a 100% classification rate with SRBCT data and 94% with leukemia data using the nearest shrunken centroid classifier. Feng Chu and Lipo Wang [24] used support vector machines with four effective feature reduction methods and reported 100%, 94.12%, and 100% for SRBCT, leukemia, and lymphoma, respectively.

3. Background

3.1 Random Forest

Random Forest is a meta-learner which consists of many individual trees. Each tree votes on an overall classification for the given data, and the Random Forest algorithm chooses the classification with the most votes. There are two sources of randomness in Random Forest: random training sets (bootstrap sampling) and random selection of attributes. Using a random selection of attributes to split each node yields favorable error rates and is more robust with respect to noise. These attributes form nodes using standard tree building methods. Diversity is obtained by randomly choosing candidate attributes at each node of a tree and using the attribute that provides the highest level of learning. Each tree is grown as fully as possible, without pruning, until no more nodes can be created. With fewer attributes considered per split, the correlation between any two trees decreases, but the strength of each individual tree also decreases; with more attributes, each tree's strength increases. These two have opposite effects on the error rate of Random Forest: less correlation decreases the error rate, while less strength increases it. The number of attributes per split should therefore be chosen to balance the two.

3.2 Shrunken Centroids

There are two factors to consider in selecting good genes: within-class distance and between-class distance. When the expression levels of a gene are fairly consistent, with small variance, for all samples in the same class, but differ largely between samples of different classes, the gene is considered a good candidate for classification.
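The two Random Forest parameters discussed in Section 3.1, the number of trees and the number of attributes tried at each split, correspond to `n_estimators` and `max_features` in scikit-learn's implementation. A sketch on synthetic data (an assumption for illustration; the experiments in this paper used WEKA):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for microarray data: 100 samples, 40 features.
X, y = make_classification(n_samples=100, n_features=40, n_informative=8,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# n_estimators = number of trees; max_features = attributes tried per split (Mtry)
rf = RandomForestClassifier(n_estimators=20, max_features=5, random_state=0)
rf.fit(X_tr, y_tr)
print(f"test accuracy: {rf.score(X_te, y_te):.2f}")
```

Varying `max_features` directly trades tree correlation against tree strength, which is why it is the parameter swept in the experiments below.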
Such a gene carries discriminant information for the different classes. In the nearest shrunken centroid method, the variance within a class is further taken into account to measure the goodness of a gene: the difference between a class centroid and the overall centroid for a gene is divided by the within-class standard deviation, giving greater weight to genes whose expression is stable among samples of the same class. A threshold is applied to the resulting standardized centroid differences; if they are small for all classes, they are set to zero, meaning the gene is eliminated. This reduces the number of genes used in the final predictive model. The mathematics of the shrunken centroid algorithm can be found in Tibshirani's work [11]. Assume there are $n$ patients, $p$ genes, and $K$ classes, with $C_k$ the set of indices of the $n_k$ samples in class $k$. Let $\bar{x}_{ik}$ be the centroid of gene $i$ in class $k$ and $\bar{x}_i$ the overall centroid. The standardized difference is

$$d_{ik} = \frac{\bar{x}_{ik} - \bar{x}_i}{m_k (s_i + s_0)},$$

where $s_i$ is the pooled within-class standard deviation for gene $i$,

$$s_i^2 = \frac{1}{n-K} \sum_{k=1}^{K} \sum_{j \in C_k} (x_{ij} - \bar{x}_{ik})^2, \qquad m_k = \sqrt{1/n_k + 1/n},$$

and $s_0$ is a small positive constant that prevents $d_{ik}$ from becoming large when a gene has a small standard deviation $s_i$. In the nearest shrunken centroid method, for each gene $i$, $d_{ik}$ is evaluated in each class $k$ to see if it is smaller than a chosen threshold $\Delta$. If it is too small for all classes, gene $i$ is considered not significant and is eliminated from the gene list. Otherwise $d_{ik}$ is updated by soft thresholding,

$$d'_{ik} = \mathrm{sign}(d_{ik})\,(|d_{ik}| - \Delta)_+ .$$

Once the optimal threshold is found, all centroids are updated accordingly (the shrunken centroids) using

$$\bar{x}'_{ik} = \bar{x}_i + m_k (s_i + s_0)\, d'_{ik}.$$

If a gene is shrunk to zero for all classes, it is considered unimportant and is eliminated. After the centroids are determined in the training stage, classification can take place with new samples: a test sample is assigned to the class whose shrunken centroid is nearest to it.

4. Data

SRBCT data [7]: small round blue cell tumors. It contains 83 samples from four types of cancer cells: 18 neuroblastoma (NB), 25 rhabdomyosarcoma (RMS), 29 Ewing family of tumors (EWS), and 11 Burkitt lymphoma (BL). There are 2287 genes in total.

Acute leukemia data [17]: the total number of genes is 7129 and the number of samples is 72, all acute leukemia patients: 47 acute lymphoblastic leukemia (ALL) and 25 acute myelogenous leukemia (AML).

Prostate data [21]: the number of classes is two, tumor and normal. There are 102 samples in total: 50 normal and 52 tumor.

Lymphoma data [18]: the total number of genes to be tested is 4026 and the number of samples is 62. There are three types of lymphoma: the first category, Chronic Lymphocytic Lymphoma (CLL), has 11 patients; the second, Follicular Lymphoma (FL), has 9; and the third, Diffuse Large B-cell Lymphoma (DLBCL), has 42.

Colon data [19]: this dataset contains 62 samples. Among them, 40 biopsies are from tumors (labeled "negative") and 22 normal biopsies (labeled "positive") are from healthy parts of the colons of the same patients.
Lung cancer data [20]: there are 181 tissue samples, of which 31 belong to the MPM class and 150 to the ADCA class.

MLL leukemia data [22]: contains three kinds of leukemia samples, in contrast to the binary-class leukemia dataset. The dataset contains 72 leukemia samples: 24 ALL, 20 MLL, and 28 AML.

5. Experiment and Results

5.1 Data Cleaning and Set up

All data sets except SRBCT, which already had a train set and a test set defined by the authors [11], were divided into a train set and a test set while preserving the ratios of the sample classes. Roughly 60-80% of the samples are used for training and the remainder for testing. The detailed breakdown for each data set is shown in Table 1. Data sets were preprocessed to remove redundant data and to handle missing values. PAM, the freely available shrunken centroid software used in our experiments, has a function to handle missing values: it uses the k-nearest neighbor algorithm to find the best value to fill in missing data. The default k value of 10 was used in our experiments.

Data set  | Train samples                     | Test samples
SRBCT     | 63 (23 EWS, 8 BL, 12 NB, 20 RMS)  | 20 (6 EWS, 3 BL, 6 NB, 5 RMS)
Acute     | 38 (27 ALL, 11 AML)               | 34 (20 ALL, 14 AML)
Prostate  | 80 (40 tumor, 40 normal)          | 22 (12 tumor, 12 normal)
Lymphoma  | 47 (32 DLBCL, 7 FL, 8 CLL)        | 15 (10 DLBCL, 2 FL, 3 CLL)
Colon     | 42 (30 tumor, 16 normal)          | 16 (10 tumor, 6 normal)
Lung      | 149 (134 ADCA, 15 MPM)            | 32 (16 ADCA, 16 MPM)
MLL       | 50 (16 ALL, 14 MLL, 20 AML)       | 22 (5 ALL, 6 MLL, 5 AML)

Table 1: Train samples and test samples with number of class members

5.2 Gene Selection with the shrunken centroid

Using the shrunken centroid algorithm, training errors are plotted against threshold values, as shown in Figure 1 for colon cancer. A range of threshold values from 2.5 to 3.8 shows the lowest error. The largest such threshold value was chosen in our study to obtain the smallest number of attributes to be used with the Random Forest classifier; in Figure 1, the threshold value of 3.8 was chosen for colon cancer. The same process was performed for all seven data sets, and Table 2 shows the number of genes selected.

Figure 1: Colon cancer nearest shrunken centroid training errors

Table 2: Threshold value and the number of genes selected with the shrunken centroid for each data set (SRBCT, Acute, Prostate, Lymphoma, Colon, Lung, MLL)

5.3 Classification with Random Forest and its results

In our study, the Random Forest implementation in the WEKA software developed by the University of Waikato was used. Two parameters affect the performance of Random Forest: the number of attributes selected at random to split a node in a tree (called Mtry in WEKA) and the number of trees generated in the forest. The tree depth was set to 0, that is, unlimited depth. Forests with three different numbers of trees, 10, 15, and 20, were initially used, and twenty Mtry values from 1 to 20 were tried for each tree count. Since the numbers of attributes in the seven data sets are between 5 and 52, with an average of 24, Mtry values up to 20 are sufficient. When needed, the number of trees was changed to explore further. Statistics of the classification rates over the 20 Mtry values, such as average, minimum, maximum, and standard deviation, along with the total number of times a 100% classification rate occurred, were gathered. A summary of the results is shown in Table 3, which gives the highest classification rate obtained and the number of trees giving that rate; if several tree counts gave a 100% classification rate, the smallest is shown. It also shows the number of times a 100% classification rate occurred over the 20 different Mtry values for that tree count. With SRBCT and lymphoma, we were able to obtain 100% classification rates with all tree counts and with several Mtry values. Mtry values play an important role, while the number of trees matters less.
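The gene selection of Section 5.2 works by soft thresholding: as the threshold Δ grows, more standardized centroid differences d_ik shrink to zero and fewer genes survive. A minimal NumPy sketch with random toy d values (in practice the values come from PAM on real data):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy standardized centroid differences d_ik: 100 genes x 3 classes.
d = rng.normal(0.0, 1.5, size=(100, 3))

def genes_kept(d, delta):
    """Soft-threshold d by delta and count genes still nonzero in some class."""
    d_shrunk = np.sign(d) * np.maximum(np.abs(d) - delta, 0.0)
    return int(np.any(d_shrunk != 0.0, axis=1).sum())

# Raising the threshold monotonically reduces the number of genes kept.
for delta in (0.0, 1.0, 2.0, 3.0):
    print(f"threshold {delta}: {genes_kept(d, delta)} genes kept")
```

Plotting training error against Δ, as in Figure 1, then picking the largest Δ in the low-error plateau gives the smallest surviving gene set.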
For these two data sets, the difference between the highest and lowest classification rates is larger than for the other data sets. With acute leukemia, prostate, colon, and MLL leukemia, classification rates did not reach 100% for any combination of Mtry values and tree counts. Also, the difference between the highest and lowest classification rates is small compared to SRBCT and lymphoma. The interesting case is lung cancer: a forest of 10 trees produced an excellent 100% classification rate with almost all Mtry values, but with 15 and 20 trees we were not able to reach 100%.

Table 3: Test sample classification rates with Random Forest

5.4 Classification with the shrunken centroid classifier

We ran the shrunken centroid classifier with the same train and test sets used with Random Forest, keeping the threshold values the same. The classification rates we obtained are shown in Table 4 alongside those from Random Forest. Table 4 shows that the shrunken centroid classifier gave 86.6%, 93.7%, and 95.4% classification rates for the Lymphoma, Lung, and MLL leukemia data sets, respectively, while Random Forest produced 100% for all three. With Acute and Prostate, both gave the same classification rates of 94.11% and 90.91%, respectively. With colon cancer, the classification rate with the shrunken centroid is 75%, versus 81% with Random Forest.

Table 4: Test sample classification rates with the shrunken centroid and Random Forest

6. Evaluation and discussion

The classification rates we obtained with Random Forest are the same as or higher than those with the shrunken centroid for all data sets but the Acute and prostate data. The numbers of genes for these two data sets are relatively small: 21 for Acute and 6 for prostate. We therefore investigated how different numbers of genes affect classification rates.

6.1 A larger number of attributes for Random Forest

A range of threshold values gives the best classification rate, as shown in Figure 1, and we chose a few threshold values smaller than the largest one in order to generate more attributes, the idea being that more attributes may increase classification rates with the Random Forest classifier. Table 5 shows that the prostate classification rate increased to 96.77% with these smaller threshold values; with the largest value, 6.33, 6 genes were selected and the classification rate was 90.91% (see Table 3). With the shrunken centroid classifier, the rate first went up from 90.91% to 93.55% (see Table 3), but then went down to 83.87%. With acute leukemia, Random Forest was able to classify all test samples correctly, while the shrunken centroid classifier went down from 94.11% to 90% and did not change with further threshold values. The shrunken centroid classifier assigns a test sample to the nearest shrunken centroid using a simple standardized squared distance.
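The standardized squared distance just mentioned, including the class-frequency correction term of Tibshirani et al [11], can be sketched as follows (NumPy, toy values; a sample goes to the class with the smallest score):

```python
import numpy as np

def discriminant(x, shrunken_centroids, s, s0, priors):
    """delta_k(x) = sum_i (x_i - xbar'_ik)^2 / (s_i + s0)^2 - 2*log(pi_k).
    Standardized squared distance to each shrunken centroid, with a
    class-prior term that favors more frequent classes."""
    dist = (((x[None, :] - shrunken_centroids) / (s + s0)) ** 2).sum(axis=1)
    return dist - 2.0 * np.log(priors)

# Toy problem: 2 classes, 3 genes, equal class priors.
centroids = np.array([[1.0, 0.0, 0.0],
                      [0.0, 1.0, 1.0]])
s = np.array([0.5, 0.5, 0.5])          # per-gene within-class std. deviations
scores = discriminant(np.array([0.9, 0.1, 0.0]), centroids, s,
                      s0=0.1, priors=np.array([0.5, 0.5]))
print(int(np.argmin(scores)))  # 0
```

With equal priors the correction term is a constant and the rule reduces to the plain standardized distance, which is why it offers no help when classes occur in similar proportions.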
This works well when the genes with nonzero components in each class are mutually exclusive, as is the case with the SRBCT data. Otherwise the simple distance is not a good measure for classification: a sample with average values for all genes and a sample with a few large and a few small gene values may end up in the same class. The second term used in the shrunken centroid discriminant, the frequency of a class in the population, may not help when the classes occur in similar proportions.

Threshold | No. of genes | Shrunken Centroid rate | Random Forest rate
-         | -            | -                      | 96.77% (ntree=14, Mtry=2)
-         | -            | -                      | 96.77% (ntree=9, Mtry=4)
-         | -            | -                      | 96.77% (ntree=8, Mtry=5)
-         | -            | -                      | 93.54% (ntree=8, Mtry=2)

Table 5: Prostate classification rates with different threshold values

Threshold | No. of genes | Shrunken Centroid rate | Random Forest rate
-         | -            | -                      | 100% (ntree=5, Mtry=3)
-         | -            | -                      | 95% (ntree=5, Mtry=3)
-         | -            | -                      | 100% (ntree=6, Mtry=4)
-         | -            | -                      | 95% (ntree=6, Mtry=2)

Table 6: Acute classification rates with different threshold values

Random Forest, in contrast, is a decision tree based supervised learning algorithm and a meta-learner that makes classification decisions collectively from the trees' votes. The problems mentioned above with the shrunken centroid do not arise: the genes selected for splitting a node in a tree properly route a sample to the right decision node.

6.2 The k-nearest neighbor algorithm for handling missing values

Preprocessing data is an important part of data mining and can affect the training and testing processes. We explored how well the k-nearest neighbor method fills in a missing value. We chose the 62 samples with no missing values from the colon data, chose one gene, T53360, and deleted its values from two randomly selected samples, sample 5 (tumor) and sample 48 (normal). We then filled in the missing values with k values from 1 to 10. The k value giving the best match with the original deleted value differed between the two classes, so the average over all 10 k values is used, as shown in Table 7. The table shows that k-nearest neighbor computes values much closer to the originals than a simple average method does. A similar result was obtained with the lymphoma data.

Table 7: Effectiveness of the k-nearest neighbor method for handling missing values with colon cancer

7. References

[1] E. Blair and R. Tibshirani, Machine learning methods applied to DNA microarray data can improve the diagnosis of cancer. SIGKDD Explorations, Vol. 5, Issue 2, 2002.
[2] L. Breiman, Random Forests. Machine Learning, 45(1), 2001, pp. 5-32.
[3] I. Guyon, Gene selection for cancer classification using support vector machines. Machine Learning, Vol. 46, 2002.
[4] K. Hoffmann, Translating microarray data for diagnostic testing in childhood leukaemia. BMC Cancer, :229.
[5] T. Ideker, Testing for differentially-expressed genes by maximum likelihood analysis of microarray data. Journal of Computational Biology, Vol. 7, 2000.
[6] I. Inza, Gene selection by sequential wrapper approaches in microarray cancer class prediction. Journal of Intelligent and Fuzzy Systems, 2002.
[7] J. Khan, Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nature Medicine, Vol. 7, No. 6, 2001.
[8] M. Klassen, Classification of cancer microarray data using neural network. Proceedings of the IADIS International Conference Applied Computing, 2007.
[9] M. Klassen, M. Cummings, G. Seldana, Investigation of Random Forest performance with cancer microarray data. Proceedings of the ISCA 24th International Conference on Computers and Their Applications.
[10] L. Li, Gene assessment and sample classification for gene expression data using a genetic algorithm/k-nearest neighbor method. Combinatorial Chemistry and High Throughput Screening, 2001.
[11] R. Tibshirani, Diagnosis of multiple cancer types by shrunken centroids of gene expression. PNAS, Vol. 99, No. 10, 2002.
[12] T. Shi, Tumor classification by tissue microarray profiling: Random Forest clustering applied to renal cell carcinoma. Modern Pathology, Vol. 18, 2005.
[13] V. Svetnik, Random Forest: A Classification and Regression Tool for compound classification and QSAR modeling. J. Chem. Inf. Comput. Sci., 43, 2003.
[14] J. Zhang and M. Zulkernine, A Hybrid Network Intrusion Detection Technique Using Random Forests. Proceedings of the First International Conference on Availability, Reliability and Security (ARES'06), 2006.
[15] T.-F. Wu, C.-J. Lin, and R. C. Weng, Probability estimates for multi-class classification by pairwise coupling. Journal of Machine Learning Research, Vol. 5, 2004.
[16] M. Moradi, P. Mousavi, and P. Abolmaesumi, Pathological distinction of prostate cancer tumors based on DNA microarray data. CSCBC 2006 conference proceedings, Ontario, Canada.
[17] Golub et al., Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring. Science, 15 October 1999, Vol. 286, No. 5439.
[18] Alizadeh et al., Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature, 2000 Feb 3, 403(6769).
[19] Alon et al., Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues. Proceedings of the National Academy of Sciences, 1999.
[20] Gordon et al., Translation of Microarray Data into Clinically Relevant Cancer Diagnostic Tests Using Gene Expression Ratios in Lung Cancer and Mesothelioma. Cancer Research, 2002, AACR.
[21] Singh et al., Gene expression correlates of clinical prostate cancer behavior. Cancer Cell, 2002 Mar, 1(2).
[22] Armstrong et al., A gene expression profile analysis of acute lymphoblastic leukemia suggests a new subset of leukemia. The Scientist, 2001, 2(1).
[23] R. Díaz-Uriarte and S. Alvarez de Andrés, Gene selection and classification of microarray data using random forest. BMC Bioinformatics, 2006, Vol. 7, No. 3.
[24] F. Chu and L. Wang, Applications of support vector machines to cancer classification with microarray data. International Journal of Neural Systems, 2005, Vol. 15, No. 6.
More informationA hierarchical two-phase framework for selecting genes in cancer datasets with a neuro-fuzzy system
Technology and Health Care 24 (2016) S601 S605 DOI 10.3233/THC-161187 IOS Press S601 A hierarchical two-phase framework for selecting genes in cancer datasets with a neuro-fuzzy system Jongwoo Lim, Bohyun
More informationOutlier Analysis. Lijun Zhang
Outlier Analysis Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Extreme Value Analysis Probabilistic Models Clustering for Outlier Detection Distance-Based Outlier Detection Density-Based
More informationPackage propoverlap. R topics documented: February 20, Type Package
Type Package Package propoverlap February 20, 2015 Title Feature (gene) selection based on the Proportional Overlapping Scores Version 1.0 Date 2014-09-15 Author Osama Mahmoud, Andrew Harrison, Aris Perperoglou,
More informationStatistical Applications in Genetics and Molecular Biology
Statistical Applications in Genetics and Molecular Biology Volume 8, Issue 1 2009 Article 13 Detecting Outlier Samples in Microarray Data Albert D. Shieh Yeung Sam Hung Harvard University, shieh@fas.harvard.edu
More informationAn Efficient Diseases Classifier based on Microarray Datasets using Clustering ANOVA Extreme Learning Machine (CAELM)
www.ijcsi.org 8 An Efficient Diseases Classifier based on Microarray Datasets using Clustering ANOVA Extreme Learning Machine (CAELM) Shamsan Aljamali 1, Zhang Zuping 2 and Long Jun 3 1 School of Information
More informationAugmented Medical Decisions
Machine Learning Applied to Biomedical Challenges 2016 Rulex, Inc. Intelligible Rules for Reliable Diagnostics Rulex is a predictive analytics platform able to manage and to analyze big amounts of heterogeneous
More informationPrediction of heart disease using k-nearest neighbor and particle swarm optimization.
Biomedical Research 2017; 28 (9): 4154-4158 ISSN 0970-938X www.biomedres.info Prediction of heart disease using k-nearest neighbor and particle swarm optimization. Jabbar MA * Vardhaman College of Engineering,
More informationRoadmap for Developing and Validating Therapeutically Relevant Genomic Classifiers. Richard Simon, J Clin Oncol 23:
Roadmap for Developing and Validating Therapeutically Relevant Genomic Classifiers. Richard Simon, J Clin Oncol 23:7332-7341 Presented by Deming Mi 7/25/2006 Major reasons for few prognostic factors to
More informationA Comparison of Collaborative Filtering Methods for Medication Reconciliation
A Comparison of Collaborative Filtering Methods for Medication Reconciliation Huanian Zheng, Rema Padman, Daniel B. Neill The H. John Heinz III College, Carnegie Mellon University, Pittsburgh, PA, 15213,
More informationKeywords Missing values, Medoids, Partitioning Around Medoids, Auto Associative Neural Network classifier, Pima Indian Diabetes dataset.
Volume 7, Issue 3, March 2017 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Medoid Based Approach
More informationL. Ziaei MS*, A. R. Mehri PhD**, M. Salehi PhD***
Received: 1/16/2004 Accepted: 8/1/2005 Original Article Application of Artificial Neural Networks in Cancer Classification and Diagnosis Prediction of a Subtype of Lymphoma Based on Gene Expression Profile
More informationThe Analysis of Proteomic Spectra from Serum Samples. Keith Baggerly Biostatistics & Applied Mathematics MD Anderson Cancer Center
The Analysis of Proteomic Spectra from Serum Samples Keith Baggerly Biostatistics & Applied Mathematics MD Anderson Cancer Center PROTEOMICS 1 What Are Proteomic Spectra? DNA makes RNA makes Protein Microarrays
More informationChapter 1. Introduction
Chapter 1 Introduction 1.1 Motivation and Goals The increasing availability and decreasing cost of high-throughput (HT) technologies coupled with the availability of computational tools and data form a
More informationPredictive Biomarkers
Uğur Sezerman Evolutionary Selection of Near Optimal Number of Features for Classification of Gene Expression Data Using Genetic Algorithms Predictive Biomarkers Biomarker: A gene, protein, or other change
More informationGene expression analysis. Roadmap. Microarray technology: how it work Applications: what can we do with it Preprocessing: Classification Clustering
Gene expression analysis Roadmap Microarray technology: how it work Applications: what can we do with it Preprocessing: Image processing Data normalization Classification Clustering Biclustering 1 Gene
More informationINTRODUCTION TO MACHINE LEARNING. Decision tree learning
INTRODUCTION TO MACHINE LEARNING Decision tree learning Task of classification Automatically assign class to observations with features Observation: vector of features, with a class Automatically assign
More informationSVM-Kmeans: Support Vector Machine based on Kmeans Clustering for Breast Cancer Diagnosis
SVM-Kmeans: Support Vector Machine based on Kmeans Clustering for Breast Cancer Diagnosis Walaa Gad Faculty of Computers and Information Sciences Ain Shams University Cairo, Egypt Email: walaagad [AT]
More informationRajiv Gandhi College of Engineering, Chandrapur
Utilization of Data Mining Techniques for Analysis of Breast Cancer Dataset Using R Keerti Yeulkar 1, Dr. Rahila Sheikh 2 1 PG Student, 2 Head of Computer Science and Studies Rajiv Gandhi College of Engineering,
More informationPerformance Analysis of Different Classification Methods in Data Mining for Diabetes Dataset Using WEKA Tool
Performance Analysis of Different Classification Methods in Data Mining for Diabetes Dataset Using WEKA Tool Sujata Joshi Assistant Professor, Dept. of CSE Nitte Meenakshi Institute of Technology Bangalore,
More informationMulticlass microarray data classification based on confidence evaluation
Methodology Multiclass microarray data classification based on confidence evaluation H.L. Yu 1, S. Gao 1, B. Qin 1 and J. Zhao 2 1 School of Computer Science and Engineering, Jiangsu University of Science
More informationA Random Forest Model for the Analysis of Chemical Descriptors for the Elucidation of HIV 1 Protease Protein Ligand Interactions
A Random Forest Model for the Analysis of Chemical Descriptors for the Elucidation of HIV 1 Protease Protein Ligand Interactions Gene M. Ko, A. Srinivas Reddy, Sunil Kumar, Barbara A. Bailey, and Rajni
More informationFUZZY C-MEANS AND ENTROPY BASED GENE SELECTION BY PRINCIPAL COMPONENT ANALYSIS IN CANCER CLASSIFICATION
FUZZY C-MEANS AND ENTROPY BASED GENE SELECTION BY PRINCIPAL COMPONENT ANALYSIS IN CANCER CLASSIFICATION SOMAYEH ABBASI, HAMID MAHMOODIAN Department of Electrical Engineering, Najafabad branch, Islamic
More informationIntroduction to Discrimination in Microarray Data Analysis
Introduction to Discrimination in Microarray Data Analysis Jane Fridlyand CBMB University of California, San Francisco Genentech Hall Auditorium, Mission Bay, UCSF October 23, 2004 1 Case Study: Van t
More informationPositive and Unlabeled Relational Classification through Label Frequency Estimation
Positive and Unlabeled Relational Classification through Label Frequency Estimation Jessa Bekker and Jesse Davis Computer Science Department, KU Leuven, Belgium firstname.lastname@cs.kuleuven.be Abstract.
More informationCase Studies on High Throughput Gene Expression Data Kun Huang, PhD Raghu Machiraju, PhD
Case Studies on High Throughput Gene Expression Data Kun Huang, PhD Raghu Machiraju, PhD Department of Biomedical Informatics Department of Computer Science and Engineering The Ohio State University Review
More informationClassification of cancer profiles. ABDBM Ron Shamir
Classification of cancer profiles 1 Background: Cancer Classification Cancer classification is central to cancer treatment; Traditional cancer classification methods: location; morphology, cytogenesis;
More informationPredicting Juvenile Diabetes from Clinical Test Results
2006 International Joint Conference on Neural Networks Sheraton Vancouver Wall Centre Hotel, Vancouver, BC, Canada July 16-21, 2006 Predicting Juvenile Diabetes from Clinical Test Results Shibendra Pobi
More informationUNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014
UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014 Exam policy: This exam allows two one-page, two-sided cheat sheets (i.e. 4 sides); No other materials. Time: 2 hours. Be sure to write
More informationPositive and Unlabeled Relational Classification through Label Frequency Estimation
Positive and Unlabeled Relational Classification through Label Frequency Estimation Jessa Bekker and Jesse Davis Computer Science Department, KU Leuven, Belgium firstname.lastname@cs.kuleuven.be Abstract.
More informationImpute vs. Ignore: Missing Values for Prediction
Proceedings of International Joint Conference on Neural Networks, Dallas, Texas, USA, August 4-9, 2013 Impute vs. Ignore: Missing Values for Prediction Qianyu Zhang, Ashfaqur Rahman, and Claire D Este
More informationVariable Features Selection for Classification of Medical Data using SVM
Variable Features Selection for Classification of Medical Data using SVM Monika Lamba USICT, GGSIPU, Delhi, India ABSTRACT: The parameters selection in support vector machines (SVM), with regards to accuracy
More informationA NEW DIAGNOSIS SYSTEM BASED ON FUZZY REASONING TO DETECT MEAN AND/OR VARIANCE SHIFTS IN A PROCESS. Received August 2010; revised February 2011
International Journal of Innovative Computing, Information and Control ICIC International c 2011 ISSN 1349-4198 Volume 7, Number 12, December 2011 pp. 6935 6948 A NEW DIAGNOSIS SYSTEM BASED ON FUZZY REASONING
More informationBiomarker adaptive designs in clinical trials
Review Article Biomarker adaptive designs in clinical trials James J. Chen 1, Tzu-Pin Lu 1,2, Dung-Tsa Chen 3, Sue-Jane Wang 4 1 Division of Bioinformatics and Biostatistics, National Center for Toxicological
More informationMinimum Feature Selection for Epileptic Seizure Classification using Wavelet-based Feature Extraction and a Fuzzy Neural Network
Appl. Math. Inf. Sci. 8, No. 3, 129-1300 (201) 129 Applied Mathematics & Information Sciences An International Journal http://dx.doi.org/10.1278/amis/0803 Minimum Feature Selection for Epileptic Seizure
More informationAUTOMATING NEUROLOGICAL DISEASE DIAGNOSIS USING STRUCTURAL MR BRAIN SCAN FEATURES
AUTOMATING NEUROLOGICAL DISEASE DIAGNOSIS USING STRUCTURAL MR BRAIN SCAN FEATURES ALLAN RAVENTÓS AND MOOSA ZAIDI Stanford University I. INTRODUCTION Nine percent of those aged 65 or older and about one
More informationBivariate variable selection for classification problem
Bivariate variable selection for classification problem Vivian W. Ng Leo Breiman Abstract In recent years, large amount of attention has been placed on variable or feature selection in various domains.
More informationBrain Tumour Detection of MR Image Using Naïve Beyer classifier and Support Vector Machine
International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2018 IJSRCSEIT Volume 3 Issue 3 ISSN : 2456-3307 Brain Tumour Detection of MR Image Using Naïve
More informationPREDICTION OF BREAST CANCER USING STACKING ENSEMBLE APPROACH
PREDICTION OF BREAST CANCER USING STACKING ENSEMBLE APPROACH 1 VALLURI RISHIKA, M.TECH COMPUTER SCENCE AND SYSTEMS ENGINEERING, ANDHRA UNIVERSITY 2 A. MARY SOWJANYA, Assistant Professor COMPUTER SCENCE
More informationA NOVEL VARIABLE SELECTION METHOD BASED ON FREQUENT PATTERN TREE FOR REAL-TIME TRAFFIC ACCIDENT RISK PREDICTION
OPT-i An International Conference on Engineering and Applied Sciences Optimization M. Papadrakakis, M.G. Karlaftis, N.D. Lagaros (eds.) Kos Island, Greece, 4-6 June 2014 A NOVEL VARIABLE SELECTION METHOD
More informationIncreasing Efficiency of Microarray Analysis by PCA and Machine Learning Methods
56 Int'l Conf. Bioinformatics and Computational Biology BIOCOMP'16 Increasing Efficiency of Microarray Analysis by PCA and Machine Learning Methods Jing Sun 1, Kalpdrum Passi 1, Chakresh Jain 2 1 Department
More informationInter-session reproducibility measures for high-throughput data sources
Inter-session reproducibility measures for high-throughput data sources Milos Hauskrecht, PhD, Richard Pelikan, MSc Computer Science Department, Intelligent Systems Program, Department of Biomedical Informatics,
More informationDiscovering Meaningful Cut-points to Predict High HbA1c Variation
Proceedings of the 7th INFORMS Workshop on Data Mining and Health Informatics (DM-HI 202) H. Yang, D. Zeng, O. E. Kundakcioglu, eds. Discovering Meaningful Cut-points to Predict High HbAc Variation Si-Chi
More informationTissue Classification Based on Gene Expression Data
Chapter 6 Tissue Classification Based on Gene Expression Data Many diseases result from complex interactions involving numerous genes. Previously, these gene interactions have been commonly studied separately.
More informationKnowledge Discovery and Data Mining I
Ludwig-Maximilians-Universität München Lehrstuhl für Datenbanksysteme und Data Mining Prof. Dr. Thomas Seidl Knowledge Discovery and Data Mining I Winter Semester 2018/19 Introduction What is an outlier?
More informationTITLE: A Data-Driven Approach to Patient Risk Stratification for Acute Respiratory Distress Syndrome (ARDS)
TITLE: A Data-Driven Approach to Patient Risk Stratification for Acute Respiratory Distress Syndrome (ARDS) AUTHORS: Tejas Prahlad INTRODUCTION Acute Respiratory Distress Syndrome (ARDS) is a condition
More informationBreast Cancer Diagnosis using a Hybrid Genetic Algorithm for Feature Selection based on Mutual Information
Breast Cancer Diagnosis using a Hybrid Genetic Algorithm for Feature Selection based on Mutual Information Abeer Alzubaidi abeer.alzubaidi022014@my.ntu.ac.uk David Brown david.brown@ntu.ac.uk Abstract
More informationNAÏVE BAYESIAN CLASSIFIER FOR ACUTE LYMPHOCYTIC LEUKEMIA DETECTION
NAÏVE BAYESIAN CLASSIFIER FOR ACUTE LYMPHOCYTIC LEUKEMIA DETECTION Sriram Selvaraj 1 and Bommannaraja Kanakaraj 2 1 Department of Biomedical Engineering, P.S.N.A College of Engineering and Technology,
More informationarxiv: v2 [cs.cv] 8 Mar 2018
Automated soft tissue lesion detection and segmentation in digital mammography using a u-net deep learning network Timothy de Moor a, Alejandro Rodriguez-Ruiz a, Albert Gubern Mérida a, Ritse Mann a, and
More informationMethods for Predicting Type 2 Diabetes
Methods for Predicting Type 2 Diabetes CS229 Final Project December 2015 Duyun Chen 1, Yaxuan Yang 2, and Junrui Zhang 3 Abstract Diabetes Mellitus type 2 (T2DM) is the most common form of diabetes [WHO
More informationREINVENTING THE BIOMARKER PANEL DISCOVERY EXPERIENCE
REINVENTING THE BIOMARKER PANEL DISCOVERY EXPERIENCE REINVENTING THE BIOMARKER PANEL DISCOVERY EXPERIENCE 1 Biomarker discovery has opened new realms in the medical industry, from patient diagnosis and
More informationCANCER CLASSIFICATION USING SINGLE GENES
179 CANCER CLASSIFICATION USING SINGLE GENES XIAOSHENG WANG 1 OSAMU GOTOH 1,2 david@genome.ist.i.kyoto-u.ac.jp o.gotoh@i.kyoto-u.ac.jp 1 Department of Intelligence Science and Technology, Graduate School
More informationCANCER PREDICTION SYSTEM USING DATAMINING TECHNIQUES
CANCER PREDICTION SYSTEM USING DATAMINING TECHNIQUES K.Arutchelvan 1, Dr.R.Periyasamy 2 1 Programmer (SS), Department of Pharmacy, Annamalai University, Tamilnadu, India 2 Associate Professor, Department
More informationPredicting Breast Cancer Recurrence Using Machine Learning Techniques
Predicting Breast Cancer Recurrence Using Machine Learning Techniques Umesh D R Department of Computer Science & Engineering PESCE, Mandya, Karnataka, India Dr. B Ramachandra Department of Electrical and
More informationPrediction-based Threshold for Medication Alert
MEDINFO 2013 C.U. Lehmann et al. (Eds.) 2013 IMIA and IOS Press. This article is published online with Open Access IOS Press and distributed under the terms of the Creative Commons Attribution Non-Commercial
More informationInvestigating Links Between the Immune System and the Brain from Medical Claims and Laboratory Tests
Investigating Links Between the Immune System and the Brain from Medical Claims and Laboratory Tests Guhan Venkataraman Department of Biomedical Informatics guhan@stanford.edu Tymor Hamamsy Department
More informationDetection of Cognitive States from fmri data using Machine Learning Techniques
Detection of Cognitive States from fmri data using Machine Learning Techniques Vishwajeet Singh, K.P. Miyapuram, Raju S. Bapi* University of Hyderabad Computational Intelligence Lab, Department of Computer
More informationClassification of Smoking Status: The Case of Turkey
Classification of Smoking Status: The Case of Turkey Zeynep D. U. Durmuşoğlu Department of Industrial Engineering Gaziantep University Gaziantep, Turkey unutmaz@gantep.edu.tr Pınar Kocabey Çiftçi Department
More informationCANCER DIAGNOSIS USING DATA MINING TECHNOLOGY
CANCER DIAGNOSIS USING DATA MINING TECHNOLOGY Muhammad Shahbaz 1, Shoaib Faruq 2, Muhammad Shaheen 1, Syed Ather Masood 2 1 Department of Computer Science and Engineering, UET, Lahore, Pakistan Muhammad.Shahbaz@gmail.com,
More informationUvA-DARE (Digital Academic Repository)
UvA-DARE (Digital Academic Repository) A classification model for the Leiden proteomics competition Hoefsloot, H.C.J.; Berkenbos-Smit, S.; Smilde, A.K. Published in: Statistical Applications in Genetics
More informationWhite Paper Estimating Complex Phenotype Prevalence Using Predictive Models
White Paper 23-12 Estimating Complex Phenotype Prevalence Using Predictive Models Authors: Nicholas A. Furlotte Aaron Kleinman Robin Smith David Hinds Created: September 25 th, 2015 September 25th, 2015
More informationCOMMUNICATION ENGINEERING & TECHNOLOGY (IJECET) DETECTION OF ACUTE LEUKEMIA USING WHITE BLOOD CELLS SEGMENTATION BASED ON BLOOD SAMPLES
International INTERNATIONAL Journal of Electronics JOURNAL and Communication OF ELECTRONICS Engineering & Technology AND (IJECET), COMMUNICATION ENGINEERING & TECHNOLOGY (IJECET) ISSN 0976 6464(Print)
More informationInferring Biological Meaning from Cap Analysis Gene Expression Data
Inferring Biological Meaning from Cap Analysis Gene Expression Data HRYSOULA PAPADAKIS 1. Introduction This project is inspired by the recent development of the Cap analysis gene expression (CAGE) method,
More informationABSTRACT I. INTRODUCTION. Mohd Thousif Ahemad TSKC Faculty Nagarjuna Govt. College(A) Nalgonda, Telangana, India
International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2018 IJSRCSEIT Volume 3 Issue 1 ISSN : 2456-3307 Data Mining Techniques to Predict Cancer Diseases
More informationClassification of Patients Treated for Infertility Using the IVF Method
STUDIES IN LOGIC, GRAMMAR AND RHETORIC 43(56) 2015 DOI: 10.1515/slgr-2015-0041 Classification of Patients Treated for Infertility Using the IVF Method PawełMalinowski 1,RobertMilewski 1,PiotrZiniewicz
More informationMinority Report: ML Fairness in Criminality Prediction
Minority Report: ML Fairness in Criminality Prediction Dominick Lim djlim@stanford.edu Torin Rudeen torinmr@stanford.edu 1. Introduction 1.1. Motivation Machine learning is used more and more to make decisions
More informationUsing CART to Mine SELDI ProteinChip Data for Biomarkers and Disease Stratification
Using CART to Mine SELDI ProteinChip Data for Biomarkers and Disease Stratification Kenna Mawk, D.V.M. Informatics Product Manager Ciphergen Biosystems, Inc. Outline Introduction to ProteinChip Technology
More informationA Hybrid Approach for Mining Metabolomic Data
A Hybrid Approach for Mining Metabolomic Data Dhouha Grissa 1,3, Blandine Comte 1, Estelle Pujos-Guillot 2, and Amedeo Napoli 3 1 INRA, UMR1019, UNH-MAPPING, F-63000 Clermont-Ferrand, France, 2 INRA, UMR1019,
More informationMulti Parametric Approach Using Fuzzification On Heart Disease Analysis Upasana Juneja #1, Deepti #2 *
Multi Parametric Approach Using Fuzzification On Heart Disease Analysis Upasana Juneja #1, Deepti #2 * Department of CSE, Kurukshetra University, India 1 upasana_jdkps@yahoo.com Abstract : The aim of this
More informationInternational Journal of Advance Research in Computer Science and Management Studies
Volume 2, Issue 12, December 2014 ISSN: 2321 7782 (Online) International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online
More information