Machine learning for HIV-1 protease cleavage site prediction


Pattern Recognition Letters 27 (2006)

Machine learning for HIV-1 protease cleavage site prediction

Alessandra Lumini, Loris Nanni *
DEIS, IEIIT-CNR, Università di Bologna, Viale Risorgimento 2, Bologna, Italy

Received 17 November 2004; received in revised form 16 May 2005; available online 2 May 2006. Communicated by L. Goldfarb.
* Corresponding author. E-mail addresses: alumini@deis.unibo.it (A. Lumini), lnanni@deis.unibo.it (L. Nanni).

Abstract

Recently, several works have approached the HIV-1 protease specificity problem by applying a number of classifier creation and combination methods, known as ensemble methods, from the field of machine learning. However, it is still difficult for researchers to choose the best method due to the lack of an effective comparison. For the first time we have made an extensive study of methods for feature extraction, feature transformation and multiclassifier systems (MCS) in the HIV-1 protease problem. In this work we report an experimental comparison of several learning systems coupled with different feature representations. We confirm previous results stating that linear classifiers obtain higher performance than non-linear classifiers using orthonormal encoding, but we also show that, using the Karhunen-Loève transform, the performance of neural networks is comparable to that of linear support vector machines. Finally we propose a new hierarchical approach that, for the first time, combines ideas derived from machine learning methodologies with a knowledge base of this particular problem. This approach proves to be a successful attempt to obtain a drastic error reduction with respect to the performance of linear classifiers: the error rate decreases from 9.1% using linear-SVM to 6.6% using our new hierarchical classifier based on some pattern rules.
© 2006 Elsevier B.V. All rights reserved.

Keywords: HIV-1 protease; Karhunen-Loève transform; Hierarchical approach

1. Introduction

HIV-1 protease (Beck et al., 2000) is an enzyme in the AIDS virus that is essential to its replication. The chemical action of the protease takes place at a localized active site on its surface. HIV-1 protease inhibitor drugs are small molecules that bind to the active site in HIV-1 protease and stay there, so that the normal functioning of the enzyme is prevented. Understanding and predicting HIV-1 protease cleavage sites in proteins is a very important topic, since cleaved substrates are also templates for the synthesis of tightly binding, chemically modified inhibitors. The standard paradigm for protease-peptide interaction is the lock-and-key model. In this model a sequence of amino acids fits as a key to the active site in the protease, which is eight residues long in the HIV-1 protease case. In order to design effective HIV protease inhibitors, accurately identifying cleaved peptides of eight residues is crucial. The potential number of solutions is 20^8 (about 2.6 x 10^10), as there are 20 amino acids. This makes an accurate and rapid method for predicting HIV protease cleavage (Chou, 1993a,b,c; Cai and Chou, 1998; Chou et al., 1993, 1996; Chou and Zhang, 1992) very helpful, since an exhaustive experimental search is impossible. The interested reader can see (Chou, 1996) for a good review.

In order to approach this problem it is important to know that in HIV-1 protease only one class (the uncleaved category) is shift invariant, while the other class is not. Shift invariance means that a category remains unchanged if a pattern is shifted left or right by one position.

For instance, the peptide DDFGRCELAAAMKRHGLHL is not cleaved by HIV-1 protease, which means, due to the shift invariance, that all its octamers DDFGRCEL, ..., MKRHGLHL belong to the same uncleaved category. On the contrary, the cleaved category is not shift invariant, because the cleaving occurs at one specific site and not in nearby sites.

A machine learning algorithm is one that can learn from experience (observed examples) with respect to some class of tasks and a performance measure. Machine learning methods are suitable for molecular biology data due to the learning algorithm's ability to construct classifiers/hypotheses that can explain complex relationships in the data. Recently, several works have approached the HIV-1 protease specificity problem by applying techniques from machine learning. In (Cai et al., 1998; Thompson et al., 1995) the authors used a standard feedforward multilayer perceptron (MLP) to solve this problem, achieving an error rate of 12%. In (Cai and Chou, 1998) the authors confirm the results of (Narayanan et al., 2002; Thompson et al., 1995) using the same data and the same MLP architecture, showing that a decision tree was not able to predict the cleavage as well as the MLP. Recently, in (Cai et al., 1998) Support Vector Machines (SVM) have been adopted to predict the cleavage. In (Rögnvaldsson and You, 2003) the authors showed that HIV-1 protease cleavage is a linear problem and that the best classifier for this problem is the linear SVM (L-SVM).

Multiclassifier systems (Dietterich, 2000; Masulli and Valentini, 2000; Mayoraz and Moreira, 1997) integrate several data-driven models for the same problem, with the aim of obtaining a better composite global model with more accurate and reliable estimates. In addition, modular approaches often decompose a complex problem into sub-problems for which the solutions obtained are simpler to understand, as well as to implement, manage and update. Some works have combined the output of various classifiers in bioinformatics problems: in (Tan and Gilbert, 2003) the authors compared three ensemble methods (stacking, bagging and boosting), showing that combined methods perform better than the individual learners. In this paper we confute these results: we believe that the behavior observed in (Tan and Gilbert, 2003) was due to the low performance of the single classifiers adopted (even if they used a larger training set, obtained by a ten-fold cross-validation).

In this work we first perform a comparison of several machine learning approaches applied to the HIV-1 protease problem, then we show how to develop a new hierarchical classifier (HC) architecture by merging ideas derived from the study of machine learning methodologies with a knowledge base of this particular problem. The experiments show that the HIV-1 problem can be effectively solved using our hierarchical classifier: this approach (HC) yields an error rate of 6.6%, which is much lower than that of the best previous approaches (9.1% using linear-SVM). Even if some of the rules used (Rögnvaldsson and You, 2003) were found by looking at all the patterns of the dataset, so that the performance of HC is partially biased, in our opinion it is very interesting to show that, by merging ideas derived from the study of machine learning methodologies with a knowledge base of this particular problem, the error rate is drastically reduced. In fact, using the Q-statistic we show that the dependence between a machine learning classifier such as LDC and a set of mining rules based on pattern motifs is lower than that obtained for any combination of machine learning classifiers. This means that the two approaches, based on different methodologies, are sufficiently different that they can be coupled to improve the accuracy.
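To make the shift-invariance property described above concrete, the following minimal sketch (our own plain-Python illustration, not code from the paper) enumerates all octamer windows of the uncleaved peptide quoted earlier; each window inherits the uncleaved label, whereas a cleaved peptide labels only the single octamer centred on the scissile bond.

# The uncleaved peptide used as an example in the text.
peptide = "DDFGRCELAAAMKRHGLHL"

def octamer_windows(sequence):
    """Return every contiguous window of eight residues."""
    return [sequence[i:i + 8] for i in range(len(sequence) - 8 + 1)]

# Every octamer of an uncleaved peptide is itself an uncleaved sample.
uncleaved_octamers = octamer_windows(peptide)
print(uncleaved_octamers[0], "...", uncleaved_octamers[-1])   # DDFGRCEL ... MKRHGLHL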
2. Methods

In this section a brief description is given of the feature extraction methodologies, feature transformations, classifiers and ensemble methods combined and tested in this work.

2.1. Feature extraction (FE)

Feature extraction is a process that extracts a set of features from the original pattern representation through some functional mapping. The most used features in this field are the following.

Peptide sequences. A protein sequence is made from combinations of variable length of the 20 amino acids P = {A, C, D, ..., V, W, Y}. A peptide (small protein) is denoted by P = P4 P3 P2 P1 P1' P2' P3' P4', where Pi is an amino acid belonging to P. The scissile bond is located between positions P1 and P1'. As in (Rögnvaldsson and You, 2003), P3 and P4' are not used.

Orthonormal encoding (OE). This is the standard procedure (Rögnvaldsson and You, 2003) to map the sequence P to a sparse orthonormal representation. Each amino acid Pi is represented by a 20-bit vector with 19 bits set to zero and one bit set to one, so that each amino acid vector is orthogonal to all the other amino acid vectors. Pi can take on any one of the twenty amino acid values.

n-grams (NG). The n-grams or k-tuples (Wu et al., 1992) are pairs of values (v_i, c_i), where v_i is the feature and c_i is the count of this feature in a protein sequence, for i = 1, ..., 20^n. These features are all the possible combinations of n letters from the set P. The 6-letter exchange group is another commonly used piece of information. The 6-letter group contains six combinations of the letters: A = {H, R, K}, B = {D, E, N, Q}, C = {C}, D = {S, T, P, A, G}, E = {M, I, L, V} and F = {F, Y, W}. Each set of n-gram features extracted from a protein sequence can be scaled using

x̄ = x / (L - n + 1)

where x represents the count of a generic gram feature, L is the length of the protein sequence and n is the size of the n-gram features.
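A minimal sketch of the two feature extractors described above, written in plain Python as our own illustration (the paper provides no code); the alphabetical ordering of the amino-acid alphabet is our own choice, and the example octamer is simply the first window of the uncleaved peptide quoted in the introduction.

from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # the 20-letter alphabet P

def orthonormal_encoding(peptide):
    """Map a peptide to a sparse vector with 20 bits per residue, one bit set."""
    vec = []
    for residue in peptide:
        bits = [0] * 20
        bits[AMINO_ACIDS.index(residue)] = 1
        vec.extend(bits)
    return vec

def ngram_features(sequence, n=1):
    """Scaled n-gram counts: count / (L - n + 1), for all 20**n possible grams."""
    grams = ["".join(p) for p in product(AMINO_ACIDS, repeat=n)]
    counts = {g: 0 for g in grams}
    for i in range(len(sequence) - n + 1):
        counts[sequence[i:i + n]] += 1
    scale = len(sequence) - n + 1
    return [counts[g] / scale for g in grams]

x = orthonormal_encoding("DDFGRCEL")          # first octamer of the uncleaved example
print(sum(x), len(x))                         # -> 8 160
print(sum(ngram_features("DDFGRCEL", n=1)))   # -> 1.0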

2.2. Feature transformation (FT)

Feature transformation is a process through which a new set of features is created from an existing one represented in a vector space R^N.

Karhunen-Loève transform (KL) (Duda et al., 2001). This transform projects high-dimensional data onto a lower-dimensional subspace in a way that is optimal in a sum-squared sense. It is known that KL is the best linear transform for dimensionality reduction. In this paper, we use KL to reduce the original dataset to 50 dimensions.

Independent component analysis (ICA) (Duda et al., 2001). This transform seeks the directions in feature space that show the independence of the signals.

KernelPCA (Scholkopf et al., 1998). Each feature vector is first projected from the input space to a high-dimensional feature space by a non-linear map, then a pattern in the high-dimensional space is reduced to a lower dimensionality by KL. This transform is useful when the feature space is non-linear.

2.3. Classifiers

A classifier is a component that uses the feature vector provided by the feature extraction or transformation step to assign a pattern to a class.

Linear discriminant classifier (LDC) (Duda et al., 2001). The linear discriminant analysis method consists of searching for some linear combinations of selected variables which provide the best separation between the considered classes.

Linear SVM (L-SVM) (Duda et al., 2001). The goal of this two-class classifier is to establish the equation of a hyperplane that divides the training set, leaving all the points of the same class on the same side while maximizing the distance between the two classes and the hyperplane.

Multilayer perceptrons (MLP) (Duda et al., 2001). Multilayer perceptrons are supervised feedforward neural networks trained with the standard back-propagation algorithm. With one or two hidden layers, they can approximate virtually any input-output map, so they are widely used for pattern classification.

Edit distance classifier (EDC) (Levenshtein, 1965). The edit distance of two strings, s1 and s2, is defined as the minimum number of point mutations required to change s1 into s2, where a point mutation is a change, insertion or deletion of a letter. The edit distance is coupled with a nearest-neighbor classifier in order to classify a new pattern.
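A compact sketch (our own plain-Python illustration) of the edit distance classifier just described: the Levenshtein distance coupled with a nearest-neighbour decision.

def levenshtein(s1, s2):
    """Minimum number of substitutions, insertions and deletions turning s1 into s2."""
    prev = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        curr = [i]
        for j, c2 in enumerate(s2, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (c1 != c2)))   # substitution (or match)
        prev = curr
    return prev[-1]

def edc_predict(octamer, training_set):
    """Nearest-neighbour classification under edit distance.
    training_set is a list of (octamer, label) pairs."""
    return min(training_set, key=lambda item: levenshtein(octamer, item[0]))[1]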
2.4. Pattern motifs

Studying the peptide sequences we can note that particular combinations of amino acids influence the cleaving/non-cleaving decision (e.g., if the third amino acid of the peptide sequence is glutamine, there is a low probability that the peptide is a cleavage site). Pattern motifs (Rögnvaldsson and You, 2003) can be used for creating a rule-based classifier.

2.5. Multiclassifier systems (MCS)

Multiclassifier systems combine different approaches to solve the same problem. They combine, by a decision rule, the outputs of various classifiers trained using different datasets. Typical multiclassifier methods are the following.

Bagging. Bagging (Breiman, 1996) was among the first methods proposed for ensemble creation. Given a training set S, it generates M new training sets S1, ..., SM by randomly picking elements from S; each new set Si is used to train exactly one classifier. Hence an ensemble of individual classifiers is obtained from the M new training sets.

Random subspace. In the random subspace method (Houle et al., 1998) each individual classifier uses only a subset of all the features for training and testing.

Decision rule (Kittler et al., 1998). Several decision rules can be used to determine the final class from an ensemble of classifiers; the most used are the Vote rule, Max rule, Min rule, Mean rule and Sum rule.

3. Results and discussion

In this section we perform an empirical comparison of several classification methods, obtained by coupling the different approaches described above, on the HIV-1 task. Results are reported only for the combinations which present a high accuracy. Then we discuss the results yielded by the experiments in order to design a hierarchical classifier.

The performances are compared using two measures: the error rate, to evaluate the accuracy, and Yule's Q statistic (Yule, 1900), to quantify the independence of classifiers. For two classifiers D_i and D_k the Q statistic is

Q_i,k = (ad - bc) / (ad + bc)

where a is the probability of both classifiers being correct, d is the probability of both classifiers being incorrect, b is the probability that the first classifier is correct and the second is incorrect, and c is the probability that the second classifier is correct and the first is incorrect. Q varies between -1 and 1. For statistically independent classifiers, Q_i,k = 0. Classifiers that tend to recognize the same patterns correctly will have Q > 0, and those which commit errors on different patterns will have Q < 0.
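A small sketch (our own, in plain Python) of how Yule's Q can be computed from the per-pattern correctness of two classifiers; counts are used instead of probabilities, which leaves the ratio unchanged.

def yule_q(correct_1, correct_2):
    """Yule's Q statistic from two boolean sequences of per-pattern correctness."""
    pairs = list(zip(correct_1, correct_2))
    a = sum(c1 and c2 for c1, c2 in pairs)                  # both correct
    d = sum(not c1 and not c2 for c1, c2 in pairs)          # both incorrect
    b = sum(c1 and not c2 for c1, c2 in pairs)              # only the first correct
    c = sum(not c1 and c2 for c1, c2 in pairs)              # only the second correct
    return (a * d - b * c) / (a * d + b * c)                # undefined if ad + bc == 0

# Two classifiers that mostly agree give Q close to +1.
print(yule_q([1, 1, 1, 0, 1, 0], [1, 1, 1, 0, 0, 1]))       # -> 0.5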

All the tests have been conducted on the following dataset using a 2-fold cross-validation:

HIV dataset. The dataset contains 362 octamer protein sequences, each of which needs to be classified as an HIV protease cleavable site or an uncleavable site.

On this dataset we performed 10 tests, each time randomly resampling the learning and test sets (containing respectively half of the patterns) while maintaining the distribution of the patterns in the two classes. The results reported refer to the average classification accuracy achieved throughout the 10 experiments.

3.1. Accuracy

We report some useful tests on the error rate aimed at comparing the quality of the various methods on the HIV-1 protease problem. Table 1 lists the tests whose results are reported in Fig. 1; each test is performed using both n-grams and orthonormal encoding as feature extraction. The absence of the feature transformation step indicates that the classification task is performed starting from the original features.

Table 1
Tests made for the HIV-1 protease problem
Short name   Feature transformation   Classifier
KLDC         KL                       LDC
ILDC         ICA                      LDC
KeLDC        Kernel PCA               LDC
KeSVM        Kernel PCA               L-SVM
KSVM         KL                       L-SVM
ISVM         ICA                      L-SVM
L-SVM        -                        L-SVM
MLP          -                        MLP
KMLP         KL                       MLP

The graphs in Figs. 1 and 2 report, respectively, the classification error rates given by the various classifiers using the two different feature extraction methods and the results obtained using MCSs. We can note that the MCS techniques do not obtain a considerable increase in performance with respect to a single classifier; this behavior can be explained by the analysis of the error independence among the classifiers.

As concerns the classifiers used in the MCSs, we adopt a variable set of classifiers in each experiment, chosen in order to maximize the performance. Random subspace is tested using KL as feature transformation (to reduce the features to a 50-dimensional space) and LDC as classifier. Bagging-LDC is tested using KL as feature transformation and LDC as classifier. Bagging-MLP is tested using KL and MLP. The performances reported are the best obtained by varying the number of dimensions retained and of classifiers in random subspaces and the number of classifiers used in bagging. The decision rules are evaluated by combining the following five methods described in Table 1: L-SVM, KLDC, KSVM, ILDC, ISVM.

These results confirm, as already stated, that using the orthonormal encoding as feature extractor the HIV-1 protease cleavage site specificity problem can be solved efficiently by linear models. The confidence limits of the tests reported in Fig. 1 are approximately ±1.5%. In addition, these results prove that, using a KL feature transformation, the performances of non-linear and linear models are similar to each other. We argue that combining a linear transformation and a non-linear classifier can effectively handle this problem, where one class is shift variant while the other is shift invariant. Another interesting result is the low performance of the non-linear transformations (KernelPCA and ICA).
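As an illustration of how one of the Table 1 combinations and the evaluation protocol above could be reproduced, the following sketch (our own, using scikit-learn as an off-the-shelf stand-in, which the paper does not specify) builds the KLDC pipeline (PCA standing in for the KL transform, followed by an LDC) and scores it with ten random stratified 50/50 learning/test resamplings; the data arrays are random placeholders for the encoded octamers.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import StratifiedShuffleSplit, cross_val_score
from sklearn.pipeline import make_pipeline

# Placeholder data standing in for the 362 encoded octamers (here 8 x 20 = 160 bits);
# replace with the real HIV dataset and encoding.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(362, 160)).astype(float)
y = rng.integers(0, 2, size=362)

# KLDC from Table 1: Karhunen-Loeve (plain PCA here) to 50 dimensions, then LDC.
kldc = make_pipeline(PCA(n_components=50), LinearDiscriminantAnalysis())

# Ten random stratified 50/50 resamplings, as in the protocol described above.
splits = StratifiedShuffleSplit(n_splits=10, test_size=0.5, random_state=0)
scores = cross_val_score(kldc, X, y, cv=splits)
print("mean error rate: %.3f" % (1.0 - scores.mean()))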
Fig. 1. Error rate for different classifiers built on the orthonormal (left) and n-grams (right) feature spaces.
Fig. 2. Error rate for different MCSs built on the orthonormal encoding feature space.

3.2. Independence of classifiers

Table 2 and Fig. 3 show some useful tests on the error independence between two classifiers. Only the most interesting combinations (evaluated considering the accuracy performance of each classifier) are reported for the sake of space.

Table 2
Tests made to study the error independence
Short name   Method 1 (FE, FT, Classifier)   Method 2 (FE, FT, Classifier)
A            OE, -, L-SVM                    OE, KL, L-SVM
B            OE, -, L-SVM                    OE, ICA, L-SVM
C            OE, KL, L-SVM                   OE, ICA, L-SVM
D            NG, -, L-SVM                    NG, KL, L-SVM
E            NG, -, L-SVM                    NG, ICA, L-SVM
F            NG, KL, L-SVM                   NG, ICA, L-SVM
G            NG, -, L-SVM                    OE, ICA, LDC
H            NG, -, L-SVM                    OE, KL, LDC
I            OE, -, L-SVM                    OE, KL, LDC
L            OE, -, L-SVM                    OE, ICA, LDC
M            OE, -, L-SVM                    OE, KL, MLP

Fig. 3. Error independence for different classifiers.

3.3. Analysis of results

From the analysis of these results we can draw the following conclusions: the n-grams feature space gains an error independence slightly larger than that of the orthonormal encoding, even if the performance of the single classifiers is lower; the best trade-off between accuracy and error independence is given by combination M.

Considering these results, it seems very improbable to improve the performance through an ensemble of parallel classifiers. Therefore we design a hierarchical classifier which can exploit the good performance of both LDC and MLP. Moreover, given the low error independence of all the machine learning methods, we insert in our classifier some rules based on pattern motifs and an edit distance classifier.

4. A new hierarchical classifier

There are many ways to build a hierarchical multiclassifier: for example, by using each level to distinguish between one class and the others, or by using only a subset of the input features at each level, or by using at each step a classifier with rejection to classify the patterns with high confidence and forwarding the rejected patterns to the next level (Giusti et al., 2002). In this work we develop a hierarchical structure in which each step is constituted by a module able to classify only a fraction of the patterns: the rejected patterns are given as input to the following steps. The classifier is composed of four steps: edit distance + cleavage rule, LDC, cleavage/non-cleavage rule, MLP. In Fig. 4 a graphical description of the proposed system is given; in the following each step is described in detail. For the determination of the optimal parameters of the algorithm, one third of the patterns of the training set is randomly selected and used as a validation set.

Fig. 4. Hierarchical classifier schema.
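The cascade logic just described can be sketched as follows (our own Python illustration, not the authors' implementation): every stage either returns a label or rejects, and rejected patterns fall through to the next stage, with the last stage never rejecting.

REJECT = None   # sentinel returned by a stage that is not confident enough

def hierarchical_classify(pattern, stages):
    """Run a pattern through a list of stage functions; each stage returns
    'cleaved', 'uncleaved', or REJECT to pass the pattern on."""
    for stage in stages[:-1]:
        label = stage(pattern)
        if label is not REJECT:
            return label
    return stages[-1](pattern)   # the final stage classifies without rejection

# Hypothetical stage functions, in the order used by HC:
# stages = [edit_distance_plus_cleavage_rule, ldc_with_threshold,
#           cleavage_noncleavage_rule, mlp_classifier]
# label = hierarchical_classify(octamer, stages)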

Giusti et al. (2002) proved that the error probability of a hierarchical system, given a rejection threshold at each level, can be expressed as the sum of the optimal Bayes error and the error rate of each classifier (related to the patterns not rejected). This result means that, in principle, the optimal Bayes error can still be obtained even if all the stand-alone classifiers are not optimal.

4.1. Edit distance classifier + cleavage rule

The edit distance classifier gives good performance for patterns belonging to the shift-invariant class, while it is not reliable when it assigns a pattern to the shift-variant class. For example, given a training set of patterns belonging to both classes, if a new pattern is near (with respect to the edit distance) to a pattern of the uncleaved class (shift invariant) we can reasonably assume that it belongs to the same class; on the contrary, if the new pattern is near to a pattern of the cleaved class, we cannot make any assumption with a high degree of certainty. Starting from this consideration we design a classifier that assigns to the uncleaved class the patterns classified as uncleaved sites by the EDC, while rejecting the others. The performance of the EDC, if used without rejection to classify all the patterns, is approximately 84.20%. If we reject all the patterns assigned to the cleaved class, it is able to classify 62.70% of the patterns with an error rate of only 4.4%.

A possible method to further reduce the error rate of the edit distance classifier is to reject the patterns, classified as uncleaved by the EDC, that satisfy this rule:

(xxx(nyla)xxxx & (!xxxkxxxx | !xx(fkq)xxxxx | !xxxxxcxx | !xxxxxxkx))

The rationale of this rule is the following: from a statistical study on the training set, we have noted that a cleaved pattern with high similarity to an uncleaved one often contains the motif xxx(nyla)xxxx (meaning that the fourth amino acid must be N, Y, L or A). This rule matches partially with a rule shown in (Tozser et al., 2000). To avoid the rejection of too many patterns we use some motifs (Rögnvaldsson and You, 2003) that characterize the uncleaved class: xxxkxxxx, xx(fkq)xxxxx, xxxxxcxx and xxxxxxkx. By coupling these rules with the EDC we reject a further 20.6% of the previously classified patterns: this reduces the error rate to 1.1% on the accepted patterns (which are 49.83% of the total).

Table 3
Pairs of amino acids and positions that influence the cleaving decision
xxxfxexx  xxxyxexx  xxxlxexx  xxxfxqxx  xxvxxexx  xxxxpexx  xxvfxxxx
xxxfpxxx  xxixxexx  xxxmxexx  xxaxxexx  xxafxxxx  xxxxxexk  FxxxxExx

Table 4
Single amino acids and positions that influence the cleaving decision
xxxkxxxx  xxxxxsxx  xxxxxkxx  xxxpxxxx  xxxxcxxx  Cxxxxxxx  xxyxxxxx

4.2. Multiclassifier with a modified mean rule

We train an LDC classifier using patterns represented by orthonormal encoding as feature extractor and reduced to a 50-dimensional space by a KL transform. The patterns whose confidence is lower than a prefixed threshold are rejected. The value of the threshold is set experimentally (the same value in all the tests). In this step 36.5% of the patterns are classified, with an error rate of 5.75%.
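The motif notation used in the rule above and in Tables 3 and 4 maps naturally onto position-wise checks; the sketch below (our own hedged Python illustration) parses that notation and transcribes the step-1 rejection rule literally.

def matches_motif(octamer, motif):
    """Check a peptide against a motif in the paper's notation: 'x' = any
    residue, a letter = that residue, '(nyla)' = any of the listed residues
    at that position (case-insensitive)."""
    positions, i = [], 0
    while i < len(motif):
        if motif[i] == "(":
            j = motif.index(")", i)
            positions.append(set(motif[i + 1:j].upper()))
            i = j + 1
        else:
            positions.append(None if motif[i].lower() == "x" else {motif[i].upper()})
            i += 1
    return len(octamer) == len(positions) and all(
        allowed is None or residue.upper() in allowed
        for residue, allowed in zip(octamer, positions))

def step1_reject(octamer):
    """Literal transcription of the rejection rule quoted in Section 4.1."""
    m = lambda motif: matches_motif(octamer, motif)
    return m("xxx(nyla)xxxx") and (not m("xxxkxxxx") or not m("xx(fkq)xxxxx")
                                   or not m("xxxxxcxx") or not m("xxxxxxkx"))

print(matches_motif("AAANAAAA", "xxx(nyla)xxxx"))   # -> True
print(matches_motif("AAAKAAAA", "xxxkxxxx"))        # -> True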
4.3. Cleavage/non-cleavage rule

The third step consists of some rules proposed in (Rögnvaldsson and You, 2003): if a pattern contains one of the motifs shown in Table 3 it is classified as cleaved, if it contains one of the motifs shown in Table 4 it is assigned to the uncleaved class, otherwise it is rejected to the next step. Using these rules 5.64% of the patterns can be classified, with an error rate of 1.96%.

4.4. MLP

The patterns rejected by the previous steps are finally classified with an MLP classifier based on patterns represented by orthonormal encoding as feature extractor and reduced to a 50-dimensional space by a KL transform. In this last step all the remaining patterns (about 10.39%) are classified without rejection, with an error rate of 39%.

5. Results of the hierarchical structure

Finally, in Fig. 5 we compare the classification error rates obtained by our hierarchical method (HC) and by the systems proposed in (Rögnvaldsson and You, 2003): an MLP coupled with a KL dimensionality reduction (KL + MLP) and a linear-SVM classifier (L-SVM). In Table 5 the classification performance of each step of the new hierarchical classifier is summarized: the local error rate is evaluated considering only the patterns effectively classified at each step, while the global error rate is the cumulative error obtained considering all the patterns classified up to that step; analogously, by local classified we mean the percentage of the whole set of patterns classified at each step, while global classified is the cumulative percentage of classified patterns at each step. It is interesting to note that for HC about 10% of the patterns can be considered difficult, since they contribute to generate the largest part of the total error rate.

The greatest advantage in the use of HC is its very low error rate, obtained with the same rejection rate as a stand-alone L-SVM or a stand-alone MLP (as shown in Table 6). That is, if we classify the 89.95% of patterns with higher confidence, the error rate of L-SVM is 6% while for HC it is 3.1%.

Fig. 5. Error rates on the HIV-1 dataset.

Table 5
Error rate and number of patterns rejected in each step by HC (one row per step 1-4; columns: global error rate (%), local error rate (%), global classified (%), local classified (%)).

Table 6
Classification performance, with a rejection rate, using L-SVM or MLP (columns: method, error rate (%), global classified (%)).

5.1. Validation tests

In this section we report some experiments to validate our idea of constructing a hierarchical classifier architecture by merging ideas derived from the machine learning methodologies with a knowledge base of this particular problem. A first test has been conducted in order to evaluate the independence between a machine learning classifier such as LDC and the set of mining rules based on pattern motifs detailed in Section 4.3. The error independence between these two methods is evaluated by the Q-statistic on the fraction of patterns effectively classified by the rules; the result is 0.9, lower than those obtained for any combination of classifiers reported in Table 2 and Fig. 3. This means that the two approaches, based on different methodologies, are sufficiently different that they can be coupled in a hierarchical classifier. As a further proof we tested a simple two-level hierarchical classifier composed only of the cleavage/non-cleavage rule at the first level and KL + LDC at the second level: the error rate of this method was 7.6%, which is higher than that of HC, but significantly lower than the error rates of all the MCSs built on the orthonormal encoding feature space reported in Fig. 2.

6. Conclusion

The problem addressed in this paper is to recognize, given a sequence of amino acids, an HIV-1 protease cleavage site. We showed, by an empirical comparison of several classification methods, that by coupling a linear transform with a non-linear classifier low error rates can be obtained. Moreover we introduced, for the first time, a method that combines ideas derived from the machine learning methodologies and from a knowledge base of this particular problem. Our experiments showed, by means of the Q-statistic, that the combination of a machine learning classifier and rules based on pattern motifs can be interesting. Finally we illustrated how to obtain a composite method with a very low error rate for the HIV-1 protease specificity problem, starting from an exhaustive study of several classification methodologies. This approach is very important in the field of bioinformatics, where methods taken from machine learning are often applied without a deep study of the problem. In particular, in this work we proposed a hierarchical classifier that combines linear and non-linear classifiers and rules based on pattern motifs. Our hierarchical classifier is composed, at each level, of classifiers highly independent of each other, taken in order of the best classification rate for the patterns not rejected. The major advantage of the proposed approach is its low error rate, better than that of the other stand-alone methods proposed in the literature.
The major disadvantage is that our system cannot help to understand the relationships in the data.

References

Beck, Z.Q., Hervio, L., Dawson, P.E., Elder, J.E., Madison, E.L., 2000. Identification of efficiently cleaved substrates for HIV-1 protease using a phage display library and use in inhibitor development. Virology.
Breiman, L., 1996. Bagging predictors. Machine Learn.
Cai, Y.D., Chou, K.C., 1998. Artificial neural network model for predicting HIV protease cleavage sites in protein. Adv. Eng. Software 29.
Cai, Y.D., Yu, H., Chou, K.C., 1998. Using neural network for prediction of HIV protease cleavage sites in proteins. J. Protein Chem. 17.
Chou, J.J., 1993a. A formulation for correlating properties of peptides and its application to predicting human immunodeficiency virus protease-cleavable sites in proteins. Biopolymers 33.
Chou, J.J., 1993b. Predicting cleavability of peptide sequences by HIV protease via correlation-angle approach. J. Protein Chem. 12, 291.

Chou, K.C., 1993c. A vectorized sequence-coupling model for predicting HIV protease cleavage sites in proteins. J. Biol. Chem. 268.
Chou, K.C., 1996. Review: Prediction of HIV protease cleavage sites in proteins. Anal. Biochem. 233.
Chou, K.C., Zhang, C.T., 1992. Diagrammatization of codon usage in 339 HIV proteins and its biological implication. AIDS Res. Human Retroviruses 8.
Chou, K.C., Zhang, C.T., Kezdy, F.J., 1993. A vector approach to predicting HIV protease cleavage sites in proteins. Proteins: Struct., Funct., Genet. 16.
Chou, K.C., Tomasselli, A.L., Reardon, I.M., Heinrikson, R.L., 1996. Predicting HIV protease cleavage sites in proteins by a discriminant function method. Proteins: Struct., Funct., Genet. 24.
Dietterich, T.G., 2000. Ensemble methods in machine learning. In: Kittler, J., Roli, F. (Eds.), Multiple Classifier Systems. First International Workshop, MCS 2000, Lecture Notes in Computer Science. Springer-Verlag, Cagliari, Italy.
Duda, R., Hart, P., Stork, D., 2001. Pattern Classification. Wiley, New York.
Giusti, N., Masulli, F., Sperduti, A., 2002. Theoretical and experimental analysis of a two-stage system for classification. IEEE Trans. PAMI 24 (7).
Houle, G., Aragon, D., Smith, R., Kimura, D., 1998. A multilayered corroboration-based check reader. In: Hull, J., Taylor, S. (Eds.), Document Analysis Systems.
Kittler, J., Hatef, M., Duin, R.P.W., Matas, J., 1998. On combining classifiers. IEEE Trans. Pattern Anal. Machine Intell. 20 (3).
Levenshtein, V.I., 1965. Binary codes capable of correcting deletions, insertions and reversals. Doklady Akademii Nauk SSSR 163 (4).
Masulli, F., Valentini, G., 2000. Comparing decomposition methods for classification. In: Howlett, R.J., Jain, L.C. (Eds.), KES 2000, Fourth International Conference on Knowledge-Based Intelligent Engineering Systems & Allied Technologies. IEEE, Piscataway, NJ.
Mayoraz, E., Moreira, M., 1997. On the decomposition of polychotomies into dichotomies. In: The XIV International Conference on Machine Learning, Nashville, TN, July.
Narayanan, A., Wu, X., Yang, Z., 2002. Mining viral protease data to extract cleavage knowledge. Bioinformatics 18, S5-S13.
Rögnvaldsson, T., You, L., 2003. Why neural networks should not be used for HIV-1 protease cleavage site prediction. Bioinformatics.
Scholkopf, B., Smola, A., Muller, K.R., 1998. Nonlinear component analysis as a kernel eigenvalue problem. Neural Comput. 10 (5).
Tan, A.C., Gilbert, D., 2003. An empirical comparison of supervised machine learning techniques in bioinformatics. In: Proceedings of the First Asia-Pacific Bioinformatics Conference (APBC 2003), 19.
Thompson, T.B., Chou, K.C., Zheng, C., 1995. Neural network prediction of the HIV-1 protease cleavage sites. J. Theoret. Biol. 177.
Tozser, J., Zahuczky, G., Bagossi, P., Louis, J., Copeland, T., Oroszlan, S., Harrison, R., Weber, T., 2000. Comparison of the substrate specificity of the human T-cell leukemia virus and human immunodeficiency virus proteinases. Eur. J. Biochem. 267.
Wu, C.H., Whitson, G., McLarty, J., Ermongkonchai, A., Change, T.C., 1992. PROCANS: Protein classification artificial neural system. Protein Sci.
Yule, G.U., 1900. On the association of attributes in statistics. Philos. Trans. A 194.


More information

Empirical Mode Decomposition based Feature Extraction Method for the Classification of EEG Signal

Empirical Mode Decomposition based Feature Extraction Method for the Classification of EEG Signal Empirical Mode Decomposition based Feature Extraction Method for the Classification of EEG Signal Anant kulkarni MTech Communication Engineering Vellore Institute of Technology Chennai, India anant8778@gmail.com

More information

ABSTRACT I. INTRODUCTION. Mohd Thousif Ahemad TSKC Faculty Nagarjuna Govt. College(A) Nalgonda, Telangana, India

ABSTRACT I. INTRODUCTION. Mohd Thousif Ahemad TSKC Faculty Nagarjuna Govt. College(A) Nalgonda, Telangana, India International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2018 IJSRCSEIT Volume 3 Issue 1 ISSN : 2456-3307 Data Mining Techniques to Predict Cancer Diseases

More information

Type II Fuzzy Possibilistic C-Mean Clustering

Type II Fuzzy Possibilistic C-Mean Clustering IFSA-EUSFLAT Type II Fuzzy Possibilistic C-Mean Clustering M.H. Fazel Zarandi, M. Zarinbal, I.B. Turksen, Department of Industrial Engineering, Amirkabir University of Technology, P.O. Box -, Tehran, Iran

More information

Wavelet Neural Network for Classification of Bundle Branch Blocks

Wavelet Neural Network for Classification of Bundle Branch Blocks , July 6-8, 2011, London, U.K. Wavelet Neural Network for Classification of Bundle Branch Blocks Rahime Ceylan, Yüksel Özbay Abstract Bundle branch blocks are very important for the heart treatment immediately.

More information

BREAST CANCER EPIDEMIOLOGY MODEL:

BREAST CANCER EPIDEMIOLOGY MODEL: BREAST CANCER EPIDEMIOLOGY MODEL: Calibrating Simulations via Optimization Michael C. Ferris, Geng Deng, Dennis G. Fryback, Vipat Kuruchittham University of Wisconsin 1 University of Wisconsin Breast Cancer

More information

Reader s Emotion Prediction Based on Partitioned Latent Dirichlet Allocation Model

Reader s Emotion Prediction Based on Partitioned Latent Dirichlet Allocation Model Reader s Emotion Prediction Based on Partitioned Latent Dirichlet Allocation Model Ruifeng Xu, Chengtian Zou, Jun Xu Key Laboratory of Network Oriented Intelligent Computation, Shenzhen Graduate School,

More information

A scored AUC Metric for Classifier Evaluation and Selection

A scored AUC Metric for Classifier Evaluation and Selection A scored AUC Metric for Classifier Evaluation and Selection Shaomin Wu SHAOMIN.WU@READING.AC.UK School of Construction Management and Engineering, The University of Reading, Reading RG6 6AW, UK Peter Flach

More information

Identification of Tissue Independent Cancer Driver Genes

Identification of Tissue Independent Cancer Driver Genes Identification of Tissue Independent Cancer Driver Genes Alexandros Manolakos, Idoia Ochoa, Kartik Venkat Supervisor: Olivier Gevaert Abstract Identification of genomic patterns in tumors is an important

More information

IJESRT. Scientific Journal Impact Factor: (ISRA), Impact Factor: 1.852

IJESRT. Scientific Journal Impact Factor: (ISRA), Impact Factor: 1.852 IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY Performance Analysis of Brain MRI Using Multiple Method Shroti Paliwal *, Prof. Sanjay Chouhan * Department of Electronics & Communication

More information

Efficient Classification of Lung Tumor using Neural Classifier

Efficient Classification of Lung Tumor using Neural Classifier Efficient Classification of Lung Tumor using Neural Classifier Mohd.Shoeb Shiraj 1, Vijay L. Agrawal 2 PG Student, Dept. of EnTC, HVPM S College of Engineering and Technology Amravati, India Associate

More information

AQCHANALYTICAL TUTORIAL ARTICLE. Classification in Karyometry HISTOPATHOLOGY. Performance Testing and Prediction Error

AQCHANALYTICAL TUTORIAL ARTICLE. Classification in Karyometry HISTOPATHOLOGY. Performance Testing and Prediction Error AND QUANTITATIVE CYTOPATHOLOGY AND AQCHANALYTICAL HISTOPATHOLOGY An Official Periodical of The International Academy of Cytology and the Italian Group of Uropathology Classification in Karyometry Performance

More information

CARDIAC ARRYTHMIA CLASSIFICATION BY NEURONAL NETWORKS (MLP)

CARDIAC ARRYTHMIA CLASSIFICATION BY NEURONAL NETWORKS (MLP) CARDIAC ARRYTHMIA CLASSIFICATION BY NEURONAL NETWORKS (MLP) Bochra TRIQUI, Abdelkader BENYETTOU Center for Artificial Intelligent USTO-MB University Algeria triqui_bouchra@yahoo.fr a_benyettou@yahoo.fr

More information

Machine Learning! Robert Stengel! Robotics and Intelligent Systems MAE 345,! Princeton University, 2017

Machine Learning! Robert Stengel! Robotics and Intelligent Systems MAE 345,! Princeton University, 2017 Machine Learning! Robert Stengel! Robotics and Intelligent Systems MAE 345,! Princeton University, 2017 A.K.A. Artificial Intelligence Unsupervised learning! Cluster analysis Patterns, Clumps, and Joining

More information

Oscillatory Neural Network for Image Segmentation with Biased Competition for Attention

Oscillatory Neural Network for Image Segmentation with Biased Competition for Attention Oscillatory Neural Network for Image Segmentation with Biased Competition for Attention Tapani Raiko and Harri Valpola School of Science and Technology Aalto University (formerly Helsinki University of

More information

Deep learning and non-negative matrix factorization in recognition of mammograms

Deep learning and non-negative matrix factorization in recognition of mammograms Deep learning and non-negative matrix factorization in recognition of mammograms Bartosz Swiderski Faculty of Applied Informatics and Mathematics Warsaw University of Life Sciences, Warsaw, Poland bartosz_swiderski@sggw.pl

More information

DIABETIC RISK PREDICTION FOR WOMEN USING BOOTSTRAP AGGREGATION ON BACK-PROPAGATION NEURAL NETWORKS

DIABETIC RISK PREDICTION FOR WOMEN USING BOOTSTRAP AGGREGATION ON BACK-PROPAGATION NEURAL NETWORKS International Journal of Computer Engineering & Technology (IJCET) Volume 9, Issue 4, July-Aug 2018, pp. 196-201, Article IJCET_09_04_021 Available online at http://www.iaeme.com/ijcet/issues.asp?jtype=ijcet&vtype=9&itype=4

More information

Predicting Human Immunodeficiency Virus Type 1 Drug Resistance From Genotype Using Machine Learning. Robert James Murray

Predicting Human Immunodeficiency Virus Type 1 Drug Resistance From Genotype Using Machine Learning. Robert James Murray Predicting Human Immunodeficiency Virus Type 1 Drug Resistance From Genotype Using Machine Learning. Robert James Murray Master of Science School of Informatics University Of Edinburgh 2004 ABSTRACT: Drug

More information

Analysis of Resistance to Human Immunodeficiency Virus Protease Inhibitors Using Molecular Mechanics and Machine Learning Strategies

Analysis of Resistance to Human Immunodeficiency Virus Protease Inhibitors Using Molecular Mechanics and Machine Learning Strategies American Medical Journal 1 (2): 126-132, 2010 ISSN 1949-0070 2010 Science Publications Analysis of Resistance to Human Immunodeficiency Virus Protease Inhibitors Using Molecular Mechanics and Machine Learning

More information

Lung Cancer Diagnosis from CT Images Using Fuzzy Inference System

Lung Cancer Diagnosis from CT Images Using Fuzzy Inference System Lung Cancer Diagnosis from CT Images Using Fuzzy Inference System T.Manikandan 1, Dr. N. Bharathi 2 1 Associate Professor, Rajalakshmi Engineering College, Chennai-602 105 2 Professor, Velammal Engineering

More information

Improved Intelligent Classification Technique Based On Support Vector Machines

Improved Intelligent Classification Technique Based On Support Vector Machines Improved Intelligent Classification Technique Based On Support Vector Machines V.Vani Asst.Professor,Department of Computer Science,JJ College of Arts and Science,Pudukkottai. Abstract:An abnormal growth

More information

A Novel Iterative Linear Regression Perceptron Classifier for Breast Cancer Prediction

A Novel Iterative Linear Regression Perceptron Classifier for Breast Cancer Prediction A Novel Iterative Linear Regression Perceptron Classifier for Breast Cancer Prediction Samuel Giftson Durai Research Scholar, Dept. of CS Bishop Heber College Trichy-17, India S. Hari Ganesh, PhD Assistant

More information

Keywords Missing values, Medoids, Partitioning Around Medoids, Auto Associative Neural Network classifier, Pima Indian Diabetes dataset.

Keywords Missing values, Medoids, Partitioning Around Medoids, Auto Associative Neural Network classifier, Pima Indian Diabetes dataset. Volume 7, Issue 3, March 2017 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Medoid Based Approach

More information

Modelling and Application of Logistic Regression and Artificial Neural Networks Models

Modelling and Application of Logistic Regression and Artificial Neural Networks Models Modelling and Application of Logistic Regression and Artificial Neural Networks Models Norhazlina Suhaimi a, Adriana Ismail b, Nurul Adyani Ghazali c a,c School of Ocean Engineering, Universiti Malaysia

More information

Development of Soft-Computing techniques capable of diagnosing Alzheimer s Disease in its pre-clinical stage combining MRI and FDG-PET images.

Development of Soft-Computing techniques capable of diagnosing Alzheimer s Disease in its pre-clinical stage combining MRI and FDG-PET images. Development of Soft-Computing techniques capable of diagnosing Alzheimer s Disease in its pre-clinical stage combining MRI and FDG-PET images. Olga Valenzuela, Francisco Ortuño, Belen San-Roman, Victor

More information

Reactive agents and perceptual ambiguity

Reactive agents and perceptual ambiguity Major theme: Robotic and computational models of interaction and cognition Reactive agents and perceptual ambiguity Michel van Dartel and Eric Postma IKAT, Universiteit Maastricht Abstract Situated and

More information