Machine learning for HIV-1 protease cleavage site prediction


Pattern Recognition Letters 27 (2006)

Machine learning for HIV-1 protease cleavage site prediction

Alessandra Lumini, Loris Nanni *
DEIS, IEIIT-CNR, Università di Bologna, Viale Risorgimento 2, Bologna, Italy

Received 17 November 2004; received in revised form 16 May 2005; available online 2 May 2006. Communicated by L. Goldfarb.
* Corresponding author. E-mail addresses: alumini@deis.unibo.it (A. Lumini), lnanni@deis.unibo.it (L. Nanni).

Abstract

Recently, several works have approached the HIV-1 protease specificity problem by applying a number of classifier creation and combination methods, known as ensemble methods, from the field of machine learning. However, it is still difficult for researchers to choose the best method due to the lack of an effective comparison. For the first time we have made an extensive study of methods for feature extraction, feature transformation and multiclassifier systems (MCS) in the HIV-1 protease problem. In this work we report an experimental comparison of several learning systems coupled with different feature representations. We confirm previous results stating that linear classifiers obtain higher performance than non-linear classifiers using orthonormal encoding, but we also show that, using the Karhunen-Loève transform, the performance of neural networks is comparable to that of linear support vector machines. Finally we propose a new hierarchical approach that, for the first time, combines ideas derived from machine learning methodologies with a knowledge base of this particular problem. This approach proves to be a successful attempt to obtain a drastic error reduction with respect to the performance of linear classifiers: the error rate decreases from 9.1% using linear-SVM to 6.6% using our new hierarchical classifier based on some pattern rules.
© 2006 Elsevier B.V. All rights reserved.

Keywords: HIV-1 protease; Karhunen-Loève transform; Hierarchical approach

1. Introduction

HIV-1 protease (Beck et al., 2000) is an enzyme in the AIDS virus that is essential to its replication. The chemical action of the protease takes place at a localized active site on its surface. HIV-1 protease inhibitor drugs are small molecules that bind to the active site in HIV-1 protease and stay there, so that the normal functioning of the enzyme is prevented. Understanding and predicting HIV-1 protease cleavage sites in proteins is a very important topic, since cleaved substrates are also templates for the synthesis of tightly binding, chemically modified inhibitors. The standard paradigm for protease-peptide interaction is the lock-and-key model. In this model a sequence of amino acids fits as a key to the active site in the protease, which is eight residues long in the HIV-1 protease case. In order to design effective HIV protease inhibitors, accurately identifying cleaved peptides of eight residues is crucial. The potential number of solutions is 20^8 (about 2.6 x 10^10), as there are 20 amino acids. This makes an accurate and rapid method for predicting HIV protease cleavage (Chou, 1993a,b,c; Cai and Chou, 1998; Chou et al., 1993, 1996; Chou and Zhang, 1992) very helpful, since an exhaustive experimental search is impossible. The interested reader can see (Chou, 1996) for a good review.

In order to approach this problem it is important to know that in HIV-1 protease only one class (the uncleaved category) is shift invariant, while the other class is not. Shift invariance means that a category remains unchanged if a pattern is shifted left or right by one position.

For instance, the peptide DDFGRCELAAAMKRHGLHL is not cleaved by HIV-1 protease, which means, due to the shift invariance, that all its octamers DDFGRCEL, ..., MKRHGLHL belong to the same uncleaved category. On the contrary, the cleaved category is not shift invariant, because the cleaving occurs at one specific site and not in nearby sites.

A machine learning algorithm is one that can learn from experience (observed examples) with respect to some class of tasks and a performance measure. Machine learning methods are suitable for molecular biology data due to the learning algorithm's ability to construct classifiers/hypotheses that can explain complex relationships in the data. Recently, several works have approached the HIV-1 protease specificity problem by applying techniques from machine learning. In (Cai et al., 1998; Thompson et al., 1995) the authors used a standard feedforward multilayer perceptron (MLP) to solve this problem, achieving an error rate of 12%. In (Cai and Chou, 1998) the authors confirm the results of (Narayanan et al., 2002; Thompson et al., 1995) using the same data and the same MLP architecture, showing that a decision tree was not able to predict the cleavage as well as the MLP. Recently, in (Cai et al., 1998) Support Vector Machines (SVM) have been adopted to predict the cleavage. In (Rögnvaldsson and You, 2003) the authors showed that HIV-1 protease cleavage is a linear problem and that the best classifier for this problem is the linear SVM (L-SVM).

Multiclassifier systems (Dietterich, 2000; Masulli and Valentini, 2000; Mayoraz and Moreira, 1997) integrate several data-driven models for the same problem, with the aim of obtaining a better composite global model with more accurate and reliable estimates. In addition, modular approaches often decompose a complex problem into sub-problems for which the solutions obtained are simpler to understand, as well as to implement, manage and update. Some works have combined the output of various classifiers in bioinformatics problems: in (Tan and Gilbert, 2003) the authors compared three ensemble methods (stacking, bagging and boosting), showing that combined methods perform better than the individual learners. In this paper we confute these results: we believe that the behavior observed in (Tan and Gilbert, 2003) was due to the low performance of the single classifiers adopted (even if they used a larger training set, obtained by a ten-fold cross-validation).

In this work we first perform a comparison of several machine learning approaches applied to the HIV-1 protease problem, then we show how to develop a new hierarchical classifier (HC) architecture by merging ideas derived from the study of machine learning methodologies with a knowledge base of this particular problem. The experiments show that the HIV-1 problem can be effectively solved using our hierarchical classifier: this approach (HC) yields an error rate of 6.6%, which is much lower than that of the best previous approaches (9.1% using linear-SVM). Even if some of the rules used (Rögnvaldsson and You, 2003) were found by looking at all the patterns of the dataset, so that the performance of HC is partially biased, in our opinion it is very interesting to show that, by merging ideas derived from the study of machine learning methodologies with a knowledge base of this particular problem, the error rate is drastically reduced. In fact, using the Q-statistic we show that the dependence between a machine learning classifier such as LDC and a set of mining rules based on pattern motifs is lower than that obtained for any combination of machine learning classifiers. This means that the two approaches, based on different methodologies, are sufficiently different that they can be coupled to improve the accuracy.
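To make the shift-invariance property described above concrete, the following minimal sketch (our own plain-Python illustration, not code from the paper) enumerates all octamer windows of the uncleaved peptide quoted earlier; each window inherits the uncleaved label, whereas a cleaved peptide labels only the single octamer centred on the scissile bond.

# The uncleaved peptide used as an example in the text.
peptide = "DDFGRCELAAAMKRHGLHL"

def octamer_windows(sequence):
    """Return every contiguous window of eight residues."""
    return [sequence[i:i + 8] for i in range(len(sequence) - 8 + 1)]

# Every octamer of an uncleaved peptide is itself an uncleaved sample.
uncleaved_octamers = octamer_windows(peptide)
print(uncleaved_octamers[0], "...", uncleaved_octamers[-1])   # DDFGRCEL ... MKRHGLHL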
2. Methods

In this section a brief description is given of the feature extraction methodologies, feature transformations, classifiers and ensemble methods combined and tested in this work.

2.1. Feature extraction (FE)

Feature extraction is a process that extracts a set of features from the original pattern representation through some functional mapping. The most used features in this field are the following.

Peptide sequences. A protein sequence is made from combinations of variable length of the 20 amino acids P = {A, C, D, ..., V, W, Y}. A peptide (small protein) is denoted by P = P4 P3 P2 P1 P1' P2' P3' P4', where Pi is an amino acid belonging to P. The scissile bond is located between positions P1 and P1'. As in (Rögnvaldsson and You, 2003), P3 and P4' are not used.

Orthonormal encoding (OE). This is the standard procedure (Rögnvaldsson and You, 2003) to map the sequence P to a sparse orthonormal representation. Each amino acid Pi is represented by a 20-bit vector with 19 bits set to zero and one bit set to one, so that each amino acid vector is orthogonal to all the other amino acid vectors. Pi can take on any one of the twenty amino acid values.

n-grams (NG). The n-grams or k-tuples (Wu et al., 1992) are pairs of values (v_i, c_i), where v_i is the feature and c_i is the count of this feature in a protein sequence, for i = 1, ..., 20^n. These features are all the possible combinations of n letters from the set P. The 6-letter exchange group is another commonly used piece of information. The 6-letter group contains six combinations of the letters: A = {H, R, K}, B = {D, E, N, Q}, C = {C}, D = {S, T, P, A, G}, E = {M, I, L, V} and F = {F, Y, W}. Each set of n-gram features extracted from a protein sequence can be scaled using

x̄ = x / (L - n + 1)

where x represents the count of a generic gram feature, L is the length of the protein sequence and n is the size of the n-gram features.
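A minimal sketch of the two feature extractors described above, written in plain Python as our own illustration (the paper provides no code); the alphabetical ordering of the amino-acid alphabet is our own choice, and the example octamer is simply the first window of the uncleaved peptide quoted in the introduction.

from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # the 20-letter alphabet P

def orthonormal_encoding(peptide):
    """Map a peptide to a sparse vector with 20 bits per residue, one bit set."""
    vec = []
    for residue in peptide:
        bits = [0] * 20
        bits[AMINO_ACIDS.index(residue)] = 1
        vec.extend(bits)
    return vec

def ngram_features(sequence, n=1):
    """Scaled n-gram counts: count / (L - n + 1), for all 20**n possible grams."""
    grams = ["".join(p) for p in product(AMINO_ACIDS, repeat=n)]
    counts = {g: 0 for g in grams}
    for i in range(len(sequence) - n + 1):
        counts[sequence[i:i + n]] += 1
    scale = len(sequence) - n + 1
    return [counts[g] / scale for g in grams]

x = orthonormal_encoding("DDFGRCEL")          # first octamer of the uncleaved example
print(sum(x), len(x))                         # -> 8 160
print(sum(ngram_features("DDFGRCEL", n=1)))   # -> 1.0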

2.2. Feature transformation (FT)

Feature transformation is a process through which a new set of features is created from an existing one represented in a vector space R^N.

Karhunen-Loève transform (KL) (Duda et al., 2001). This transform projects high-dimensional data onto a lower-dimensional subspace in a way that is optimal in a sum-squared sense. It is known that KL is the best linear transform for dimensionality reduction. In this paper, we use KL to reduce the original dataset to 50 dimensions.

Independent component analysis (ICA) (Duda et al., 2001). This transform seeks the directions in feature space that show the independence of the signals.

KernelPCA (Scholkopf et al., 1998). Each feature vector is first projected from the input space to a high-dimensional feature space by a non-linear map, then a pattern in the high-dimensional space is reduced to a lower dimensionality by KL. This transform is useful when the feature space is non-linear.

2.3. Classifiers

A classifier is a component that uses the feature vector provided by the feature extraction or transformation step to assign a pattern to a class.

Linear discriminant classifier (LDC) (Duda et al., 2001). The linear discriminant analysis method consists of searching for some linear combinations of selected variables which provide the best separation between the considered classes.

Linear SVM (L-SVM) (Duda et al., 2001). The goal of this two-class classifier is to establish the equation of a hyperplane that divides the training set, leaving all the points of the same class on the same side while maximizing the distance between the two classes and the hyperplane.

Multilayer perceptrons (MLP) (Duda et al., 2001). Multilayer perceptrons are supervised feedforward neural networks trained with the standard back-propagation algorithm. With one or two hidden layers, they can approximate virtually any input-output map, so they are widely used for pattern classification.

Edit distance classifier (EDC) (Levenshtein, 1965). The edit distance of two strings, s1 and s2, is defined as the minimum number of point mutations required to change s1 into s2, where a point mutation is a change, insertion or deletion of a letter. The edit distance is coupled with a nearest-neighbor classifier in order to classify a new pattern.
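A compact sketch (our own plain-Python illustration) of the edit distance classifier just described: the Levenshtein distance coupled with a nearest-neighbour decision.

def levenshtein(s1, s2):
    """Minimum number of substitutions, insertions and deletions turning s1 into s2."""
    prev = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        curr = [i]
        for j, c2 in enumerate(s2, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (c1 != c2)))   # substitution (or match)
        prev = curr
    return prev[-1]

def edc_predict(octamer, training_set):
    """Nearest-neighbour classification under edit distance.
    training_set is a list of (octamer, label) pairs."""
    return min(training_set, key=lambda item: levenshtein(octamer, item[0]))[1]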
2.4. Pattern motifs

Studying the peptide sequences we can note that particular combinations of amino acids influence the cleaving/non-cleaving decision (e.g., if the third amino acid of the peptide sequence is glutamine, there is a low probability that the peptide is a cleavage site). Pattern motifs (Rögnvaldsson and You, 2003) can be used for creating a rule-based classifier.

2.5. Multiclassifier systems (MCS)

Multiclassifier systems combine different approaches to solve the same problem. They combine, by a decision rule, the outputs of various classifiers trained using different datasets. Typical multiclassifier methods are the following.

Bagging. Bagging (Breiman, 1996) was among the first methods proposed for ensemble creation. Given a training set S, it generates M new training sets S1, ..., SM by randomly picking elements from S; each new set Si is used to train exactly one classifier. Hence an ensemble of individual classifiers is obtained from the M new training sets.

Random subspace. In the random subspace method (Houle et al., 1998) each individual classifier uses only a subset of all the features for training and testing.

Decision rule (Kittler et al., 1998). Several decision rules can be used to determine the final class from an ensemble of classifiers; the most used are the Vote rule, Max rule, Min rule, Mean rule and Sum rule.

3. Results and discussion

In this section we perform an empirical comparison of several classification methods, obtained by coupling the different approaches described above, on the HIV-1 task. Results are reported only for the combinations which present a high accuracy. Then we discuss the results yielded by the experiments in order to design a hierarchical classifier.

The performances are compared using two measures: the error rate, to evaluate the accuracy, and Yule's Q statistic (Yule, 1900), to quantify the independence of classifiers. For two classifiers D_i and D_k the Q statistic is

Q_i,k = (ad - bc) / (ad + bc)

where a is the probability of both classifiers being correct, d is the probability of both classifiers being incorrect, b is the probability that the first classifier is correct and the second is incorrect, and c is the probability that the second classifier is correct and the first is incorrect. Q varies between -1 and 1. For statistically independent classifiers, Q_i,k = 0. Classifiers that tend to recognize the same patterns correctly will have Q > 0, and those which commit errors on different patterns will have Q < 0.
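A small sketch (our own, in plain Python) of how Yule's Q can be computed from the per-pattern correctness of two classifiers; counts are used instead of probabilities, which leaves the ratio unchanged.

def yule_q(correct_1, correct_2):
    """Yule's Q statistic from two boolean sequences of per-pattern correctness."""
    pairs = list(zip(correct_1, correct_2))
    a = sum(c1 and c2 for c1, c2 in pairs)                  # both correct
    d = sum(not c1 and not c2 for c1, c2 in pairs)          # both incorrect
    b = sum(c1 and not c2 for c1, c2 in pairs)              # only the first correct
    c = sum(not c1 and c2 for c1, c2 in pairs)              # only the second correct
    return (a * d - b * c) / (a * d + b * c)                # undefined if ad + bc == 0

# Two classifiers that mostly agree give Q close to +1.
print(yule_q([1, 1, 1, 0, 1, 0], [1, 1, 1, 0, 0, 1]))       # -> 0.5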

All the tests have been conducted on the following dataset using a 2-fold cross-validation:

HIV dataset. The dataset contains 362 octamer protein sequences, each of which needs to be classified as an HIV protease cleavable site or an uncleavable site.

On this dataset we performed 10 tests, each time randomly resampling the learning and test sets (containing respectively half of the patterns) while maintaining the distribution of the patterns in the two classes. The results reported refer to the average classification accuracy achieved throughout the 10 experiments.

3.1. Accuracy

We report some useful tests on the error rate aimed at comparing the quality of the various methods on the HIV-1 protease problem. Table 1 lists the tests whose results are reported in Fig. 1; each test is performed using both n-grams and orthonormal encoding as feature extraction. The absence of the feature transformation step indicates that the classification task is performed starting from the original features.

Table 1
Tests made for the HIV-1 protease problem
Short name   Feature transformation   Classifier
KLDC         KL                       LDC
ILDC         ICA                      LDC
KeLDC        Kernel PCA               LDC
KeSVM        Kernel PCA               L-SVM
KSVM         KL                       L-SVM
ISVM         ICA                      L-SVM
L-SVM        -                        L-SVM
MLP          -                        MLP
KMLP         KL                       MLP

The graphs in Figs. 1 and 2 report, respectively, the classification error rates given by the various classifiers using the two different feature extraction methods and the results obtained using MCSs. We can note that the MCS techniques do not obtain a considerable increase in performance with respect to a single classifier; this behavior can be explained by the analysis of the error independence among the classifiers.

As concerns the classifiers used in the MCSs, we adopt a variable set of classifiers in each experiment, chosen in order to maximize the performance. Random subspace is tested using KL as feature transformation (to reduce the features to a 50-dimensional space) and LDC as classifier. Bagging-LDC is tested using KL as feature transformation and LDC as classifier. Bagging-MLP is tested using KL and MLP. The performances reported are the best obtained by varying the number of dimensions retained and of classifiers in random subspaces and the number of classifiers used in bagging. The decision rules are evaluated by combining the following five methods described in Table 1: L-SVM, KLDC, KSVM, ILDC, ISVM.

These results confirm, as already stated, that using the orthonormal encoding as feature extractor the HIV-1 protease cleavage site specificity problem can be solved efficiently by linear models. The confidence limits of the tests reported in Fig. 1 are approximately ±1.5%. In addition, these results prove that, using a KL feature transformation, the performances of non-linear and linear models are similar to each other. We argue that combining a linear transformation and a non-linear classifier can effectively handle this problem, where one class is shift variant while the other is shift invariant. Another interesting result is the low performance of the non-linear transformations (KernelPCA and ICA).
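As an illustration of how one of the Table 1 combinations and the evaluation protocol above could be reproduced, the following sketch (our own, using scikit-learn as an off-the-shelf stand-in, which the paper does not specify) builds the KLDC pipeline (PCA standing in for the KL transform, followed by an LDC) and scores it with ten random stratified 50/50 learning/test resamplings; the data arrays are random placeholders for the encoded octamers.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import StratifiedShuffleSplit, cross_val_score
from sklearn.pipeline import make_pipeline

# Placeholder data standing in for the 362 encoded octamers (here 8 x 20 = 160 bits);
# replace with the real HIV dataset and encoding.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(362, 160)).astype(float)
y = rng.integers(0, 2, size=362)

# KLDC from Table 1: Karhunen-Loeve (plain PCA here) to 50 dimensions, then LDC.
kldc = make_pipeline(PCA(n_components=50), LinearDiscriminantAnalysis())

# Ten random stratified 50/50 resamplings, as in the protocol described above.
splits = StratifiedShuffleSplit(n_splits=10, test_size=0.5, random_state=0)
scores = cross_val_score(kldc, X, y, cv=splits)
print("mean error rate: %.3f" % (1.0 - scores.mean()))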
Fig. 1. Error rate for different classifiers built on the orthonormal (left) and n-grams (right) feature spaces.
Fig. 2. Error rate for different MCSs built on the orthonormal encoding feature space.

3.2. Independence of classifiers

Table 2 and Fig. 3 show some useful tests on the error independence between two classifiers. Only the most interesting combinations (evaluated considering the accuracy performance of each classifier) are reported for the sake of space.

Table 2
Tests made to study the error independence
Short name   Method 1 (FE, FT, Classifier)   Method 2 (FE, FT, Classifier)
A            OE, -, L-SVM                    OE, KL, L-SVM
B            OE, -, L-SVM                    OE, ICA, L-SVM
C            OE, KL, L-SVM                   OE, ICA, L-SVM
D            NG, -, L-SVM                    NG, KL, L-SVM
E            NG, -, L-SVM                    NG, ICA, L-SVM
F            NG, KL, L-SVM                   NG, ICA, L-SVM
G            NG, -, L-SVM                    OE, ICA, LDC
H            NG, -, L-SVM                    OE, KL, LDC
I            OE, -, L-SVM                    OE, KL, LDC
L            OE, -, L-SVM                    OE, ICA, LDC
M            OE, -, L-SVM                    OE, KL, MLP

Fig. 3. Error independence for different classifiers.

3.3. Analysis of results

From the analysis of these results we can draw the following conclusions: the n-grams feature space gains an error independence slightly larger than that of the orthonormal encoding, even if the performance of the single classifiers is lower; the best trade-off between accuracy and error independence is given by combination M.

Considering these results, it seems very improbable to improve the performance through an ensemble of parallel classifiers. Therefore we design a hierarchical classifier which can exploit the good performance of both LDC and MLP. Moreover, given the low error independence of all the machine learning methods, we insert in our classifier some rules based on pattern motifs and an edit distance classifier.

4. A new hierarchical classifier

There are many ways to build a hierarchical multiclassifier: for example, by using each level to distinguish between one class and the others, or by using only a subset of the input features at each level, or by using at each step a classifier with rejection to classify the patterns with high confidence and forwarding the rejected patterns to the next level (Giusti et al., 2002). In this work we develop a hierarchical structure in which each step is constituted by a module able to classify only a fraction of the patterns: the rejected patterns are given as input to the following steps. The classifier is composed of four steps: edit distance + cleavage rule, LDC, cleavage/non-cleavage rule, MLP. In Fig. 4 a graphical description of the proposed system is given; in the following each step is described in detail. For the determination of the optimal parameters of the algorithm, one third of the patterns of the training set is randomly selected and used as a validation set.

Fig. 4. Hierarchical classifier schema.
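The cascade logic just described can be sketched as follows (our own Python illustration, not the authors' implementation): every stage either returns a label or rejects, and rejected patterns fall through to the next stage, with the last stage never rejecting.

REJECT = None   # sentinel returned by a stage that is not confident enough

def hierarchical_classify(pattern, stages):
    """Run a pattern through a list of stage functions; each stage returns
    'cleaved', 'uncleaved', or REJECT to pass the pattern on."""
    for stage in stages[:-1]:
        label = stage(pattern)
        if label is not REJECT:
            return label
    return stages[-1](pattern)   # the final stage classifies without rejection

# Hypothetical stage functions, in the order used by HC:
# stages = [edit_distance_plus_cleavage_rule, ldc_with_threshold,
#           cleavage_noncleavage_rule, mlp_classifier]
# label = hierarchical_classify(octamer, stages)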

Giusti et al. (2002) proved that the error probability of a hierarchical system, given a rejection threshold at each level, can be expressed as the sum of the optimal Bayes error and the error rate of each classifier (related to the patterns not rejected). This result means that, in principle, the optimal Bayes error can still be obtained even if all the stand-alone classifiers are not optimal.

4.1. Edit distance classifier + cleavage rule

The edit distance classifier gives good performance for patterns belonging to the shift-invariant class, while it is not reliable when it assigns a pattern to the shift-variant class. For example, given a training set of patterns belonging to both classes, if a new pattern is near (with respect to the edit distance) to a pattern of the uncleaved class (shift invariant) we can reasonably assume that it belongs to the same class; on the contrary, if the new pattern is near to a pattern of the cleaved class, we cannot make any assumption with a high degree of certainty. Starting from this consideration we design a classifier that assigns to the uncleaved class the patterns classified as uncleaved sites by the EDC, while rejecting the others. The performance of the EDC, if used without rejection to classify all the patterns, is approximately 84.20%. If we reject all the patterns assigned to the cleaved class, it is able to classify 62.70% of the patterns with an error rate of only 4.4%.

A possible method to further reduce the error rate of the edit distance classifier is to reject the patterns, classified as uncleaved by the EDC, that satisfy this rule:

(xxx(nyla)xxxx & (!xxxkxxxx | !xx(fkq)xxxxx | !xxxxxcxx | !xxxxxxkx))

The rationale of this rule is the following: from a statistical study on the training set, we have noted that a cleaved pattern with high similarity to an uncleaved one often contains the motif xxx(nyla)xxxx (meaning that the fourth amino acid must be N, Y, L or A). This rule matches partially with a rule shown in (Tozser et al., 2000). To avoid the rejection of too many patterns we use some motifs (Rögnvaldsson and You, 2003) that characterize the uncleaved class: xxxkxxxx, xx(fkq)xxxxx, xxxxxcxx and xxxxxxkx. By coupling these rules with the EDC we reject a further 20.6% of the previously classified patterns: this reduces the error rate to 1.1% on the accepted patterns (which are 49.83% of the total).

Table 3
Pairs of amino acids and positions that influence the cleaving decision
xxxfxexx  xxxyxexx  xxxlxexx  xxxfxqxx  xxvxxexx  xxxxpexx  xxvfxxxx
xxxfpxxx  xxixxexx  xxxmxexx  xxaxxexx  xxafxxxx  xxxxxexk  FxxxxExx

Table 4
Single amino acids and positions that influence the cleaving decision
xxxkxxxx  xxxxxsxx  xxxxxkxx  xxxpxxxx  xxxxcxxx  Cxxxxxxx  xxyxxxxx

4.2. Multiclassifier with a modified mean rule

We train an LDC classifier using patterns represented by orthonormal encoding as feature extractor and reduced to a 50-dimensional space by a KL transform. The patterns whose confidence is lower than a prefixed threshold are rejected. The value of the threshold is set experimentally (the same value in all the tests). In this step 36.5% of the patterns are classified, with an error rate of 5.75%.
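The motif notation used in the rule above and in Tables 3 and 4 maps naturally onto position-wise checks; the sketch below (our own hedged Python illustration) parses that notation and transcribes the step-1 rejection rule literally.

def matches_motif(octamer, motif):
    """Check a peptide against a motif in the paper's notation: 'x' = any
    residue, a letter = that residue, '(nyla)' = any of the listed residues
    at that position (case-insensitive)."""
    positions, i = [], 0
    while i < len(motif):
        if motif[i] == "(":
            j = motif.index(")", i)
            positions.append(set(motif[i + 1:j].upper()))
            i = j + 1
        else:
            positions.append(None if motif[i].lower() == "x" else {motif[i].upper()})
            i += 1
    return len(octamer) == len(positions) and all(
        allowed is None or residue.upper() in allowed
        for residue, allowed in zip(octamer, positions))

def step1_reject(octamer):
    """Literal transcription of the rejection rule quoted in Section 4.1."""
    m = lambda motif: matches_motif(octamer, motif)
    return m("xxx(nyla)xxxx") and (not m("xxxkxxxx") or not m("xx(fkq)xxxxx")
                                   or not m("xxxxxcxx") or not m("xxxxxxkx"))

print(matches_motif("AAANAAAA", "xxx(nyla)xxxx"))   # -> True
print(matches_motif("AAAKAAAA", "xxxkxxxx"))        # -> True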
4.3. Cleavage/non-cleavage rule

The third step consists of some rules proposed in (Rögnvaldsson and You, 2003): if a pattern contains one of the motifs shown in Table 3 it is classified as cleaved, if it contains one of the motifs shown in Table 4 it is assigned to the uncleaved class, otherwise it is rejected to the next step. Using these rules 5.64% of the patterns can be classified, with an error rate of 1.96%.

4.4. MLP

The patterns rejected by the previous steps are finally classified with an MLP classifier based on patterns represented by orthonormal encoding as feature extractor and reduced to a 50-dimensional space by a KL transform. In this last step all the remaining patterns (about 10.39%) are classified without rejection, with an error rate of 39%.

5. Results of the hierarchical structure

Finally, in Fig. 5 we compare the classification error rates obtained by our hierarchical method (HC) and by the systems proposed in (Rögnvaldsson and You, 2003): an MLP coupled with a KL dimensionality reduction (KL + MLP) and a linear-SVM classifier (L-SVM). In Table 5 the classification performance of each step of the new hierarchical classifier is summarized: the local error rate is evaluated considering only the patterns effectively classified at each step, while the global error rate is the cumulative error obtained considering all the patterns classified up to that step; analogously, by local classified we mean the percentage of the whole set of patterns classified at each step, while global classified is the cumulative percentage of classified patterns at each step. It is interesting to note that for HC about 10% of the patterns can be considered difficult, since they contribute to generate the largest part of the total error rate.

The greatest advantage in the use of HC is its very low error rate, obtained with the same rejection rate as a stand-alone L-SVM or a stand-alone MLP (as shown in Table 6). That is, if we classify the 89.95% of patterns with higher confidence, the error rate of L-SVM is 6% while for HC it is 3.1%.

Fig. 5. Error rates on the HIV-1 dataset.

Table 5
Error rate and number of patterns rejected in each step by HC (one row per step 1-4; columns: global error rate (%), local error rate (%), global classified (%), local classified (%)).

Table 6
Classification performance, with a rejection rate, using L-SVM or MLP (columns: method, error rate (%), global classified (%)).

5.1. Validation tests

In this section we report some experiments to validate our idea of constructing a hierarchical classifier architecture by merging ideas derived from the machine learning methodologies with a knowledge base of this particular problem. A first test has been conducted in order to evaluate the independence between a machine learning classifier such as LDC and the set of mining rules based on pattern motifs detailed in Section 4.3. The error independence between these two methods is evaluated by the Q-statistic on the fraction of patterns effectively classified by the rules; the result is 0.9, lower than those obtained for any combination of classifiers reported in Table 2 and Fig. 3. This means that the two approaches, based on different methodologies, are sufficiently different that they can be coupled in a hierarchical classifier. As a further proof we tested a simple two-level hierarchical classifier composed only of the cleavage/non-cleavage rule at the first level and KL + LDC at the second level: the error rate of this method was 7.6%, which is higher than that of HC, but significantly lower than the error rates of all the MCSs built on the orthonormal encoding feature space reported in Fig. 2.

6. Conclusion

The problem addressed in this paper is to recognize, given a sequence of amino acids, an HIV-1 protease cleavage site. We showed, by an empirical comparison of several classification methods, that by coupling a linear transform with a non-linear classifier low error rates can be obtained. Moreover we introduced, for the first time, a method that combines ideas derived from the machine learning methodologies and from a knowledge base of this particular problem. Our experiments showed, by means of the Q-statistic, that the combination of a machine learning classifier and rules based on pattern motifs can be interesting. Finally we illustrated how to obtain a composite method with a very low error rate for the HIV-1 protease specificity problem, starting from an exhaustive study of several classification methodologies. This approach is very important in the field of bioinformatics, where methods taken from machine learning are often applied without a deep study of the problem. In particular, in this work we proposed a hierarchical classifier that combines linear and non-linear classifiers and rules based on pattern motifs. Our hierarchical classifier is composed, at each level, of classifiers highly independent of each other, taken in order of the best classification rate for the patterns not rejected. The major advantage of the proposed approach is its low error rate, better than that of the other stand-alone methods proposed in the literature.
The major disadvantage is that our system cannot help to understand the relationships in the data.

References

Beck, Z.Q., Hervio, L., Dawson, P.E., Elder, J.E., Madison, E.L., 2000. Identification of efficiently cleaved substrates for HIV-1 protease using a phage display library and use in inhibitor development. Virology.
Breiman, L., 1996. Bagging predictors. Machine Learn.
Cai, Y.D., Chou, K.C., 1998. Artificial neural network model for predicting HIV protease cleavage sites in protein. Adv. Eng. Software 29.
Cai, Y.D., Yu, H., Chou, K.C., 1998. Using neural network for prediction of HIV protease cleavage sites in proteins. J. Protein Chem. 17.
Chou, J.J., 1993a. A formulation for correlating properties of peptides and its application to predicting human immunodeficiency virus protease-cleavable sites in proteins. Biopolymers 33.
Chou, J.J., 1993b. Predicting cleavability of peptide sequences by HIV protease via correlation-angle approach. J. Protein Chem. 12, 291.

Chou, K.C., 1993c. A vectorized sequence-coupling model for predicting HIV protease cleavage sites in proteins. J. Biol. Chem. 268.
Chou, K.C., 1996. Review: Prediction of HIV protease cleavage sites in proteins. Anal. Biochem. 233.
Chou, K.C., Zhang, C.T., 1992. Diagrammatization of codon usage in 339 HIV proteins and its biological implication. AIDS Res. Human Retroviruses 8.
Chou, K.C., Zhang, C.T., Kezdy, F.J., 1993. A vector approach to predicting HIV protease cleavage sites in proteins. Proteins: Struct., Funct., Genet. 16.
Chou, K.C., Tomasselli, A.L., Reardon, I.M., Heinrikson, R.L., 1996. Predicting HIV protease cleavage sites in proteins by a discriminant function method. Proteins: Struct., Funct., Genet. 24.
Dietterich, T.G., 2000. Ensemble methods in machine learning. In: Kittler, J., Roli, F. (Eds.), Multiple Classifier Systems. First International Workshop, MCS 2000, Lecture Notes in Computer Science. Springer-Verlag, Cagliari, Italy.
Duda, R., Hart, P., Stork, D., 2001. Pattern Classification. Wiley, New York.
Giusti, N., Masulli, F., Sperduti, A., 2002. Theoretical and experimental analysis of a two-stage system for classification. IEEE Trans. PAMI 24 (7).
Houle, G., Aragon, D., Smith, R., Kimura, D., 1998. A multilayered corroboration-based check reader. In: Hull, J., Taylor, S. (Eds.), Document Analysis Systems.
Kittler, J., Hatef, M., Duin, R.P.W., Matas, J., 1998. On combining classifiers. IEEE Trans. Pattern Anal. Machine Intell. 20 (3).
Levenshtein, V.I., 1965. Binary codes capable of correcting deletions, insertions and reversals. Doklady Akademii Nauk SSSR 163 (4).
Masulli, F., Valentini, G., 2000. Comparing decomposition methods for classification. In: Howlett, R.J., Jain, L.C. (Eds.), KES 2000, Fourth International Conference on Knowledge-Based Intelligent Engineering Systems & Allied Technologies. IEEE, Piscataway, NJ.
Mayoraz, E., Moreira, M., 1997. On the decomposition of polychotomies into dichotomies. In: The XIV International Conference on Machine Learning, Nashville, TN, July.
Narayanan, A., Wu, X., Yang, Z., 2002. Mining viral protease data to extract cleavage knowledge. Bioinformatics 18, S5-S13.
Rögnvaldsson, T., You, L., 2003. Why neural networks should not be used for HIV-1 protease cleavage site prediction. Bioinformatics.
Scholkopf, B., Smola, A., Muller, K.R., 1998. Nonlinear component analysis as a kernel eigenvalue problem. Neural Comput. 10 (5).
Tan, A.C., Gilbert, D., 2003. An empirical comparison of supervised machine learning techniques in bioinformatics. In: Proceedings of the First Asia-Pacific Bioinformatics Conference (APBC 2003), 19.
Thompson, T.B., Chou, K.C., Zheng, C., 1995. Neural network prediction of the HIV-1 protease cleavage sites. J. Theoret. Biol. 177.
Tozser, J., Zahuczky, G., Bagossi, P., Louis, J., Copeland, T., Oroszlan, S., Harrison, R., Weber, T., 2000. Comparison of the substrate specificity of the human T-cell leukemia virus and human immunodeficiency virus proteinases. Eur. J. Biochem. 267.
Wu, C.H., Whitson, G., McLarty, J., Ermongkonchai, A., Change, T.C., 1992. PROCANS: Protein classification artificial neural system. Protein Sci.
Yule, G.U., 1900. On the association of attributes in statistics. Philos. Trans. A 194.


More information

Empirical Mode Decomposition based Feature Extraction Method for the Classification of EEG Signal

Empirical Mode Decomposition based Feature Extraction Method for the Classification of EEG Signal Empirical Mode Decomposition based Feature Extraction Method for the Classification of EEG Signal Anant kulkarni MTech Communication Engineering Vellore Institute of Technology Chennai, India anant8778@gmail.com

More information

ABSTRACT I. INTRODUCTION. Mohd Thousif Ahemad TSKC Faculty Nagarjuna Govt. College(A) Nalgonda, Telangana, India

ABSTRACT I. INTRODUCTION. Mohd Thousif Ahemad TSKC Faculty Nagarjuna Govt. College(A) Nalgonda, Telangana, India International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2018 IJSRCSEIT Volume 3 Issue 1 ISSN : 2456-3307 Data Mining Techniques to Predict Cancer Diseases

More information

Type II Fuzzy Possibilistic C-Mean Clustering

Type II Fuzzy Possibilistic C-Mean Clustering IFSA-EUSFLAT Type II Fuzzy Possibilistic C-Mean Clustering M.H. Fazel Zarandi, M. Zarinbal, I.B. Turksen, Department of Industrial Engineering, Amirkabir University of Technology, P.O. Box -, Tehran, Iran

More information

Wavelet Neural Network for Classification of Bundle Branch Blocks

Wavelet Neural Network for Classification of Bundle Branch Blocks , July 6-8, 2011, London, U.K. Wavelet Neural Network for Classification of Bundle Branch Blocks Rahime Ceylan, Yüksel Özbay Abstract Bundle branch blocks are very important for the heart treatment immediately.

More information

BREAST CANCER EPIDEMIOLOGY MODEL:

BREAST CANCER EPIDEMIOLOGY MODEL: BREAST CANCER EPIDEMIOLOGY MODEL: Calibrating Simulations via Optimization Michael C. Ferris, Geng Deng, Dennis G. Fryback, Vipat Kuruchittham University of Wisconsin 1 University of Wisconsin Breast Cancer

More information

Reader s Emotion Prediction Based on Partitioned Latent Dirichlet Allocation Model

Reader s Emotion Prediction Based on Partitioned Latent Dirichlet Allocation Model Reader s Emotion Prediction Based on Partitioned Latent Dirichlet Allocation Model Ruifeng Xu, Chengtian Zou, Jun Xu Key Laboratory of Network Oriented Intelligent Computation, Shenzhen Graduate School,

More information

A scored AUC Metric for Classifier Evaluation and Selection

A scored AUC Metric for Classifier Evaluation and Selection A scored AUC Metric for Classifier Evaluation and Selection Shaomin Wu SHAOMIN.WU@READING.AC.UK School of Construction Management and Engineering, The University of Reading, Reading RG6 6AW, UK Peter Flach

More information

Identification of Tissue Independent Cancer Driver Genes

Identification of Tissue Independent Cancer Driver Genes Identification of Tissue Independent Cancer Driver Genes Alexandros Manolakos, Idoia Ochoa, Kartik Venkat Supervisor: Olivier Gevaert Abstract Identification of genomic patterns in tumors is an important

More information

IJESRT. Scientific Journal Impact Factor: (ISRA), Impact Factor: 1.852

IJESRT. Scientific Journal Impact Factor: (ISRA), Impact Factor: 1.852 IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY Performance Analysis of Brain MRI Using Multiple Method Shroti Paliwal *, Prof. Sanjay Chouhan * Department of Electronics & Communication

More information

Efficient Classification of Lung Tumor using Neural Classifier

Efficient Classification of Lung Tumor using Neural Classifier Efficient Classification of Lung Tumor using Neural Classifier Mohd.Shoeb Shiraj 1, Vijay L. Agrawal 2 PG Student, Dept. of EnTC, HVPM S College of Engineering and Technology Amravati, India Associate

More information

AQCHANALYTICAL TUTORIAL ARTICLE. Classification in Karyometry HISTOPATHOLOGY. Performance Testing and Prediction Error

AQCHANALYTICAL TUTORIAL ARTICLE. Classification in Karyometry HISTOPATHOLOGY. Performance Testing and Prediction Error AND QUANTITATIVE CYTOPATHOLOGY AND AQCHANALYTICAL HISTOPATHOLOGY An Official Periodical of The International Academy of Cytology and the Italian Group of Uropathology Classification in Karyometry Performance

More information

CARDIAC ARRYTHMIA CLASSIFICATION BY NEURONAL NETWORKS (MLP)

CARDIAC ARRYTHMIA CLASSIFICATION BY NEURONAL NETWORKS (MLP) CARDIAC ARRYTHMIA CLASSIFICATION BY NEURONAL NETWORKS (MLP) Bochra TRIQUI, Abdelkader BENYETTOU Center for Artificial Intelligent USTO-MB University Algeria triqui_bouchra@yahoo.fr a_benyettou@yahoo.fr

More information

Machine Learning! Robert Stengel! Robotics and Intelligent Systems MAE 345,! Princeton University, 2017

Machine Learning! Robert Stengel! Robotics and Intelligent Systems MAE 345,! Princeton University, 2017 Machine Learning! Robert Stengel! Robotics and Intelligent Systems MAE 345,! Princeton University, 2017 A.K.A. Artificial Intelligence Unsupervised learning! Cluster analysis Patterns, Clumps, and Joining

More information

Oscillatory Neural Network for Image Segmentation with Biased Competition for Attention

Oscillatory Neural Network for Image Segmentation with Biased Competition for Attention Oscillatory Neural Network for Image Segmentation with Biased Competition for Attention Tapani Raiko and Harri Valpola School of Science and Technology Aalto University (formerly Helsinki University of

More information

Deep learning and non-negative matrix factorization in recognition of mammograms

Deep learning and non-negative matrix factorization in recognition of mammograms Deep learning and non-negative matrix factorization in recognition of mammograms Bartosz Swiderski Faculty of Applied Informatics and Mathematics Warsaw University of Life Sciences, Warsaw, Poland bartosz_swiderski@sggw.pl

More information

DIABETIC RISK PREDICTION FOR WOMEN USING BOOTSTRAP AGGREGATION ON BACK-PROPAGATION NEURAL NETWORKS

DIABETIC RISK PREDICTION FOR WOMEN USING BOOTSTRAP AGGREGATION ON BACK-PROPAGATION NEURAL NETWORKS International Journal of Computer Engineering & Technology (IJCET) Volume 9, Issue 4, July-Aug 2018, pp. 196-201, Article IJCET_09_04_021 Available online at http://www.iaeme.com/ijcet/issues.asp?jtype=ijcet&vtype=9&itype=4

More information

Predicting Human Immunodeficiency Virus Type 1 Drug Resistance From Genotype Using Machine Learning. Robert James Murray

Predicting Human Immunodeficiency Virus Type 1 Drug Resistance From Genotype Using Machine Learning. Robert James Murray Predicting Human Immunodeficiency Virus Type 1 Drug Resistance From Genotype Using Machine Learning. Robert James Murray Master of Science School of Informatics University Of Edinburgh 2004 ABSTRACT: Drug

More information

Analysis of Resistance to Human Immunodeficiency Virus Protease Inhibitors Using Molecular Mechanics and Machine Learning Strategies

Analysis of Resistance to Human Immunodeficiency Virus Protease Inhibitors Using Molecular Mechanics and Machine Learning Strategies American Medical Journal 1 (2): 126-132, 2010 ISSN 1949-0070 2010 Science Publications Analysis of Resistance to Human Immunodeficiency Virus Protease Inhibitors Using Molecular Mechanics and Machine Learning

More information

Lung Cancer Diagnosis from CT Images Using Fuzzy Inference System

Lung Cancer Diagnosis from CT Images Using Fuzzy Inference System Lung Cancer Diagnosis from CT Images Using Fuzzy Inference System T.Manikandan 1, Dr. N. Bharathi 2 1 Associate Professor, Rajalakshmi Engineering College, Chennai-602 105 2 Professor, Velammal Engineering

More information

Improved Intelligent Classification Technique Based On Support Vector Machines

Improved Intelligent Classification Technique Based On Support Vector Machines Improved Intelligent Classification Technique Based On Support Vector Machines V.Vani Asst.Professor,Department of Computer Science,JJ College of Arts and Science,Pudukkottai. Abstract:An abnormal growth

More information

A Novel Iterative Linear Regression Perceptron Classifier for Breast Cancer Prediction

A Novel Iterative Linear Regression Perceptron Classifier for Breast Cancer Prediction A Novel Iterative Linear Regression Perceptron Classifier for Breast Cancer Prediction Samuel Giftson Durai Research Scholar, Dept. of CS Bishop Heber College Trichy-17, India S. Hari Ganesh, PhD Assistant

More information

Keywords Missing values, Medoids, Partitioning Around Medoids, Auto Associative Neural Network classifier, Pima Indian Diabetes dataset.

Keywords Missing values, Medoids, Partitioning Around Medoids, Auto Associative Neural Network classifier, Pima Indian Diabetes dataset. Volume 7, Issue 3, March 2017 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Medoid Based Approach

More information

Modelling and Application of Logistic Regression and Artificial Neural Networks Models

Modelling and Application of Logistic Regression and Artificial Neural Networks Models Modelling and Application of Logistic Regression and Artificial Neural Networks Models Norhazlina Suhaimi a, Adriana Ismail b, Nurul Adyani Ghazali c a,c School of Ocean Engineering, Universiti Malaysia

More information

Development of Soft-Computing techniques capable of diagnosing Alzheimer s Disease in its pre-clinical stage combining MRI and FDG-PET images.

Development of Soft-Computing techniques capable of diagnosing Alzheimer s Disease in its pre-clinical stage combining MRI and FDG-PET images. Development of Soft-Computing techniques capable of diagnosing Alzheimer s Disease in its pre-clinical stage combining MRI and FDG-PET images. Olga Valenzuela, Francisco Ortuño, Belen San-Roman, Victor

More information

Reactive agents and perceptual ambiguity

Reactive agents and perceptual ambiguity Major theme: Robotic and computational models of interaction and cognition Reactive agents and perceptual ambiguity Michel van Dartel and Eric Postma IKAT, Universiteit Maastricht Abstract Situated and

More information