Classification Rule Generation for Bioinformatics
Hyeoncheol Kim

Contents
- Rule Extraction from Neural Networks
  - Algorithm
  - Example: Promoter Domain
- Hybrid Model of Knowledge and Learning
  - Knowledge refinement
  - Network refinement
- Rule Generation with Genetic Algorithms
  - SARS CoV Protease Cleavage Site Prediction
  - MHC Binding Peptide Prediction

Just a Classifier?
- NNs and SVMs are good classifiers, but they give just a classification: no description, no explanation.
- Symbolic knowledge is VERY important in the medical and molecular-biological domains: practitioners want to know why and how, in addition to the answer.
- Rule extraction from a trained network provides the symbolic interpretation:
  Domain data -> Neural Network Learning -> Connection Weights -> Rule Extraction -> Symbolic Rules
  -> Domain understanding, knowledge acquisition, mining, etc.

Example of an If-Then Rule
- Two multi-valued dimensions: a DNA sequence of length 2 (positions p1, p2; values A, C, G, T).
  - If p1=C and p2=T, then Class 1
  - If p1=C, then Class 1
  - If p1=~C, then Class 1
  - If * (don't care), then Class 1
- 4^2 = 16 possible instances; 5^2 = 25 possible rules (four values plus the don't-care symbol).
- We look for rules that are maximally general and accurate:
  - Most specific rule: covers just 1 instance.
  - Most general rule: covers all instances.

Rule Example: Cleavage-Site Data
- A dataset of 362 amino-acid sequences, each of length 8: 8 dimensions, each 20-valued.
- Our representation, for convenience (x means "don't care"):
  - xxxfxpxx => if F@4 and P@6, then Cleaved.  Covers 20^6 = 64,000,000 instances.
  - xxxxxexx => if E@6, then Cleaved.  Covers 20^7 = 1,280,000,000 instances.
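The pattern notation above can be made concrete with a short sketch (not from the slides; the function names are illustrative). It counts how many length-8 sequences a pattern covers and tests whether a given sequence matches, assuming a 20-letter amino-acid alphabet with 'x' as the don't-care symbol:

```python
# Sketch: coverage and matching for don't-care rule patterns such as "xxxfxpxx".
ALPHABET_SIZE = 20  # 20 amino acids; 'x' means "don't care"

def coverage(pattern: str) -> int:
    """Number of concrete sequences the pattern covers: 20^(free positions)."""
    free = sum(1 for ch in pattern if ch == "x")
    return ALPHABET_SIZE ** free

def matches(pattern: str, seq: str) -> bool:
    """True if every fixed (non-'x') position of the pattern agrees with seq."""
    return all(p == "x" or p == s for p, s in zip(pattern, seq))

print(coverage("xxxfxpxx"))  # 20**6 = 64000000
print(coverage("xxxxxexx"))  # 20**7 = 1280000000
print(matches("xxxfxpxx", "aksfgpqt"))  # True
```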
Rules Cover Rectangular Areas
- An if-then rule covers a rectangular region of the instance space, so a rule set cannot always cover the data 100%.
- In most cases, however, maximally general rules are good enough.

Issues
- What types of rules?
- Descriptive accuracy
- Human understandability
- How to generate the rules?
- Would the rules generated from a DT, an NN, and an SVM differ?

Rule Extraction from an NN
- Network with a 5-dimensional binary input vector (x1..x5) and an output y1.
- Consider the input pattern xx10x: it covers 2^3 = 8 instances.
- If the minimum output over those 8 instances is > 0.7, then the rule xx10x holds with 100% accuracy.

Two Approaches
- Black-box approach: the NN is a black box; induce a rule set from input-output pairs alone.
- White-box (decompositional) approach:
  1. Extract rules from each hidden and output node.
  2. Aggregate the intermediate rules to form the composite rule base.

Decompositional Approach: Rule Extraction from a Single Node
- Node with inputs x1..x5, weights (1, 2, -6, 3, 0.5), and threshold -1.
- Consider the combination (x1, x2): weight-sum(x1, x2) = 1*1 + 1*2 + ?*(-6) + ?*3 + ?*0.5, where the remaining inputs can take 2^(5-2) = 8 possible settings.
- Lowest weight-sum(x1, x2) = 1*1 + 1*2 + 1*(-6) + 0*3 + 0*0.5 = -3 < threshold (-1); thus (x1, x2) is NOT valid.
- Examples of valid combinations: (x1, x2, x4), (not x3).
- Find the complete set of rules (not a single rule or a partial set) that are:
  - valid (i.e., lowest weight-sum > threshold),
  - maximally general (i.e., of smaller size),
  - not subsumed. Example: (x1, x2) subsumes (x1, x2, x4).
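The single-node validity test can be sketched as code (assumed names, not the authors' implementation): a combination of literals is valid when its lowest possible weighted sum, with all free inputs set adversarially, still exceeds the node's threshold.

```python
# Sketch: decompositional validity check for one node with weighted binary inputs.
def lowest_weight_sum(weights, asserted):
    """asserted maps input index -> True/False (literal x_i or not-x_i).
    Free inputs are set adversarially: 1 where the weight is negative, else 0."""
    total = 0.0
    for i, w in enumerate(weights):
        if i in asserted:
            total += w if asserted[i] else 0.0  # asserted-false input contributes 0
        else:
            total += min(w, 0.0)  # worst case over the free inputs
    return total

def is_valid(weights, threshold, asserted):
    return lowest_weight_sum(weights, asserted) > threshold

weights, threshold = [1, 2, -6, 3, 0.5], -1  # slide example

print(is_valid(weights, threshold, {0: True, 1: True}))           # (x1, x2): -3 > -1? False
print(is_valid(weights, threshold, {0: True, 1: True, 3: True}))  # (x1, x2, x4): True
print(is_valid(weights, threshold, {2: False}))                   # (not x3): True
```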
Decompositional Approach: Complexity
- The rule search space grows exponentially with the number of attributes: with n binary attributes, the space has 3^n candidate rules (each attribute asserted, negated, or don't-care).
- Heuristics: KT [Fu], MofN [Towell], RX [Setiono], OAS [Kim].

Promoter Domain
- Obtained from the public domain (Univ. of Wisconsin); 106 instances.
- Each instance string is comprised of 57 sequential nucleotides:
  - 50 nucleotides before the transcription start point,
  - 6 nucleotides following the transcription start point.
- An instance is positive if the promoter region is present in the string, negative otherwise.
- 2 classes (promoter and non-promoter).
- 57 attributes, discretized into 57*4 = 228 binary attributes.
- 106 instances (53 promoters, 53 non-promoters).
- Neural network (MLP) with a 228-4-1 architecture.

Promoter: Experiment 1
- The trained NN classifies all 106 instances correctly (100%).
- 7 rules extracted.

Hybrid System (Knowledge + Experience)
- A domain-specific model built from both domain data and domain knowledge.
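The 3^n count can be verified by brute-force enumeration for small n (a throwaway sketch, not part of any of the cited heuristics):

```python
# Sketch: each binary attribute is asserted, negated, or omitted (don't care),
# so the rule space over n attributes has exactly 3^n candidates.
from itertools import product

def rule_space(n):
    return list(product(("pos", "neg", "dc"), repeat=n))

for n in (3, 5, 10):
    print(n, len(rule_space(n)), 3 ** n)
```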
My Work on Neural Networks
1. Knowledge extraction
2. Knowledge revision
3. Network revision based on knowledge

Knowledge Revision Using Neural Networks
- Map the initial knowledge base into a knowledge-based neural network (KBNN), train the network on the data, then extract rules again; the result is a revised knowledge base:
  Knowledge Base -> (mapping) -> Neural Network -> (training) -> Revised Neural Network -> (extraction) -> Revised Knowledge Base

Experiment: Promoter Domain
1. Initial domain knowledge: 14 rules.
2. Mapped to a KBNN (228-4-1).
3. KBNN trained on the 106 instances.
4. The KBNN interpreted back into 14 rules: the revised knowledge.
- The revised theory improves considerably over the initial theory: error rate 53/106 -> 3/106.

Neural Networks Based on Self-Extracted Knowledge
- Restructuring process: Domain data -> Neural Network Learning -> Connection Weights -> Rule Extraction -> Symbolic Rules -> revised architecture (# of nodes, connections, etc.).
- The self-extracted knowledge is concise, available, complete, less noisy, etc.
Experiment: Promoter and HIV Cleavage Site Prediction

  HIV Protease Cleavage    # of layers   # of connections   Ave. generalization
  Original NN (228-4-1)    3             921                81.1%
  RBNN                     5             64                 82.1%
  ARBNN (224-5-1)          3             12                 84.9%

Genetic Algorithm
- Problem to be solved -> encoding -> initial population.
- Evolution process: crossover, mutation, and natural selection driven by a fitness function; the most fitted individuals survive.
- Final population -> decoding -> problem solved.

Rule Space of Cleavage Site Prediction
- 8-residue amino-acid sequences: 21^8 (= 37,822,859,361) different rules are possible (20 amino acids plus the don't-care symbol at each of 8 positions).
- Which search strategy?
  - Random search
  - Exhaustive search
  - Heuristic search
  - Genetic-algorithm search

Drawbacks of GA
- Sensitive to the initial population.
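The 21^8 figure can be checked arithmetically; a quick sketch below also recovers the same total by grouping rules by their number of fixed (non-x) positions:

```python
# Sketch: 21**8 total rules; equivalently, sum over k fixed positions of
# C(8, k) placements times 20**k amino-acid choices.
from math import comb

total = 21 ** 8
by_fixed = sum(comb(8, k) * 20 ** k for k in range(9))
print(total)     # 37822859361
print(by_fixed)  # 37822859361
```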
Knowledge-Based Genetic Algorithm (KBGA)
- Knowledge sources: (1) domain expert, (2) rule-oriented knowledge.
- Population size: 10.
- Chromosome representation: e.g., xfxxaxxl.
- Random crossover point; random mutation.
- Fitness function combines:
  - Generality: the number of x (don't-care) symbols in a chromosome.
  - Accuracy: measured over the dataset.

Experiment: SARS CoV Protease Cleavage Site Prediction
- Data: a mutant coronavirus dataset; each instance is a sequence of 8 amino acids.
- 70 positive and 1267 negative instances; all positive instances include Q@p1.
- Methods compared: sequence logo, DT, NN, GA, KBGA.
- Three consensus patterns: LQ, LQ[S/A], [T/S/A]x[L/F]Q[S/A/G].
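The GA loop above can be sketched end to end. The slides fix the population size (10), random crossover/mutation, and a fitness that combines generality (count of x symbols) with accuracy; the toy dataset, the fitness weighting, and the selection scheme below are assumptions for illustration, not the authors' settings.

```python
# Minimal GA sketch for searching 8-residue cleavage rules over a 21-symbol
# alphabet (20 amino acids plus the don't-care 'x').
import random

AMINO = "acdefghiklmnpqrstvwy"
ALPHA = AMINO + "x"
random.seed(0)

def matches(rule, seq):
    return all(r == "x" or r == s for r, s in zip(rule, seq))

def fitness(rule, pos, neg):
    covered_pos = sum(matches(rule, s) for s in pos)
    covered = covered_pos + sum(matches(rule, s) for s in neg)
    accuracy = covered_pos / covered if covered else 0.0
    generality = rule.count("x") / 8
    return accuracy + 0.1 * generality  # weighting is an assumption

def crossover(a, b):
    p = random.randrange(1, 8)  # random crossover point
    return a[:p] + b[p:]

def mutate(rule, rate=0.1):
    return "".join(random.choice(ALPHA) if random.random() < rate else c
                   for c in rule)

def kbga(pos, neg, generations=50, pop_size=10):
    pop = ["".join(random.choice(ALPHA) for _ in range(8))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda r: fitness(r, pos, neg), reverse=True)
        survivors = pop[: pop_size // 2]  # natural selection
        children = [mutate(crossover(random.choice(survivors),
                                     random.choice(survivors)))
                    for _ in range(pop_size - len(survivors))]
        pop = survivors + children
    return max(pop, key=lambda r: fitness(r, pos, neg))

# Toy data (invented): cleaved sequences share Q at position 4.
pos = ["tvlqsgfr", "salqagkl", "tslqsavn"]
neg = ["aaaaaaaa", "tvlksgfr", "mnpwrtvy"]
best = kbga(pos, neg)
print(best, fitness(best, pos, neg))
```

Selecting the top half and refilling with mutated crossover children is one common scheme; the slides only say that the "most fitted ones survive."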
GA Prediction of MHC Class I Binding Peptides
- T-cells are key players in regulating a specific immune response.
- Activation of cytotoxic T-cells requires recognition of specific peptides bound to Major Histocompatibility Complex (MHC) class I molecules.
- Prediction of MHC-binding properties is therefore a very important issue for the rational design of peptide vaccines aimed at boosting the immune response against a foreign antigen.
- Only one in 100 to 200 potential binders actually binds to a given MHC molecule, so a good prediction method for MHC class I binding peptides can reduce the number of candidate binders that need to be synthesized and tested.

Experiment: MHC Data
- Each instance belongs to the binder or non-binder class.
- Methods compared: DT, NN, SVM, KBGA (performance baselines: ANN, SVM).

Experimental Results
- HLA-A*0201(9): accuracy > 75%.
- HLA-A*0201(10): accuracy > 70%.
Experimental Results (continued)
- HLA-A1: accuracy > 75%.
- HLA-A3: accuracy > 75%.
- HLA-B*8: accuracy > 75%.
- HLA-B*2705: accuracy > 75%.

Sequence Logos of MHC Peptides
- [Sequence logo figures; images not reproduced.]
Remaining Issues: Rule Extraction from SVM
- Extending rule generation beyond DT, NN, and GA to support vector machines.
- References:
  - Nahla Barakat and Joachim Diederich: Learning-based Rule-Extraction from Support Vector Machines.
  - Núñez, Angulo and Català: Rule Extraction from Support Vector Machines, 2002.
  - Glenn Fung, Sathyakama Sandilya, R. Bharat Rao: Rule Extraction from Linear Support Vector Machines.