EEG signal classification using Bayes and Naïve Bayes Classifiers and extracted features of Continuous Wavelet Transform

Reza Yaghoobi Karimoi1*, Mohammad Ali Khalilzadeh2, Ali Akbar Hossinezadeh3, Azra Yaghoobi Karimoi4

1 Department of Biomedical Engineering, Islamic Azad University, Mashhad Branch, Mashhad, Iran. Email: reza_yaghoby@yahoo.ca
2 Department of Biomedical Engineering, Islamic Azad University, Mashhad Branch, Mashhad, Iran. Email: makhalilzadeh@mshdiau.ac.ir
3 Department of Communications Engineering, Urmia University, Iran. Email: aliakbar.hossinezadeh@gmail.com
4 Department of Electronic Engineering, Sadjad University of Technology, Mashhad, Iran. Email: a_yaghoobi_k@yahoo.com

Received: May 2013; Revised: August 2013; Accepted: January 2014

ABSTRACT
In this paper, we propose a signal processing method for analyzing the EEG. To this end, the signal is decomposed into dominant scales using the continuous wavelet transform (CWT), and a set of statistical features describing the distribution of the wavelet coefficients is extracted from these scales. Then, two feature selection methods, sequential forward search (SFS) and sequential backward search (SBS), are used to reduce the dimension of the data. Finally, the features are given as input to Bayes and Naïve Bayes classifiers with three discrete outputs: normal, inter-ictal, and ictal. The results show that the Bayes classifier performs best: its classification accuracy is 99% using all the features and 100% using the features selected by SFS and SBS.

KEYWORDS: Electroencephalogram (EEG); epileptic seizure; continuous wavelet transform (CWT); sequential forward search (SFS); sequential backward search (SBS); Bayes classifier; Naïve Bayes classifier.
1. INTRODUCTION
An important mechanism in the body of a living organism is the control of biological activities and the transfer of information through the production and propagation of nerve impulses between cells and organs, which give rise to characteristic signals (bio-signals). One such signal, produced by the electrical activity of the synaptic cells of the brain, is the EEG. Precise analysis of this signal provides a helpful view of brain disorders, especially epileptic disorders. Therefore, the detection of epileptiform discharges is an important step in the diagnosis of epileptic seizures (Subasi, 2007).

Early studies of the relationship between the EEG signal and behavior usually used analog frequency analyzers to analyze the EEG data. On this basis, four frequency bands were defined for the EEG: δ (1–4 Hz), θ (4–8 Hz), α (8–12 Hz), and β (12–30 Hz) (Adeli, 2003; Subasi, 2007; Yamaguchi, 2003). This analysis provides a quantitative measure of the amplitude distribution of the EEG under an assumption of stationarity; hence, it loses details such as the nonstationary information carried by specific EEG patterns such as spikes. To preserve the nonstationary behavior of the EEG signal, time-frequency processing algorithms such as continuous wavelet transform analysis are needed (Adeli, 2003; Alessandro, 2003; Latka, 2003; Subasi, 2007, 2010).

One application in the control of epilepsy is the detection of epilepsy (Subasi, 2010). Current studies (Nigam, 2004; Subasi, 2005a, 2005b, 2007, 2010) have considered only two-class classifiers, normal versus generalized epilepsy (ictal), using features extracted from the sub-bands of the discrete wavelet transform (DWT), and they did not provide a clear method for choosing the number of decomposition levels and the wavelet type.
Because EEG signals have no useful frequency components above 30 Hz, these studies set the number of decomposition levels to five (Subasi, 2007). Therefore, in most cases, the extracted features do not show enough separation. In addition, to select the wavelet type, they tested all or some of the available wavelets, which is neither easy nor fast (Subasi, 2007).
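The link between the 30 Hz limit and five decomposition levels follows from the dyadic structure of the DWT, in which each level halves the remaining band. The sketch below assumes ideal half-band splits and the 173.61 Hz sampling rate of the data described in Section 2; real wavelet filters overlap, so the band edges are only nominal, and the function name is our own.

```python
def dwt_subbands(fs, levels):
    """Nominal (ideal dyadic) frequency ranges in Hz of the DWT sub-bands.

    Returns [(name, low, high), ...] for the details D1..D<levels>, followed
    by the final approximation A<levels>. Real filters overlap, so these
    edges are approximate.
    """
    bands = []
    high = fs / 2.0  # Nyquist frequency: the first detail band ends here
    for j in range(1, levels + 1):
        low = high / 2.0  # each level halves the band
        bands.append((f"D{j}", low, high))
        high = low
    bands.append((f"A{levels}", 0.0, high))  # what remains after the last split
    return bands


for name, low, high in dwt_subbands(173.61, 5):
    print(f"{name}: {low:.2f}-{high:.2f} Hz")
```

With five levels at 173.61 Hz, D1 and D2 lie above roughly 21.7 Hz and the final approximation A5 covers roughly 0 to 2.7 Hz, so the lower sub-bands span the range below 30 Hz.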
The aim of the current study is to propose a three-class classifier (normal, inter-ictal, and ictal) that uses features extracted from the CWT scales. Because the features are extracted from the dominant scales of the time-scale plane of the CWT, the classifier can distinguish the three classes without a search over wavelet types: the wavelet can be chosen arbitrarily, and the scales are selected from the time-scale plane of the CWT.

2. MATERIALS AND METHODS
A. Data
The publicly available data described by Andrzejak et al. [3] are used in the present study. The complete dataset consists of five sets (A to E), each containing 100 single-channel EEG segments. As shown for a typical case in Fig. 1, sets A and B consist of segments taken from surface EEG recordings of five healthy volunteers using a standardized electrode placement scheme. The volunteers were relaxed in an awake state with eyes open (A) and eyes closed (B), respectively. Sets C, D, and E originate from the EEG archive of pre-surgical diagnosis; the EEG signals of five patients were selected. All of the patients had achieved complete seizure control after resection of one of the hippocampal formations, which was therefore correctly diagnosed as the epileptogenic zone. Segments in set D were recorded from within the epileptogenic zone, and segments in set C from the hippocampal formation of the opposite hemisphere of the brain. While sets C and D contain only activity measured during seizure-free intervals (interictal), set E contains only seizure activity (ictal). All EEG signals were recorded with the same 128-channel amplifier system, using an average common reference. The data were digitized at 173.61 samples per second with 12-bit resolution, and the band-pass filter settings were 0.53–40 Hz (12 dB/oct). In this study, all five sets are used: sets A and B represent the normal class, sets C and D the interictal class, and set E the ictal class.
B. Analysis using continuous wavelet transform
The wavelet transform is a spectral estimation technique in which any general function can be expressed as an infinite series of wavelets. The basic idea of wavelet analysis is to measure the similarity between segments of the signal and the wavelet function. The decomposition produces a series of coefficients, called wavelet coefficients, and the signal can be reconstructed as a linear combination of the wavelets weighted by these coefficients. The main property of the Gabor function and of wavelets is time-frequency localization: most of their energy is confined to a finite time interval. Because of the scaling applied to the mother wavelet, this time-frequency localization varies from scale to scale. Hence, wavelet analysis gives good frequency resolution at low frequencies (long time windows) and good time resolution at high frequencies (short time windows). In other words, wavelet analysis produces a segmentation of the time-scale plane (or time-frequency plane) that is suitable for non-stationary signals, especially for transient events. Applied to the EEG, the wavelet transform can therefore reveal features related to the transient nature of the signal that are not obvious from the Fourier transform (Qian, 1996; Subasi, 2007). The CWT is generally defined as:

$$W(a,b) = \frac{1}{\sqrt{|a|}} \int_{-\infty}^{\infty} x(t)\, \psi^{*}\!\left(\frac{t-b}{a}\right) dt \qquad (1)$$

where b is the time shift of the wavelet function ψ and a is its scale. The asterisk (*) denotes complex conjugation, and the normalizing factor $1/\sqrt{|a|}$ ensures that the energy is the same for all values of a (Strang, 1997). The CWT is widely used in engineering and medical applications and is a suitable signal-processing tool for detecting epileptic seizures. In general, the CWT decomposes signals at different scales.
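Eq. (1) can be discretized directly as a sum over samples. The sketch below is our illustration, not the authors' implementation: it measures time and scale in samples and assumes a real, symmetric mother wavelet. Because Daubechies wavelets such as db3 have no closed-form expression, it swaps in the Mexican-hat (Ricker) wavelet; this is consistent with the remark that the wavelet may be chosen arbitrarily, but the swap itself is our assumption.

```python
import numpy as np


def mexican_hat(t):
    """Real Mexican-hat (Ricker) mother wavelet, a stand-in for db3."""
    return (2.0 / (np.sqrt(3.0) * np.pi ** 0.25)) * (1.0 - t ** 2) * np.exp(-0.5 * t ** 2)


def cwt(x, scales, wavelet=mexican_hat, support=5.0):
    """Discretized Eq. (1): W(a, b) = 1/sqrt(a) * sum_n x[n] * psi*((n - b) / a).

    Time and scale are in samples. The wavelet is truncated to |t| <= support
    and assumed real and symmetric, so psi* = psi and the sum can be computed
    as a convolution. Returns an array of shape (len(scales), len(x)).
    """
    x = np.asarray(x, dtype=float)
    out = np.empty((len(scales), len(x)))
    for i, a in enumerate(scales):
        half = int(np.ceil(support * a))
        kernel = wavelet(np.arange(-half, half + 1) / a)  # psi(k / a), k = -half..half
        full = np.convolve(x, kernel)  # full[j] = sum_n x[n] * kernel[j - n]
        out[i] = full[half:half + len(x)] / np.sqrt(a)  # column b holds shift b
    return out
```

For example, for a 10 Hz sine sampled at 173.61 Hz, `cwt(x, np.arange(1, 21))` puts the largest mean power near a scale of 4 samples, since this wavelet's response peaks where the scale matches the oscillation period.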
In other words, the CWT computes the similarity between the signal and the wavelet function at different scales, so the various scales carry different information. The amplitude variation of the wavelet coefficients depends on the frequency distribution of the signal and on deviations in that distribution (Strang, 1997). Selecting a proper wavelet and number of scales is very important for signal analysis with the CWT. Basically, the number of scales is chosen according to the number of dominant frequency components in the signal; hence, to obtain a significant separation between the features extracted from the scale coefficients, the scales must be chosen appropriately. To find the optimum scales, we first calculated the mean power of the 100 CWT plates for each state of the EEG signal (eyes open, eyes closed, interictal, and ictal). The plates restricted to scales 7, 15, 30, 52, and 64 (computed with the db3 wavelet) are shown in Fig. 2. These scales are the dominant scales of the CWT plates for the various states, and we selected them visually. As Fig. 2 shows, the plates of the different states can be separated easily. The wavelet type is usually chosen by testing different wavelets (Subasi, 2007), but in the proposed method, because of the scale selection, there is no need to test different wavelets: any wavelet can be chosen, provided that its time-scale plane gives a significant separation between the various states.
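The averaging behind Fig. 2 can be written compactly. In this sketch, the CWT plates of all segments of one state are assumed to be stacked in an array of shape (segments, scales, shifts). The paper selects the dominant scales visually; `compare_states` is a hypothetical helper for tabulating each state's mean power at candidate scales, not the authors' procedure.

```python
import numpy as np


def mean_power_plate(plates):
    """Average |W(a, b)|^2 over segments.

    plates: array of shape (n_segments, n_scales, n_shifts) holding the CWT
    plates of all segments of one state (e.g. the 100 segments of set A).
    Returns the mean power plate, shape (n_scales, n_shifts).
    """
    return np.mean(np.abs(np.asarray(plates)) ** 2, axis=0)


def scale_power_profile(plates):
    """Mean power per scale: the mean power plate averaged over shifts b."""
    return mean_power_plate(plates).mean(axis=1)


def compare_states(plates_by_state, scale_rows):
    """Hypothetical helper: mean power of each state at the candidate scale rows."""
    return {state: scale_power_profile(p)[scale_rows] for state, p in plates_by_state.items()}
```

Comparing these profiles across eyes open, eyes closed, interictal, and ictal plates is one way to check numerically that candidate scales separate the states.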
C. Feature extraction
The extracted wavelet coefficients provide a compact representation that shows the energy distribution of the EEG signal in time and frequency (or scale). Table I lists the frequencies corresponding to the different scales for the db3 wavelet at a sampling frequency of 173.61 Hz. To reduce the dimensionality of the feature vectors, statistical properties of the wavelet coefficients are extracted (Subasi, 2007). The following statistical features are used to represent the time-frequency distribution of the EEG signals:
1. The absolute mean of the coefficients in each scale.
2. The standard deviation of the coefficients in each scale.
3. The ratio of the absolute means of adjacent scales.
Feature 1 represents the frequency distribution of the signal in each scale, and features 2 and 3 represent the amount of change in the frequency distribution (Subasi, 2007; Subasi, 2010). These features are calculated for scales 7, 15, 30, 52, and 64 and are then used to classify the EEG signals.

TABLE I. FREQUENCIES CORRESPONDING TO DIFFERENT SCALES OF THE DAUBECHIES 4 WAVELET WITH A SAMPLING FREQUENCY OF 173.61 HZ
Scale | 64 | 52 | 30 | 15 | 7
Frequency (Hz) | 1.9376 | 2.3848 | 4.1336 | 8.2671 | 17.7153

Fig. 2. The mean power of the 100 CWT plates for the various states of the EEG signal: (a) all scales; (b) scales 7, 15, 30, 52, and 64.
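Both Table I and the three features can be sketched briefly. Assuming a db4 center frequency Fc ≈ 0.7143 (the value commonly reported for db4, e.g. by MATLAB's `centfrq`), the usual pseudo-frequency relation f = Fc·fs/a reproduces Table I to within rounding. The feature function is our illustration: the concatenation order and the direction of the adjacent-scale ratio are assumptions. With five scales it yields 14 values; how these map onto the 30 features reported in Table III is not specified in this section.

```python
import numpy as np


def scale_to_frequency(scales, center_freq, fs):
    """Pseudo-frequency of each scale: f = Fc * fs / a (Hz)."""
    return [center_freq * fs / a for a in scales]


def wavelet_features(coeffs):
    """Statistical features of one segment's CWT coefficients.

    coeffs: array of shape (n_scales, n_shifts), rows ordered by scale.
    Returns [abs mean per scale] + [std per scale] + [ratio of abs means of
    adjacent scales]; ordering and ratio direction are assumptions.
    """
    coeffs = np.asarray(coeffs, dtype=float)
    abs_mean = np.mean(np.abs(coeffs), axis=1)  # feature 1: one value per scale
    std = np.std(coeffs, axis=1)  # feature 2: spread within each scale
    ratio = abs_mean[:-1] / abs_mean[1:]  # feature 3: adjacent-scale ratios
    return np.concatenate([abs_mean, std, ratio])


print(scale_to_frequency([64, 52, 30, 15, 7], 0.7143, 173.61))
```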
D. Feature selection
The aim of feature selection is to find an N-dimensional feature subset of an M-dimensional feature set such that the information lost by the selection is minimal. Sequential forward selection (SFS), sequential backward selection (SBS), and plus-l minus-r selection (LRS) are well-known feature selection methods in statistical pattern recognition. An important property of these methods is that they are compatible with real-time algorithms, whereas feature extraction methods such as principal component analysis (PCA), independent component analysis (ICA), and linear discriminant analysis (LDA) are not, and are therefore incompatible with such classifiers. A diagnosis system receives just one sample of the testing data (one EEG signal) and decides whether it is normal or abnormal, whereas a diagnosis system based on feature extraction needs a distribution (several samples) of the testing data. This is considered a disadvantage, because the scatter matrix used in the PCA, ICA, and LDA algorithms is zero for a single sample. The SFS and SBS algorithms (Theodoridis, 2003) can be summarized as follows:

SFS algorithm:
1. Let X = {x1, x2, ...} be the input feature set, and start with the empty set Y = {}.
2. Select the next best feature x_best in X, i.e., the feature that, added to Y, gives the highest classifier accuracy (or other performance criterion).
3. If the accuracy is larger than in the previous step (or satisfies the desired-accuracy criterion), go to 6.
4. Y is the selected feature set.
5. Go to 8.
6. Y = Y ∪ {x_best} and X = X − {x_best}.
7. Go to 2.
8. End.

SBS algorithm:
1. Let X = {x1, x2, ...} be the input feature set.
2. Select the next worst feature x_worst in X, i.e., the feature whose removal gives the highest classifier accuracy (or other performance criterion). If the accuracy is larger than in the previous step (or satisfies the desired-accuracy criterion), go to 5.
3. X is the selected feature set.
4. Go to 7.
5. X = X − {x_worst}.
6. Go to 2.
7. End.

3. CLASSIFIERS
E. Bayes Classifier
In statistics, the Bayes classifier is the optimal classifier in the sense that it minimizes the average probability of error; in other words, it attains the global minimum of the error function. The Bayes classifier compares the decision functions of the classes and assigns the input sample (feature vector) to the class with the largest output. That is, the sample X is assigned to the class ω_i for which P(ω_i|X) is largest (Theodoridis, 2003; Duda, 1973):

$$X \in \omega_i \quad \text{if} \quad P(\omega_i \mid X) > P(\omega_j \mid X) \quad \forall\, j \neq i \qquad (2)$$

where ω_i denotes the i-th class, X the input feature vector, and P(ω_i|X) the decision function. By Bayes' rule,

$$g_i(X) = P(\omega_i \mid X) = \frac{P(X \mid \omega_i)\, P(\omega_i)}{P(X)} \qquad (3)$$

where P(X|ω_i) is the conditional probability of observing X given class ω_i (the likelihood), P(ω_i) is the prior probability of class ω_i, and P(X) is the probability of observing X (the evidence).

Fig. 3. Structure of the Bayes and Naïve Bayes classifiers.

Since P(X) is a normalization constant that does not affect the decision,

$$g_i(X) = P(\omega_i)\, P(X \mid \omega_i) \qquad (4)$$

Assuming that the distribution of the biological data is normal,

$$P(X \mid \omega_i) = \frac{1}{(2\pi)^{n/2}\, |\Sigma_i|^{1/2}} \exp\!\left(-\frac{1}{2}(X-\mu_i)^{T} \Sigma_i^{-1} (X-\mu_i)\right) \qquad (5)$$

where n is the number of features, and μ_i and Σ_i are the mean vector and covariance matrix of the training feature vectors of class ω_i. Taking the logarithm of Eq. (4) and dropping the constant term, the decision function for the i-th class is:
$$g_i(X) = -\frac{1}{2}(X-\mu_i)^{T}\Sigma_i^{-1}(X-\mu_i) - \frac{1}{2}\log|\Sigma_i| + \log P(\omega_i) \qquad (6)$$

Hence, the vector X is assigned to the class whose decision function is maximum. Fig. 3 shows the structure of a Bayes classifier.

F. Naïve Bayes Classifier
The Naïve Bayes classifier is a simplified case of the Bayes classifier in which the features are assumed to be independent of each other. This assumption allows the decision function to be rewritten as (Duda, 1973):

$$g_i(X) = P(\omega_i) \prod_{d=1}^{N} P(x_d \mid \omega_i) \qquad (7)$$

Assuming that the distribution of the biological data is normal,

$$g_i(X) = P(\omega_i) \prod_{d=1}^{N} \frac{1}{\sqrt{2\pi}\,\sigma_{i,d}} \exp\!\left(-\frac{(x_d-\mu_{i,d})^{2}}{2\sigma_{i,d}^{2}}\right) \qquad (8)$$

where N is the number of features, x_d is the d-th feature, and μ_{i,d} and σ²_{i,d} are the mean and variance of the d-th feature in the training data of class ω_i. The main advantage of this classifier is that it only needs univariate distributions, so its classification time is shorter; its accuracy, however, depends on how independent the features actually are. The structure of this classifier is similar to that of the Bayes classifier (see Fig. 3).

4. EXPERIMENTAL RESULTS
A fundamental requirement for building an optimal classifier or model is complete data (population); otherwise, even the best classifiers are inefficient. Complete data collection has two aspects: the components come from the same pattern, and the set of inputs suitably represents the given pattern.

TABLE II. THE DISTRIBUTION OF THE TRAINING AND TESTING SETS
Class | Sets | Training set | Test set | Total
Normal | A, B | 240 | 160 | 400
Interictal | C, D | 240 | 160 | 400
Ictal | E | 120 | 80 | 200
Total | All | 600 | 400 | 1000

Detection of epileptic seizures in the EEG is a pattern recognition task consisting of data acquisition, signal processing, feature extraction, feature reduction, and epilepsy detection. In this study, the EEG records are decomposed by the CWT into scales 7, 15, 30, 52, and 64 (see Fig. 2), and a set of statistical features is extracted from these scales.
Then, the irrelevant features are discarded using the SFS and SBS methods. The selected features are given as input to both the Bayes and Naïve Bayes classifiers, which have three outputs: normal, inter-ictal, and ictal. Fig. 4 shows the proposed classification algorithm. Finally, the Bayes and Naïve Bayes classifiers are evaluated by cross-validation; the distribution of the training and testing sets (600 and 400 segments, respectively) is summarized in Table II. Table III shows the evaluation results for both classifiers using all of the extracted features and using the features selected by the SFS and SBS methods. As can be seen, SFS and SBS increased the classification accuracy of both classifiers. In addition, the Bayes classifier is more accurate than the Naïve Bayes classifier, which we attribute to the independence assumption of the Naïve Bayes classifier. Nevertheless, all of the classifiers achieved high accuracy, which indicates that the extracted features provide a significant separation between the classes.

TABLE III. THE RESULTS OF THE CLASSIFIERS USING ALL THE EXTRACTED FEATURES AND THE FEATURES SELECTED WITH THE SFS AND SBS METHODS
Classifier | Features (number) | Maximum accuracy
Naïve Bayes | All (30) | 95.9%
Naïve Bayes | SFS (8) | 98.5%
Naïve Bayes | SBS (4) | 98%
Bayes | All (30) | 99%
Bayes | SFS (6) | 100%
Bayes | SBS (7) | 100%

Fig. 4. Classification process.
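The decision functions of Eqs. (6) and (8) and the SFS and SBS steps of Section D can be combined into the pipeline of Fig. 4. The sketch below is our illustration, not the authors' code. It assumes NumPy, estimates class means, covariances, and priors from the training set, and implements the Naïve Bayes case as a diagonal covariance, which equals the logarithm of Eq. (8) up to a constant. The small ridge added to each covariance and the generic `score` function used for selection are our assumptions; the selection criterion is best computed on training or validation data rather than on the final test set.

```python
import numpy as np


class GaussianBayes:
    """Gaussian Bayes classifier, Eq. (6); naive=True gives the Naive Bayes form, Eq. (8)."""

    def __init__(self, naive=False, ridge=1e-9):
        self.naive = naive
        self.ridge = ridge  # small diagonal load for numerical stability (assumption)

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = np.unique(y)
        self.params_ = []
        for c in self.classes_:
            Xc = X[y == c]
            mu = Xc.mean(axis=0)  # class mean from the training data
            cov = np.atleast_2d(np.cov(Xc, rowvar=False))
            if self.naive:
                cov = np.diag(np.diag(cov))  # independent features -> diagonal covariance
            cov = cov + self.ridge * np.eye(cov.shape[0])
            log_prior = np.log(len(Xc) / len(X))
            self.params_.append((mu, np.linalg.inv(cov), np.linalg.slogdet(cov)[1], log_prior))
        return self

    def decision_function(self, X):
        X = np.atleast_2d(np.asarray(X, dtype=float))
        cols = []
        for mu, inv_cov, logdet, log_prior in self.params_:
            d = X - mu
            maha = np.einsum("ij,jk,ik->i", d, inv_cov, d)  # (X - mu)^T Sigma^-1 (X - mu)
            cols.append(-0.5 * maha - 0.5 * logdet + log_prior)  # g_i(X), Eq. (6)
        return np.column_stack(cols)

    def predict(self, X):
        return self.classes_[np.argmax(self.decision_function(X), axis=1)]


def accuracy(clf, X_tr, y_tr, X_te, y_te, features):
    """Accuracy of a classifier trained and tested on a subset of feature columns."""
    cols = list(features)
    clf.fit(np.asarray(X_tr)[:, cols], y_tr)
    return float(np.mean(clf.predict(np.asarray(X_te)[:, cols]) == np.asarray(y_te)))


def sfs(n_features, score):
    """Sequential forward selection: add the best feature while the score improves."""
    selected, remaining, best = [], list(range(n_features)), float("-inf")
    while remaining:
        cand_score, cand = max((score(selected + [f]), f) for f in remaining)
        if cand_score <= best:
            break  # no improvement: the current subset is the selection
        selected.append(cand)
        remaining.remove(cand)
        best = cand_score
    return selected, best


def sbs(n_features, score):
    """Sequential backward selection: remove the worst feature while the score improves."""
    kept = list(range(n_features))
    best = score(kept)
    while len(kept) > 1:
        cand_score, worst = max((score([f for f in kept if f != w]), w) for w in kept)
        if cand_score <= best:
            break  # removing any feature no longer helps
        kept.remove(worst)
        best = cand_score
    return kept, best
```

For example, `sfs(n, lambda f: accuracy(GaussianBayes(), X_tr, y_tr, X_val, y_val, f))` selects features with the Bayes classifier as the criterion, where `X_val` and `y_val` are a hypothetical validation split. Because `predict` accepts a single feature vector, each EEG segment can be classified on its own.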
5. DISCUSSION
Among previous studies on EEG classification, we highlight the following:
1. Mashakbeh [6] used segments from sets A and E (normal and ictal classes, respectively) and the standard deviation of the DWT coefficients to separate two classes, normal and ictal. The method is a threshold on the standard deviation of the wavelet coefficients: a segment was assigned to the ictal class if the standard deviation exceeded 40, and to the normal class otherwise. With this method, 100% accuracy was reported.
2. Nigam and Graupe [7] used different features of sets A and E with artificial neural networks (ANNs) and reported an accuracy of 97.2%. In general, they claimed that ANNs are efficient for classifying normal and ictal EEG.
3. Subasi [11] applied the DWT to sets A and E and extracted features such as the mean and standard deviation of each sub-band and the ratio between adjacent sub-bands. Using a mixture-of-experts neural network (MENN, consisting of three expert networks and a gating network) and a standalone MLP, he classified sets A and E into two classes, normal and ictal, and reported accuracies of 94.5% for the MENN and 93.2% for the standalone MLP.
4. Subasi and Gursoy [13] used combinations of classifiers on sets A and E with the following results: SVM+PCA (98.75% accuracy), SVM+ICA (99.5% accuracy), and SVM+LDA (100% accuracy).

Although the above methods can find optimal features, the feature extraction methods they rely on, principal component analysis (PCA), independent component analysis (ICA), and linear discriminant analysis (LDA), turn the classifiers into offline classifiers; these methods are not compatible with quasi-real-time algorithms and are therefore incompatible with a diagnostic system. A diagnostic system receives only a single sample of the testing data (i.e., the features of one EEG segment) and distinguishes normal from abnormal data, whereas a diagnostic system using feature extraction methods needs several samples of the testing data. This is a disadvantage because the scatter matrix in the PCA, ICA, and LDA algorithms is zero for a single sample.

Although previous studies show promising results in EEG classification, some problems remain unsolved in this field, such as the statistical population, optimal features, compatibility with online algorithms, and computational load, and these should be considered in the design of classifiers. One of these problems is finding optimal features that increase classification accuracy and reduce the computational load. The method proposed in this work showed that features extracted from the time-scale plane are suitable for analyzing the EEG signal and provide significant separation between the classes. In addition, the method is compatible with online algorithms.

6. CONCLUSION
Epilepsy is not a disease in itself; rather, it is a sign of various disorders such as infections, physical head injuries, brain tumors, brain damage during infancy, and hereditary disorders. Hence, diagnosing epilepsy requires the patient's history and physical examination, interpretation of the EEG, and additional clinical information (such as CT, PET, and MRI scans). Nowadays, because brain signals are simple to record, EEG analyzers and EEG classifiers are used to detect epilepsy. With a growing world population and limited time, a classifier that sorts subjects into normal, interictal, and ictal classes is a valuable diagnostic tool for the prediction and treatment of epilepsy. Hence, in this work, we tried to provide a method for extracting features from the CWT that can be used in three-class classifiers.
Overall, the findings of this research showed that the features extracted from the wavelet coefficients of the selected scales (7, 15, 30, 52, and 64) provide a good representation of the epileptic signals and give sufficient separation in high dimensions. In addition, a comparison with previous results shows greater separation for the proposed method. Finally, the classification results with the SFS and SBS methods showed that these methods are highly effective at removing irrelevant features (i.e., at reducing dimensionality).

REFERENCES
[1] Adeli, H., Z. Zhou, et al. "Analysis of EEG records in an epileptic patient using wavelet transform." Journal of Neuroscience Methods 123: 69–87, 2003.
[2] Alessandro, M., R. Esteller, et al. "Epileptic seizure prediction using hybrid feature selection over multiple intracranial EEG electrode contacts: a report of four patients." IEEE Transactions on Biomedical Engineering 50(5): 603–615, 2003.
[3] Andrzejak, R. G., K. Lehnertz, et al. "Indications of nonlinear deterministic and finite-dimensional structures in time series of brain electrical activity: dependence on recording region and brain state." Physical Review E 64: 061907, 2001.
[4] Duda, R. O., P. E. Hart, et al. "Pattern classification." Wiley-Interscience, 1973.
[5] Latka, M., Z. Was, et al. "Wavelet analysis of epileptic spikes." Physical Review E, Statistical, Nonlinear, and Soft Matter Physics 67: 052902, 2003.
[6] Mashakbeh, A. A. "Analysis electroencephalogram detect epilepsy." International Journal of Academic Research (3), 2010.
[7] Nigam, V. P. and D. Graupe. "A neural-network-based detection of epilepsy." Neurological Research 26(1): 55–60, 2004.
[8] Qian, S. and D. Chen. Joint time-frequency analysis: methods and applications. Prentice Hall PTR, 1996.
[9] Strang, G. and T. Nguyen. Wavelets and filter banks. Wellesley-Cambridge Press, 1997.
[10] Subasi, A. "Epileptic seizure detection using dynamic wavelet network." Expert Systems with Applications 29: 343–355, 2005.
[11] Subasi, A. "EEG signal classification using wavelet feature extraction and a mixture of expert model." Expert Systems with Applications 32: 1084–1093, 2007.
[12] Subasi, A. and E. Ercelebi. "Classification of EEG signals using neural network and logistic regression." Computer Methods and Programs in Biomedicine 78: 87–99, 2005.
[13] Subasi, A. and M. I. Gursoy. "EEG signal classification using PCA, ICA, LDA and support vector machines." Expert Systems with Applications 37: 8659–8666, 2010.
[14] Theodoridis, S. and K. Koutroumbas. Pattern recognition. Academic Press, an imprint of Elsevier.
[15] Yamaguchi, C. "Fourier and wavelet analyses of normal and epileptic electroencephalogram (EEG)." First International IEEE EMBS Conference on Neural Engineering, Conference Proceedings, 2003.