Clinical Examples as Non-uniform Learning and Testing Sets

Similar documents
ECG Beat Recognition using Principal Components Analysis and Artificial Neural Network

DIFFERENCE-BASED PARAMETER SET FOR LOCAL HEARTBEAT CLASSIFICATION: RANKING OF THE PARAMETERS

CARDIAC ARRYTHMIA CLASSIFICATION BY NEURONAL NETWORKS (MLP)

Robust system for patient specific classification of ECG signal using PCA and Neural Network

MORPHOLOGICAL CHARACTERIZATION OF ECG SIGNAL ABNORMALITIES: A NEW APPROACH

CHAPTER 5 WAVELET BASED DETECTION OF VENTRICULAR ARRHYTHMIAS WITH NEURAL NETWORK CLASSIFIER

Keywords: Adaptive Neuro-Fuzzy Interface System (ANFIS), Electrocardiogram (ECG), Fuzzy logic, MIT-BHI database.

PCA Enhanced Kalman Filter for ECG Denoising

AUTOMATIC CLASSIFICATION OF HEARTBEATS

Robust Detection of Atrial Fibrillation for a Long Term Telemonitoring System

Electrocardiogram beat classification using Discrete Wavelet Transform, higher order statistics and multivariate analysis

fuzzy models for cardiac arrhythmia classication Rosaria Silipo 1 International Computer Science Institute, Berkeley, USA

Vital Responder: Real-time Health Monitoring of First- Responders

ECG Signal Analysis for Abnormality Detection in the Heart beat

ECG Rhythm Analysis by Using Neuro-Genetic Algorithms

A MULTI-STAGE NEURAL NETWORK CLASSIFIER FOR ECG EVENTS

Continuous Wavelet Transform in ECG Analysis. A Concept or Clinical Uses

REVIEW ON ARRHYTHMIA DETECTION USING SIGNAL PROCESSING

Quick detection of QRS complexes and R-waves using a wavelet transform and K-means clustering

Neural Network based Heart Arrhythmia Detection and Classification from ECG Signal

Genetic Algorithm based Feature Extraction for ECG Signal Classification using Neural Network

ISSN: ISO 9001:2008 Certified International Journal of Engineering and Innovative Technology (IJEIT) Volume 2, Issue 10, April 2013

Analysis of Speech Recognition Techniques for use in a Non-Speech Sound Recognition System

USING CORRELATION COEFFICIENT IN ECG WAVEFORM FOR ARRHYTHMIA DETECTION

International Journal of Advance Engineering and Research Development

A Review on Arrhythmia Detection Using ECG Signal

Keywords : Neural Pattern Recognition Tool (nprtool), Electrocardiogram (ECG), MIT-BIH database,. Atrial Fibrillation, Malignant Ventricular

Learning Decision Tree for Selecting QRS Detectors for Cardiac Monitoring

Real-time Heart Monitoring and ECG Signal Processing

Assessment of the Performance of the Adaptive Thresholding Algorithm for QRS Detection with the Use of AHA Database

Detection and Classification of QRS and ST segment using WNN

IDENTIFICATION OF NORMAL AND ABNORMAL ECG USING NEURAL NETWORK

Local Image Structures and Optic Flow Estimation

Wavelet Decomposition for Detection and Classification of Critical ECG Arrhythmias

Fuzzy Based Early Detection of Myocardial Ischemia Using Wavelets

Deep learning and non-negative matrix factorization in recognition of mammograms

Classification of electrocardiographic ST-T segments human expert vs artificial neural network

One-Year Survival Prediction of Myocardial Infarction

Multi Resolution Analysis of ECG for Arrhythmia Using Soft- Computing Techniques

AFC-ECG: An Intelligent Fuzzy ECG Classifier

Mining of an electrocardiogram

An ECG Beat Classification Using Adaptive Neuro- Fuzzy Inference System

Automatic Detection of Heart Disease Using Discreet Wavelet Transform and Artificial Neural Network

Logistic Regression Multinomial for Arrhythmia Detection

Identification of Ischemic Lesions Based on Difference Integral Maps, Comparison of Several ECG Intervals

Automated Diagnosis of Cardiac Health

Detection of Atrial Fibrillation Using Model-based ECG Analysis

Analysis of Fetal Stress Developed from Mother Stress and Classification of ECG Signals

The Use of Artificial Neural Network to Detect the Premature Ventricular Contraction (PVC) Beats

Predicting Heart Attack using Fuzzy C Means Clustering Algorithm

Classification of Cardiac Arrhythmias based on Dual Tree Complex Wavelet Transform

Katsunari Shibata and Tomohiko Kawano

Premature Ventricular Contraction Arrhythmia Detection Using Wavelet Coefficients

Wavelet Neural Network for Classification of Bundle Branch Blocks

When Overlapping Unexpectedly Alters the Class Imbalance Effects

Minimum Feature Selection for Epileptic Seizure Classification using Wavelet-based Feature Extraction and a Fuzzy Neural Network

An Enhanced Approach on ECG Data Analysis using Improvised Genetic Algorithm

Body Surface and Intracardiac Mapping of SAI QRST Integral

Keywords Missing values, Medoids, Partitioning Around Medoids, Auto Associative Neural Network classifier, Pima Indian Diabetes dataset.

MULTILEAD SIGNAL PREPROCESSING BY LINEAR TRANSFORMATION

COMPRESSED ECG BIOMETRIC USING CARDIOID GRAPH BASED FEATURE EXTRACTION

A framework for the Recognition of Human Emotion using Soft Computing models

Monitoring Cardiac Stress Using Features Extracted From S1 Heart Sounds

Computational Intelligence Lecture 21: Integrating Fuzzy Systems and Neural Networks

Segmentation of Color Fundus Images of the Human Retina: Detection of the Optic Disc and the Vascular Tree Using Morphological Techniques

Data mining for Obstructive Sleep Apnea Detection. 18 October 2017 Konstantinos Nikolaidis

Assessment of Reliability of Hamilton-Tompkins Algorithm to ECG Parameter Detection

BIOAUTOMATION, 2009, 13 (2), 84-96

University of California Postprints

Comparison of ANN and Fuzzy logic based Bradycardia and Tachycardia Arrhythmia detection using ECG signal

Validating the Visual Saliency Model

Stepwise Knowledge Acquisition in a Fuzzy Knowledge Representation Framework

Keywords Artificial Neural Networks (ANN), Echocardiogram, BPNN, RBFNN, Classification, survival Analysis.

Application of Artificial Neural Networks in Classification of Autism Diagnosis Based on Gene Expression Signatures

Detection of Lung Cancer Using Backpropagation Neural Networks and Genetic Algorithm

IJRIM Volume 1, Issue 2 (June, 2011) (ISSN ) ECG FEATURE EXTRACTION FOR CLASSIFICATION OF ARRHYTHMIA. Abstract

keywords: sleep monitoring, assisted living, wireless body area network, heart rate variability,

A REVIEW ON CLASSIFICATION OF BREAST CANCER DETECTION USING COMBINATION OF THE FEATURE EXTRACTION MODELS. Aeronautical Engineering. Hyderabad. India.

SPPS: STACHOSTIC PREDICTION PATTERN CLASSIFICATION SET BASED MINING TECHNIQUES FOR ECG SIGNAL ANALYSIS

A RECOGNITION OF ECG ARRHYTHMIAS USING ARTIFICIAL NEURAL NETWORKS

ECG classification and abnormality detection using cascade forward neural network

Data Mining Tools used in Deep Brain Stimulation Analysis Results

Comparison of Feature Extraction Techniques: A Case Study on Myocardial Ischemic Beat Detection

Extraction of Unwanted Noise in Electrocardiogram (ECG) Signals Using Discrete Wavelet Transformation

Brain Tumour Detection of MR Image Using Naïve Beyer classifier and Support Vector Machine

TWO HANDED SIGN LANGUAGE RECOGNITION SYSTEM USING IMAGE PROCESSING

BTL CardioPoint Relief & Waterfall. Relief & Waterfall. Abnormalities at first sight

A Review on Sleep Apnea Detection from ECG Signal

A Vision-based Affective Computing System. Jieyu Zhao Ningbo University, China

Primary Level Classification of Brain Tumor using PCA and PNN

Type II Fuzzy Possibilistic C-Mean Clustering

Using Association Rule Mining to Discover Temporal Relations of Daily Activities

Delineation of QRS-complex, P and T-wave in 12-lead ECG

VLSI Implementation of the DWT based Arrhythmia Detection Architecture using Co- Simulation

Identification of Arrhythmia Classes Using Machine-Learning Techniques

Reveal Relationships in Categorical Data

Sparse Coding in Sparse Winner Networks

Classification of ECG Data for Predictive Analysis to Assist in Medical Decisions.

NeuroMem. RBF Decision Space Mapping

Artificial Neural Networks (Ref: Negnevitsky, M. Artificial Intelligence, Chapter 6)

Transcription:

Clinical Examples as Non-uniform Learning and Testing Sets Piotr Augustyniak AGH University of Science and Technology, 3 Mickiewicza Ave. 3-9 Krakow, Poland august@agh.edu.pl Abstract. Clinical examples are widely used as learning and testing sets for newly proposed artificial intelligence-based classifiers of signals and images in medicine. The results obtained from testing are usually taken as an estimate of the behavior of automatic recognition system in presence of unknown input in the future. This paper investigates and discusses the consequences of the non-uniform representation of the medical knowledge in such clinically-derived experimental sets. Additional challenges come from the nonlinear representation of the patient status in particular parameters domain and from the uncertainty of the reference provided usually by human experts. The presented solution consists of representation of all available cases in multidimensional diagnostic parameters or patient status spaces. This provides the option for independent linearization of selected dimensions. The recruitment to the learning set is then based on the case-to-case distance as selection criterion. In result, the classifier may be trained and tested in a more suitable way to cope with unpredicted patterns. 1 Introduction In medical research, the size of study cases population is an important, and usually the only considered factor of the result s reliability. However, even an intuitive approach says that two similar cases don t significantly enrich the learning or testing sets. Despite only a little influence the clinical researcher has on the available examples, each paper on clinical data-based research specifies only the cases count and neglects the features distribution therefore silently assuming it is gaussian [7] [3]. This observation and lack of justified guidelines for learning sets composition motivated us to the research on the intelligent recruitment. The presented method is proposed as an alternative for the random choice in the recruitment of cases to the learning set from all available medical examples. It is noteworthy that the human education, particularly in medicine, is also based on purposely preselected examples. Unlike the learning set, that determines the volume of competence of the AI classifier, the most natural method of recruitment test set members is the random choice. The paper is organized as follows: In section 2twoalternativerepresentations of medical cases are presented. The transformations between the representations L. Rutkowski et al. (Eds.): ICAISC, Part I, LNAI 6113, pp. 81 88,. c Springer-Verlag Berlin Heidelberg

82 P. Augustyniak and linearization of selected domains are also concerned in that section. Section 3 introduces the definition of the case-to-case distance and two methods of recruitment the cases as learning set members. Section 4 is dedicated to the description of conditions and results of tests of the basic QRS complex types recognition in the electrocardiogram with use of backpropagation neural network and both proposed learning set recruitment methods. 2 Management of Cases Representation 2.1 Parameter-Domain Representation of Cases Detailed description of the patient on the cell level is rarely practical in the clinic for the reasons of huge amount of data and lack of the organ s physiology description. Health records usually provide several organ-specific descriptors and principal global parameters describing the whole organism in aspects representing it as organ s environment influencing its functionality. Each subject S can be described by the set of diagnostic parameters S p = {p 1 p N }, considered as the projection of his physiological state on the modality-dependent N-dimensional state space S N. The projection is limited due to restrictions on the count N of values available for measurement and inaccurate due to measurement errors ɛ N and additive interferences δ N [8]. 2.2 Disease-Domain Representation of Cases The description of the disease D usually involves a set of symptoms D s = {s 1 s M }, being characteristic patterns of M selected parameters [9]. Their coincidence is defined as a set of conditions C D {M}, also called disease templates allowing the doctor to make evidence of certain pathology. Comparing to p 3 average absolute acceleration p 2 area to border length ratio p 1 length of the QRS Fig. 1. Parameter-domain representation of cases. Example dimensions are: length of the QRS, area to border length ratio and average absolute acceleration.

Clinical Examples as Non-uniform Learning and Testing Sets 83 s 3 ischemiae s betablocker medication 2 s 1 arrhythmia probability Fig. 2. Disease-domain representation of cases. Example dimensions are: arrhythmia probability, betablocker medication and ischemiae. the parameter-domain representation being an initial quantitative description of the subject, the disease-domain representation is a result of the interpretation process and the final outcome of the diagnostics determining the subject treatment. 2.3 Transformation of Cases Representation The medical diagnosis may be considered as matching of the parameter-domain and the disease-domain case descriptions. The subject S S is qualified to a certain category defined in the disease-domain space D M and described as having a disease D D by means of his diagnostic parameters S p best matching to recommended and fulfilling the essential criteria of the disease pattern D s (eqn. 1). Although the patient may meet the criteria C D {M} for several diseases D 1,D 2,D 3, only up to two, three of them, considered as most important are diagnosed and treated in medical practice. p (D) = S p,d s where m MD s C D {M} (1) The patient s status available in his health record in the parameter-domain representation may be transformed to the normalized diseases space in which the probability of each disease is represented independently (2). S N D M : p(d) =C D {M} 1 (2) This transformation is based on the quantitative measure of correlation between the subject s record and the disease template. The transformation is only partially mutually unambiguous, since - due to the data reduction - guessing the diagnostic parameters of a subject from the disease he or she has is hazardous. The transformation is performed by the human medic during the diagnosis process. If the available N parameters are not sufficient to separate two pathologies,

84 P. Augustyniak the representation of the subject may be completed by the complementary diagnosis yielding N1 new parameters and interpreted in an iterative way. 2.4 Domain Linearization of Cases Representation Representation of the subject s state in the multidimensional parameter space assumes the independence of any two particular variables. In spite this is not always fulfilled, such representation opens the opportunity for linearization of the dimensions [1], where the nonlinearity considerably influences the distance calculations. The parameter p n whose values have to be piecewise expanded or compressed in order to correctly represent the differences between particular diseases is transformed accordingly to (3): p n = f (p n) (3) where f is a nonlinear projection of the dimension p to p. The projection f is a piecewise continuous function defined in all the domain of the parameter p n in a heuristic way in order to provide equal separation of the normal and abnormal cases regardless the parameter s value. 3 Distance of the Case-Representation Space 3.1 Definition of the Case-to-Case Distance In order to consider distance-based similarity of case representations, the notion of distance has to be defined in parameter-domain and diseases-domain spaces. Provided all dimensions of the space are linear, the following Cartesian definition of the distance is applicable (4): ( ) d (p N,p K )= w 1 d N 1 d K 2 1 + + wm (d N m d K m) 2 (4) where w m is the m th weighting coefficient for each dimension in the disease state space. 3.2 Distance-Based Case Recruitment Aiming at the generation of possible rich and representative learning set from a given set of available cases, the selection of appropriate cases may follow one of the following paradigm: maximum hiperspace volume, or equidistant hiperspace support. First approach tends to maximize the volume of the competence hiperspace by selecting the cases most distant in the space as learning set members. This can be achieved by calculating first the gravity center of the cloud of all available

Clinical Examples as Non-uniform Learning and Testing Sets 8 medical cases. First candidate is recruited as the most peripheral case. Next candidates are cases most distant from all of the previously selected, therefore the learning set consists of most atypical cases. The procedure ends after having recruited a given number of cases, or when cases exceeding the given distance are no longer available. Second approach assumes the equal distance between the learning set samples. The procedure starts with the calculation of the distance between any two cases in the space. If the distance histogram is unimodal, the distance range is determined around its maximum population bin. First candidate is then recruited at random from cases belonging to the most typical distance bin. The recruitment of next candidates is based on how their distance to the already recruited case matches with the mid-range value. The procedure ends when the only remaining cases show out-of-range distance. 4 Test Conditions and Results Both presented recruitment methods were implemented in the Matlab environment. Based on objects description consisting of up to 2 parameters each, they are capable to select from the initial population a subset complying with given distance or population criteria. Despite the method is designed to provide purposely selected learning sets for a wide class of artificial intelligence algorithms, we tested it on a three-layers backpropagation neural network [6], [], [],[12] applied to heartbeat types (QRS) classification in the electrocardiogram (ECG). The ECG signal originated from the MIT-BIH Arrhythmia Database [4] and was sampled with the frequency of 36 Hz. Each QRS section was represented in normalized parameter- and diseasespaces. The versors of the parameter space were: length of the QRS, area to border length ratio, average absolute acceleration. The versors of the disease space were: arrhythmia probability, betablocker medication, ischemiae (insufficient oxygenation, ST segment changes). Each dimension was quantized to levels, therefore the input layer contains 6 neurons. The middle layer is composed of 16 neurons, and the output layer contains 4 neurons, accordingly to the recognition of four basic QRS morphologies: normal, supraventricular, ventricular and undetermined. From cases of heartbeats available from the database with medical annotations, cases were randomly selected as the test set, and other cases were recruited accordingly to the presented methods as the learning set. For the purpose of reference, the random recruitment was also used as an option.

86 P. Augustyniak The results classification accuracy for the parameter-space heartbeat representation are displayed in tab. 1 The results classification accuracy for the disease-space heartbeat representation are displayed in tab. 2. Table 1. Parameter-space heartbeat representation. Percentage of correct heartbeat classification for the same test set and different recruitment method for learning set cases. learning setrecruitment method random maximum volume equidistant support normal 93 97 98 supraventricular 63 93 9 ventricular 87 98 98 undetermined 1 78 71 Table 2. Disease-space heartbeat representation. Percentage of correct heartbeat classification for the same test set and different recruitment method for learning set cases. learning setrecruitment method random maximum volume equidistant support normal 78 8 88 supraventricular 6 9 91 ventricular 81 8 91 undetermined 3 77 74 Discussion The artificially prepared learning set (in case of both recruitment methods and independently in both representation domains) led to a considerably better result of the network learning, expressed by a better recognition result achieved in the test phase with use of the same randomly selected test set. When the method maximizing the volume of competence space was used, the recognition of undetermined beats was slightly better comparing to the competitors. This was caused by the representation of more atypical beats within the learning set. The features of these beats lie far from the gravity center of the cases cloud in the parameter space. On the contrary, the use of equidistant hiperspace support refines the allotment of beats into basic categories (normal/supraventricular/ventricular) at a price of few more atypical beats erroneously falling into the undetermined category. Such behavior is most expected in real electrocardiogram interpreters and the proposed regularization of the non-uniform representation of the medical knowledge has been proven as the efficient method for ameliorating the learning set adequacy. When the disease-space heartbeat representation is used, the percentage of correct heartbeat classification drops dramatically for two reasons: disease-domain representation is determined based on the electrocardiogram context wider than a single heartbeat,

Clinical Examples as Non-uniform Learning and Testing Sets 87 transformation of cases representation simplifies the cases description thus for the determination of the heartbeat types, the disease-domain is less representative than the parameter-domain. In fact the determination of heartbeat types based on the disease-domain representation is a reversal of the typical diagnostic order. The regular interpretation first calculates the parameters, then based on parameter values determines the beat types, which in turn and in the context of neighboring results are used for assignment the disease-domain representation [11]. The presented investigation supported by the example of application to the problem of heartbeat classification, well known in electrocardiology, demonstrates that the recruitment of learning set members determines the quality of AI-based recognition systems. The selection method should consider: possibly diverse examples spanning the hiperspace of competence which volume is maximal, possibly regular distribution of the samples of medical knowledge along the particular dimensions of that hiperspace. Acknowledgment Scientific work supported by the Polish State Committee for Scientific Research resources in years 9-12 as a research project No. N N18 426736. References 1. Aldroubi, A., Feichtinger, H.: Exact Iterative Reconstruction Algorithm for Multivariate Irregularly Sampled Functions in Spline-like Spaces: the L p Theory. Proc. Amer. Math. Soc. 126(9), 2677 2686 (1998) 2. Augustyniak, P.: Automatic Understanding of ECG Signal. In: Kopotek, A., Wierzchon, S.T., Trojanowski, K. (eds.) Intelligent Information Processing and Web Mining, pp. 91 97. Springer, Heidelberg () 3. Haussler, D.: Quantifying Inductive Bias: AI Learning Algorithms and Valiant s Learning Framework. Artificial Intelligence 36, 177 221 (1988) 4. Moody, G.B., Mark, R.G.: The MIT-BIH Arrhythmia Database on CD-ROM and Software for Use with it. In: Computers in Cardiology 199, pp. 18 188 (199). Osowski, S.: Neural Networks for Information Processing. WUT Publishing House, Warsaw () (in Polish) 6. Rutkowski, L., Tadeusiewicz, R. (eds.): Neural Networks and Soft Computing. Polish Neural Network Society () 7. Stanisz, A.: Accessible Course of the Statistics with STATISTICA PL and Examples from Medicine. StatSoft Poland, Krakow (6) (in Polish) 8. Straszecka, E., Straszecka, J.: Distance Based Classifiers and their Use to Analysis of Data Concerned Acute Coronary Syndromes. Image Processing & Communications 9(3-4), 3 69 (3) 9. Straszecka, E., Straszecka, J.: Interpretation of Medical Symptoms Using Fuzzy Focal Element. In: Kurzynski, M., et al. (eds.) Computer Recognition Systems. Springer, Heidelberg ()

88 P. Augustyniak. Tadeusiewicz, R.: Neural Networks. RM Academic Publishing House, Warsaw (1993) (in Polish) 11. Tadeusiewicz, R., Augustyniak, P.: Information Flow and Data Reduction in the ECG Interpretation Process. In: IEEE 27 Annual EMBS Conf., paper 88 () 12. Tadeusiewicz, R., Ogiela, L.: Selected cognitive categorization systems. In: Rutkowski, L., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds.) ICAISC 8. LNCS (LNAI), vol. 97, pp. 1127 1136. Springer, Heidelberg (8)