Novel Speech Signal Enhancement Techniques for Tamil Speech Recognition using RLS Adaptive Filtering and Dual Tree Complex Wavelet Transform

Web Site: www.ijettcs.org Email: editor@ijettcs.org
Volume 6, Issue 1, January - February 2017, Page 150

Vimala C 1, Radha V 2
2 Professor and Head, Department of Computer Science, Avinashilingam Institute of Home Science and Higher Education for Women, Coimbatore 641043, Tamil Nadu, India

Abstract
A good speech signal enhancement technique must improve both the quality and the intelligibility of the enhanced signals under all types of environmental conditions. However, an enhancement technique can reduce noise yet introduce its own distortion into the enhanced signals, so it may or may not improve their quality and intelligibility. The main objective of this paper is to propose suitable speech signal enhancement techniques that improve both. In this research work, combinational speech signal enhancement techniques are proposed using the Dual Tree Complex Wavelet Transform (DTCWT) and Recursive Least Squares (RLS) adaptive filtering. Three techniques are introduced that combine the DTCWT with RLS adaptive filtering. The performance of the developed techniques is evaluated using both subjective and objective speech quality measures. The experimental results show that the proposed methods provide better speech noise cancellation and improve both the quality and the intelligibility of the enhanced speech signal.

Keywords: Speech signal enhancement, Dual Tree Complex Wavelet Transform (DTCWT), RLS adaptive algorithm, Ideal Binary Mask (IBM), Phase Spectrum Compensation (PSC)

1 INTRODUCTION

In real-world environments, speech signals are corrupted by several forms of noise, such as competing speakers, background noise, channel distortion and room reverberation. The presence of background noise in speech significantly reduces the quality and intelligibility of the signal. Therefore, enhancing the noisy speech signal is necessary for improving its perceptual quality. Speech signal enhancement is applied in many areas, such as telecommunications and speech and speaker recognition [1]. There is a particular need for speech signal enhancement in speech recognition systems, because a speech recognition application may be developed in one environment and operated in another. In such cases, the mismatch between the training and testing conditions increases and the recognition performance decreases. Several techniques have been proposed for speech signal enhancement, such as spectral subtraction, adaptive filtering, Kalman filtering, wavelet filtering and the Ideal Binary Mask (IBM).

The main objective of this paper is to implement efficient speech signal enhancement techniques that are suitable for different noisy conditions. The strengths of the DTCWT are exploited by combining it with RLS adaptive filtering, IBM and Phase Spectrum Compensation (PSC) methods. Four types of noise (White, Babble, Mall and Car) and five SNR levels (-10 dB, -5 dB, 0 dB, 5 dB and 10 dB) are involved in the proposed work, and the performance is evaluated both subjectively and objectively. Apart from noise reduction, this paper focuses on improving the quality and intelligibility of the enhanced signal. The proposed techniques have improved both the intelligibility and the quality of the enhanced signal.

The paper is organized as follows. Section 2 discusses related work on the wavelet transform. Section 3 explains the RLS adaptive filtering technique and Section 4 discusses the proposed technique using the DTCWT. In Section 5, the experimental results are presented, and the performance metrics used for the proposed work are explained in Section 6. The overall discussions are summarized in Section 7, and the conclusion and future work are given in Section 8.

2 RELATED WORKS

Reshad Hosseini and Mansur Vafadust (2008) have developed an almost perfect reconstruction filter bank for non-redundant, approximately shift-invariant, complex wavelet transforms [2]. The proposed filter bank with Hilbert-pair wavelet filters does not have serious distributed bumps on the wrong side of the power spectrum. The redundancy of the original signal is significantly reduced, and the properties of the proposed filter bank can be exploited in different signal processing applications.

Slavy G. Mihov et al. (2009) performed de-noising of noisy speech signals by using the wavelet transform [3]. They investigate the use of the wavelet transform for de-noising speech signals contaminated with common noises. The authors state that wavelet-based de-noising with either hard or soft thresholding was found to be the most effective technique for many practical problems. Experimental results are presented for a large database of reference speech signals contaminated with various noises under several Signal-to-Noise Ratio (SNR) conditions. The authors also argue that power spectrum estimation using wavelet-based de-noising may be an important approach for better speech signal enhancement. They plan to extend the work to practical research on speech signal enhancement for hearing-aid devices.

Rajeev Aggarwal et al. (2011) have implemented a Discrete Wavelet Transform (DWT) based algorithm using both hard and soft thresholding for de-noising [4]. The experimental analysis is performed on noisy speech signals corrupted by babble noise at 0 dB, 5 dB, 10 dB and 15 dB. Output SNR and MSE are calculated and compared for both thresholding methods. The experiments show that soft thresholding was better than hard thresholding for all the input SNR levels involved in the work. The hard thresholding method achieved a 2179 dB improvement, while soft thresholding achieved a maximum of 3516 dB improvement in output SNR.

B. Jai Shankar and K. Duraiswamy (2012) have proposed a de-noising technique based on wavelet transformation [5]. The noise cancellation method is improved by a process of grouping closer blocks. All the significant information residing in each set of blocks is utilized, and the vital features are extracted for further processing. All
the blocks are filtered and restored to their original positions, with overlapping applied to the grouped blocks. The experimental results showed that the developed technique was better in terms of both SNR and signal quality. Moreover, the technique can be easily modified and used for various other audio signal processing applications.

D. Yugandhar and S. K. Nayak (2016) have proposed a nature-inspired, population-based speech enhancement technique that finds the dynamic threshold value using the Teaching-Learning Based Optimization (TLBO) algorithm and exploits the shift-invariant property of the DTCWT [6]. The performance of the proposed methods is better in terms of PESQ and P.

Pengfei Sun and Jun Qin (2017) have proposed a two-stage Dual Tree Complex Wavelet Packet Transform (DTCWPT) based speech enhancement algorithm, in which a Speech Presence Probability (SPP) estimator and a generalized Minimum Mean Squared Error (MMSE) estimator are developed [7]. In their work, to overcome the signal distortions caused by the down-sampling of the Wavelet Packet Transform (WPT), a two-stage analytic decomposition concatenating the Undecimated Wavelet Packet Transform (UWPT) and the decimated WPT is employed.

The RLS adaptive filtering technique is explained in the next section.

3 RLS ADAPTIVE FILTERING FOR SPEECH SIGNAL ENHANCEMENT

The RLS adaptive algorithm is a recursive implementation of the Wiener filter, in which the input and output signals are related by a regression model. RLS can automatically adjust the coefficients of a filter even when the statistics of the input signals are not available [8]. In the RLS algorithm, the filter tap weight vector is updated by

$$w(n) = w(n-1) + k(n)\,e_{n-1}(n) \quad (1)$$

The steps involved in the RLS adaptive algorithm are given below, and the variables used in the algorithm are described in Table 1.

Algorithm of RLS adaptive filtering

Step 1: Initialize the algorithm by setting

$$\hat{w}(0) = 0, \qquad P(0) = \delta^{-1} I,$$

where $\delta$ is a small positive constant for high SNR and a large positive constant for low SNR.

Step 2: For each time instant n = 1, 2, ..., compute

$$k(n) = \frac{\lambda^{-1} P(n-1)\,u(n)}{1 + \lambda^{-1} u^{H}(n)\,P(n-1)\,u(n)}$$

$$y(n) = \hat{w}^{H}(n-1)\,u(n)$$

$$e(n) = d(n) - y(n)$$

$$\hat{w}(n) = \hat{w}(n-1) + k(n)\,e^{*}(n)$$

$$P(n) = \lambda^{-1} P(n-1) - \lambda^{-1} k(n)\,u^{H}(n)\,P(n-1)$$

Here, $\lambda^{-1}$ denotes the reciprocal of the exponential weighting factor. When the characteristics of the input data change, the filter adapts to the new environment by generating a new set of coefficients for the new data [9]. The main advantage of RLS adaptive filtering is that it attempts to reduce the estimation error e(n), so that the output of the adaptive filter closely matches the desired signal d(n). Perfect adaptation is achieved when e(n) reaches zero. In this work, the enhanced signal y(n) produced by RLS filtering was found to be better in terms of quality and intelligibility.

Table 1: Variables used in RLS Algorithm

Variable | Description
n | Current algorithm iteration
u(n) | Buffered input samples at step n
P(n) | Inverse correlation matrix at step n
k(n) | Gain vector at step n
y(n) | Filtered output at step n
e(n) | Estimation error at step n
d(n) | Desired response at step n
λ | Exponential memory weighting factor

Vimala C and Radha V have carried out a performance evaluation of three adaptive filtering techniques, namely Least Mean Squares (LMS), Normalized Least Mean Squares (NLMS) and RLS. These techniques are evaluated for noisy Tamil speech recognition based on three performance metrics, namely SNR, SNR Loss and MSE [10]. The experiments show that the RLS technique provides faster convergence and smaller error, but at a higher complexity than the LMS and NLMS techniques. Based on the significant results achieved by RLS adaptive filtering, combinational techniques are proposed using the DTCWT-based reconstruction methodology. The subsequent sections explain them in detail.

4 PROPOSED TECHNIQUE USING RLS FILTERING AND DTCWT TRANSFORM

In signal processing, quality represents the naturalness of speech, and intelligibility represents the understandability of the text information present in the signal. Removing noise and improving the perceptual quality and intelligibility of a speech signal without distorting the signal is a difficult task, and it is an important problem for any speech enhancement technique. The main objective of this research work is to develop efficient speech enhancement techniques that improve both the quality and the intelligibility of the enhanced speech signals. To meet this objective, two significant improvements are made to the existing RLS adaptive algorithm:

(i) A suitable initial square root correlation matrix and forgetting factor value are identified, which can be applied to all noise types and SNR levels, and
(ii) The reconstruction methodology is applied to the resulting RLS signal using the DTCWT, to produce an enhanced signal that is close to the original input signal.

Various initial square root correlation matrix values and RLS forgetting factor values have been evaluated. The experiments show that, without tuning, different forgetting factor values and square root correlation matrices need to be assigned for positive and negative SNR levels. In such cases, these values have to be assigned and their performance evaluated by trial and error, which is time consuming and not suitable for handling different noisy conditions. Therefore, the two parameters are fine-tuned to find values that suit both positive and negative SNR levels. The experiments are carried out in the Matlab environment. The experimental outcome confirms that better results are obtained when the initial square root correlation matrix is set to 2*eye(10) and the RLS forgetting factor is set to 1. After fine-tuning these two parameters, the reconstruction methodology is implemented using the DTCWT. The advantages of using the DTCWT and the steps involved in the proposed technique are explained below.

4.1 Advantages of using DTCWT

The standard form of the Discrete Wavelet Transform (DWT) is shift-variant, which is undesirable and does not provide perfect speech signal enhancement. Under severe noise conditions, the simple DWT cannot produce the expected outcome, and an alternative technique is needed that performs well under different noisy conditions. To overcome the shift-variance problem, noise reduction based on shift-invariant wavelet transforms has been introduced by using the DTCWT [9]. It consists of two specifically designed DWTs, which are applied in parallel to the same input data, as shown in Figure 1 [10]. The DTCWT is attractive in terms of computational complexity, because it is equivalent to two standard DWTs [13].

Figure 1 Structure of DTCWT Transform
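For reference, the RLS recursion of Section 3, with the parameter choices reported in Section 4, can be sketched in a few lines of Python. This is a minimal real-valued sketch under stated assumptions, not the authors' Matlab implementation: the function name `rls_filter`, the toy system `h` used in the demo, and the reading of the 2*eye(10) square-root factor as P(0) = 4I with ten filter taps are all assumptions made for illustration. It is written without external libraries so that it stays self-contained.

```python
import random


def rls_filter(u, d, taps=10, lam=1.0, p0=4.0):
    """Real-valued RLS recursion following Steps 1-2 of Section 3.

    u    : input samples
    d    : desired response, same length as u
    taps : filter length (10 is assumed from the 2*eye(10) setting)
    lam  : forgetting factor (1 as reported in Section 4)
    p0   : P(0) = p0 * I; p0 = 4 assumes a square-root factor of 2*I
    Returns (y, e, w): filtered output, estimation error, final weights.
    """
    w = [0.0] * taps                                   # w(0) = 0
    P = [[p0 if i == j else 0.0 for j in range(taps)] for i in range(taps)]
    buf = [0.0] * taps                                 # u(n), newest sample first
    y_out, e_out = [], []
    for un, dn in zip(u, d):
        buf = [un] + buf[:-1]
        # P(n-1) u(n) and u^T(n) P(n-1)
        Pu = [sum(P[i][j] * buf[j] for j in range(taps)) for i in range(taps)]
        uP = [sum(buf[i] * P[i][j] for i in range(taps)) for j in range(taps)]
        # k(n) = P u / (lam + u^T P u), the same gain as in Step 2
        denom = lam + sum(buf[i] * Pu[i] for i in range(taps))
        k = [v / denom for v in Pu]
        y = sum(w[i] * buf[i] for i in range(taps))    # y(n) = w^T(n-1) u(n)
        e = dn - y                                     # e(n) = d(n) - y(n)
        w = [w[i] + k[i] * e for i in range(taps)]     # weight update
        # P(n) = (P(n-1) - k(n) u^T(n) P(n-1)) / lam
        P = [[(P[i][j] - k[i] * uP[j]) / lam for j in range(taps)]
             for i in range(taps)]
        y_out.append(y)
        e_out.append(e)
    return y_out, e_out, w


# Demo: identify a hypothetical 3-tap system from noise-free data.
random.seed(0)
h = [0.5, -0.3, 0.2]
u = [random.gauss(0.0, 1.0) for _ in range(500)]
d = [sum(h[k] * u[n - k] for k in range(len(h)) if n - k >= 0)
     for n in range(len(u))]
y, e, w = rls_filter(u, d)
print([round(c, 3) for c in w[:3]])
```

With a forgetting factor of 1 the filter weights all past samples equally, so in this stationary demo the first three weights settle close to `h` and the remaining taps stay near zero. How the desired and input signals are formed for speech enhancement is not specified in the text, so the demo uses a plain system-identification setup instead.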

The sub-band signals of these two DWTs can be interpreted as the real and imaginary parts of a complex wavelet transform, which is nearly shift-invariant. In the DTCWT, the real and imaginary coefficients are calculated in tree a and tree b respectively [14]. In this research work, the enhanced signal produced by RLS adaptive filtering is reconstructed using the DTCWT. Figure 2 shows the proposed speech enhancement technique using the RLS-DTCWT transform, and its steps are given in the following algorithm.

Steps involved in Proposed RLS-DTCWT Transform

Step 1: Get the noisy speech signal as an input,
Step 2: Initialize RLS filtering,
Step 3: Fine tune the initial square root correlation matrix inverse,
Step 4: Set RLS forgetting factor value to 1,
Step 5: Perform RLS filtering,
Step 6: Pass the resultant signal as an input to the DTCWT,
Step 7: Initialize the DTCWT for performing the reconstruction methodology,
Step 8: Calculate the complex transform of a signal using two separate DWT decompositions (tree a and tree b),
Step 9: Extract the real coefficients using tree a,
Step 10: Extract the imaginary coefficients using tree b,
Step 11: Approximate shift-invariance, and

Figure 2 Proposed speech enhancement technique using RLS-DTCWT Transform

As given in the algorithm, the noisy input signal is first passed to the adaptive filter. The parameters are then fine-tuned and suitable values are assigned for speech enhancement. Subsequently, the output signal obtained from RLS filtering is passed to the DTCWT to perform the reconstruction methodology. Because the DTCWT involves both real and imaginary coefficients, additional information about the noisy input signal can be extracted. Therefore, better reconstruction and better signal enhancement are achieved, and the resulting signal is very close to the original signal. The experimental results indicate that RLS adaptive filtering with the DTCWT performs better than the existing RLS adaptive filtering. The experimental results achieved by the developed technique are presented in the next section.

5 EXPERIMENTAL RESULTS

The perception of a speech signal is usually measured in terms of its quality and intelligibility. Quality is a subjective measure that reflects the individual preferences of listeners. Intelligibility is an objective measure that predicts the percentage of words that can be correctly identified by the listeners. The experimental results show that the resultant signals of RLS were better in terms of both quality and intelligibility. Since the reconstruction is applied, the enhanced signals are found to be clearer and more natural.

The experiments are done with 10 Tamil spoken digits, each uttered 10 times, which are corrupted by four types of noise (White, Babble, Mall and Car noise) at five Signal-to-Noise Ratio (SNR) levels varying from -10 dB to 10 dB. The total dataset size is 2000 (10*10*4*5). Since a noisy dataset is not available for the Tamil language, it is created artificially by adding noise from the NOIZEUS database. In a noisy environment, speech recognition becomes a difficult problem when the SNR is below 20 dB. In this research work, even more severe conditions, down to -10 dB, are handled. Figure 3 shows the waveform representation of the enhanced signals corrupted by babble noise using the proposed RLS-DTCWT transform, and the corresponding spectrograms are presented in Figure 4.
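The construction of the noisy dataset described above can be illustrated with a short sketch. The paper does not state how the noise was scaled, so the snippet assumes the usual power-ratio scaling; the function name `mix_at_snr` and the synthetic stand-in signals are hypothetical and only serve the illustration.

```python
import math
import random


def mix_at_snr(clean, noise, snr_db):
    """Add `noise` to `clean` after scaling it to the requested SNR in dB."""
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as the clean signal")
    noise = noise[:len(clean)]
    p_clean = sum(s * s for s in clean) / len(clean)   # average speech power
    p_noise = sum(v * v for v in noise) / len(noise)   # average noise power
    # Choose the gain so that 10*log10(p_clean / (gain^2 * p_noise)) = snr_db.
    gain = math.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return [s + gain * v for s, v in zip(clean, noise)]


# Stand-ins for one recorded digit and one noise recording (assumptions).
random.seed(0)
clean = [math.sin(0.05 * n) for n in range(2000)]
noise = [random.gauss(0.0, 1.0) for _ in range(2000)]

snr_levels = [-10, -5, 0, 5, 10]
noise_types = ["white", "babble", "mall", "car"]
noisy_versions = {snr: mix_at_snr(clean, noise, snr) for snr in snr_levels}

# 10 digits x 10 utterances x 4 noise types x 5 SNR levels
n_files = 10 * 10 * len(noise_types) * len(snr_levels)
print(n_files)
```

The count reproduces the dataset size of 2000 given in the text. In practice the clean and noise arrays would be read from the recorded digits and the NOIZEUS noise files rather than generated.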

Figure 3 Waveform Representation of the Enhanced Signal using RLS-DTCWT Transform Technique

Figure 4 Spectrogram Representation of the Enhanced Signal using RLS-DTCWT Transform Technique

As discussed in the previous section, next to RLS, the IBM and PSC methods have produced better results. Based on the improvements achieved with the combination of RLS and the DTCWT, the IBM and PSC methods are also combined with the RLS-DTCWT transform to improve their performance. To accomplish this, the signals filtered by the IBM and PSC methods are passed to RLS and the DTCWT. The performance improvement of the IBM and PSC methods is assessed in two ways:

(i) By applying the DTCWT reconstruction methodology alone (IBM-DTCWT, PSC-DTCWT), and
(ii) By applying both the DTCWT and the RLS adaptive filtering technique (IBM-RLS-DTCWT, PSC-RLS-DTCWT).

The overall approach of the proposed work is given in Figure 5. The experimental outcomes show that using the DTCWT alone gives a reasonable improvement for both the IBM and PSC methods. However, using both RLS and the DTCWT gives a significantly larger improvement than using the DTCWT alone. In particular, the PESQ and MOS values increase, and the WSS and MSE values are reduced extensively. The performance evaluation of the RLS-DTCWT, IBM-RLS-DTCWT and PSC-RLS-DTCWT techniques based on speech signal quality measures is discussed below.

Figure 5 Overall Approach of the Proposed Work

6 Performance Evaluations based on Speech Signal Quality Measures

The developed speech signal enhancement techniques are evaluated by using both subjective and objective speech quality measures. In this work, six objective quality measures and one subjective quality measure are considered.

6.1 Objective Speech Quality Measures

Objective metrics are evaluated based on mathematical measures. The objective quality measures used in this work are as follows:

Perceptual Evaluation of Speech Quality (PESQ),
Log Likelihood Ratio (LLR),
Weighted Spectral Slope (WSS),
Segmental SNR (SegSNR),
Output SNR, and
Mean Squared Error (MSE)

6.1.1 Perceptual Evaluation of Speech Quality (PESQ)

PESQ is the most sophisticated and accurate speech signal quality measure. It is recommended by ITU-T for speech quality assessment of 3.2 kHz narrow-band handset telephony and speech codec applications. To compute PESQ, the differences between the original and the enhanced signals are computed and averaged over time. The result is a prediction of the subjective quality rating between 1.0 and 4.5, where a higher value represents better quality of the enhanced signal.

6.1.2 Log Likelihood Ratio (LLR)

LLR is computed from the difference between the target and the reference signals in a frame-by-frame analysis. LLR computation requires the corresponding original speech signal as the reference signal for comparison with the target signal, and it is given by

$$\mathrm{LLR}(a_p, a_c) = \log\left(\frac{a_p R_c a_p^{T}}{a_c R_c a_c^{T}}\right) \quad (2)$$

where $a_c$ is the LPC vector of the original speech frame, $a_p$ is the LPC vector of the enhanced speech frame, and $R_c$ is the autocorrelation matrix of the original speech signal.

6.1.3 Weighted Spectral Slope (WSS)

WSS is measured by comparing the smoothed spectra of the clean and distorted speech samples. The spectral slope is obtained as the difference between adjacent spectral magnitudes in decibels. WSS computation is given by

$$\mathrm{WSS} = \frac{1}{M}\sum_{m=0}^{M-1}\frac{\sum_{j=1}^{K} W_{WSS}(j,m)\,\left(S_c(j,m) - S_p(j,m)\right)^{2}}{\sum_{j=1}^{K} W_{WSS}(j,m)} \quad (3)$$

where $W_{WSS}(j,m)$ are the weights, K = 25, M is the number of data segments, and $S_c(j,m)$ and $S_p(j,m)$ are the spectral slopes for the j-th frequency band of the clean and enhanced speech signals, respectively.

6.1.4 Segmental SNR (SegSNR)

SegSNR represents the average of the SNR measurements over short frames. The SegSNR computation is given by

$$\mathrm{SegSNR} = \frac{10}{M}\sum_{m=0}^{M-1}\log_{10}\frac{\sum_{n=Nm}^{Nm+N-1} x^{2}(n)}{\sum_{n=Nm}^{Nm+N-1}\left(x(n) - \hat{x}(n)\right)^{2}} \quad (4)$$

where x(n) is the input signal, $\hat{x}(n)$ is the processed enhanced signal, N is the frame length and M is the number of frames in the signal.

6.1.5 Output SNR

SNR is defined as the power ratio between the clean signal and the background noise. The SNR can be computed for both the input and the output signals, and it is defined by

$$\mathrm{SNR} = 10\log_{10}\frac{P_s}{P_n} \quad (5)$$

where $P_s$ and $P_n$ represent the average power of the speech signal and of the noise, respectively. The output SNR represents the relationship between the strength of the original and the degraded speech signal, expressed in decibels, and it is computed after applying the speech signal enhancement techniques. A greater SNR indicates that the speech is stronger than the noise. An efficient technique should improve the output SNR of the enhanced signal.

6.1.6 Mean Squared Error (MSE)

The MSE measure is defined by

$$\mathrm{MSE} = \frac{1}{MN}\sum_{l=0}^{M-1}\sum_{k=0}^{N-1}\left(X_k^{2}(l) - \hat{X}_k^{2}(l)\right)^{2} \quad (6)$$

where $X_k^{2}(l)$ is the short-time magnitude-squared spectrum (MSS) of the clean signal, $\hat{X}_k^{2}(l)$ is the estimated MSS, and N is the total number of frequency bins. Smaller values of MSE indicate a better estimate of the true MSS.

6.2 Subjective Speech Quality Measure

Subjective quality evaluations are performed by involving a group of listeners to measure the quality of the enhanced speech. The process of performing MOS is described below.

Mean Opinion Score (MOS)

MOS predicts the overall quality of an enhanced signal based on a human listening test. In this work, instead of using a regular MOS, the composite objective measures introduced by Yang Lu and Philipos C. Loizou (2008) are implemented [15]. The authors have derived new accurate measures from the basic objective
measures, which are obtained by using multiple linear regression analysis and nonlinear techniques. It is time consuming and costly but provides a more accurate estimate of the speech quality, so it is considered in this research work. Separate quality ratings for signal and background distortions are used, as shown in Table 2. To calculate the MOS, the listeners have to rate the particular enhanced speech signal based on its overall quality. The overall quality is measured by calculating the mean value of the signal and background distortion ratings (1 = bad, 2 = poor, 3 = fair, 4 = good and 5 = excellent). In this research work, the MOS is calculated by performing a listening test with 20 different listeners (10 males and 10 females). The listeners were asked to rate each speech sample under one of the five signal quality categories.

Table 2: Signal and Background Distortion Scale Rating

Rating | Signal Distortion Scale | Background Distortion Scale
5 | Purely natural, no degradation | Not perceptible
4 | Fairly natural, slight degradation | Somewhat noticeable
3 | Somewhat natural, somewhat degraded | Noticeable but not intrusive
2 | Fairly unnatural, fairly degraded | Fairly noticeable, somewhat intrusive
1 | Quite unnatural, highly degraded | Quite noticeable, highly intrusive

The experimental results obtained by the adopted techniques and the performance evaluations are presented in the next section. Tables 3, 4, 5 and 6 show the performance evaluation of the proposed speech signal enhancement techniques for white, babble, mall and car noise respectively, at SNR levels of -10 dB, -5 dB, 0 dB, 5 dB and 10 dB.
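Two of the objective measures of Section 6.1, the output SNR and the segmental SNR, are simple enough to sketch directly. The snippet below is an illustrative sketch, not the evaluation code used in the paper: the function names, the default frame length of 256 samples, the use of non-overlapping frames, the small `eps` guard, and the choice to treat the difference between the clean and enhanced signals as the residual noise are all assumptions. Frame-level clamping, which some SegSNR implementations apply, is left out.

```python
import math


def output_snr(clean, enhanced):
    """Overall SNR in dB, with (clean - enhanced) taken as the residual noise."""
    p_s = sum(x * x for x in clean)
    p_n = sum((x - y) ** 2 for x, y in zip(clean, enhanced))
    return 10.0 * math.log10(p_s / p_n)


def segmental_snr(clean, enhanced, frame_len=256, eps=1e-12):
    """Average of the per-frame SNRs in dB over M frames of N samples each."""
    n_frames = min(len(clean), len(enhanced)) // frame_len
    if n_frames == 0:
        raise ValueError("signals are shorter than one frame")
    total = 0.0
    for m in range(n_frames):
        lo, hi = m * frame_len, (m + 1) * frame_len
        num = sum(x * x for x in clean[lo:hi])
        den = sum((x - y) ** 2 for x, y in zip(clean[lo:hi], enhanced[lo:hi]))
        # eps avoids log(0) and division by zero for silent or perfect frames
        total += 10.0 * math.log10((num + eps) / (den + eps))
    return total / n_frames


# Toy check: an "enhanced" signal that is a 0.9-scaled copy of the clean one
# leaves a residual of one tenth of the amplitude in every frame.
clean = [math.sin(0.05 * n) for n in range(1024)]
enhanced = [0.9 * s for s in clean]
print(output_snr(clean, enhanced), segmental_snr(clean, enhanced))
```

Both values come out close to 20 dB for this toy input, because the residual carries one hundredth of the signal power in every frame. For real speech the two measures diverge, since SegSNR gives low-energy frames the same weight as loud ones.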

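Table 3: Performance Evaluations of the Proposed Speech Signal Enhancement Techniques for White Noise

| dB level | Metric | RLS | RLS-DTCWT | IBM | IBM-DTCWT | RLS-IBM-DTCWT | PSC | PSC-DTCWT | RLS-PSC-DTCWT |
|---|---|---|---|---|---|---|---|---|---|
| -10 dB | PESQ | 3.48 | 4.03 | 2.32 | 2.38 | 3.94 | 1.17 | 1.19 | 3.28 |
| -10 dB | LLR | 0.86 | 0.39 | 0.42 | 3.09 | 0.40 | 2.41 | 1.99 | 1.24 |
| -10 dB | WSS | 9.41 | 1.20 | 63.88 | 57.53 | 0.48 | 143.84 | 143.63 | 4.68 |
| -10 dB | SegSNR | 6.54 | 10.85 | -1.85 | -1.97 | 1.78 | -0.11 | 0.52 | -0.60 |
| -10 dB | Output SNR | 4.73 | 9.38 | -1.98 | -2.19 | -0.55 | 0.16 | 0.77 | 1.80 |
| -10 dB | MSE | 0.14 | 0.08 | 0.31 | 0.30 | 0.25 | 0.24 | 0.21 | 0.19 |
| -10 dB | MOS | 3.89 | 4.63 | 2.92 | 1.53 | 4.55 | -0.43 | 0.53 | 3.57 |
| -5 dB | PESQ | 3.71 | 4.31 | 2.50 | 2.47 | 4.25 | 1.56 | 1.58 | 3.17 |
| -5 dB | LLR | 0.52 | 0.38 | 0.46 | 3.07 | 0.28 | 1.92 | 1.61 | 1.26 |
| -5 dB | WSS | 4.92 | 0.55 | 46.74 | 50.08 | 0.22 | 102.55 | 101.79 | 5.34 |
| -5 dB | SegSNR | 11.31 | 17.65 | -1.82 | -1.99 | 2.63 | 1.66 | 2.15 | -0.56 |
| -5 dB | Output SNR | 8.83 | 12.92 | -2.03 | -2.23 | -0.52 | 1.89 | 2.38 | 2.62 |
| -5 dB | MSE | 0.09 | 0.05 | 0.31 | 0.30 | 0.25 | 0.20 | 0.18 | 0.17 |
| -5 dB | MOS | 4.28 | 4.87 | 3.17 | 1.66 | 4.87 | 0.74 | 1.33 | 3.46 |
| 0 dB | PESQ | 3.86 | 4.40 | 2.62 | 2.56 | 4.34 | 1.98 | 2.00 | 3.06 |
| 0 dB | LLR | 0.29 | 0.36 | 0.45 | 2.99 | 0.14 | 1.51 | 1.24 | 1.26 |
| 0 dB | WSS | 1.85 | 0.17 | 42.72 | 47.46 | 0.15 | 66.22 | 66.21 | 6.22 |
| 0 dB | SegSNR | 16.02 | 22.42 | -1.86 | -1.92 | 3.29 | 3.25 | 3.58 | -0.52 |
| 0 dB | Output SNR | 13.54 | 14.71 | -2.08 | -2.17 | -0.48 | 3.42 | 3.78 | 3.23 |
| 0 dB | MSE | 0.05 | 0.04 | 0.31 | 0.30 | 0.25 | 0.16 | 0.15 | 0.16 |
| 0 dB | MOS | 4.54 | 4.95 | 3.30 | 1.79 | 5.02 | 1.82 | 2.10 | 3.37 |
| 5 dB | PESQ | 4.00 | 4.45 | 1.55 | 2.65 | 4.31 | 2.49 | 2.55 | 2.98 |
| 5 dB | LLR | 0.12 | 0.35 | 0.77 | 2.82 | 0.10 | 1.19 | 0.92 | 1.23 |
| 5 dB | WSS | 0.64 | 0.06 | 111.47 | 42.11 | 0.22 | 42.94 | 42.41 | 6.37 |
| 5 dB | SegSNR | 20.03 | 25.51 | -0.92 | -1.95 | 3.64 | 4.17 | 4.41 | -0.48 |
| 5 dB | Output SNR | 17.84 | 15.62 | -0.93 | -2.20 | -0.43 | 4.25 | 4.56 | 3.53 |
| 5 dB | MSE | 0.03 | 0.04 | 0.27 | 0.30 | 0.25 | 0.15 | 0.14 | 0.16 |
| 5 dB | MOS | 4.75 | 4.99 | 1.64 | 1.98 | 5.01 | 2.79 | 2.88 | 3.32 |
| 10 dB | PESQ | 4.28 | 4.49 | 0.71 | 2.68 | 3.01 | 2.98 | 4.27 | 2.91 |
| 10 dB | LLR | 0.04 | 0.35 | 1.87 | 2.71 | 0.68 | 0.93 | 0.10 | 1.27 |
| 10 dB | WSS | 0.27 | 0.03 | 226.67 | 39.58 | 28.55 | 29.86 | 0.24 | 7.08 |
| 10 dB | SegSNR | 23.07 | 27.51 | -0.06 | -1.88 | 4.82 | 4.61 | 3.80 | -0.40 |
| 10 dB | Output SNR | 21.19 | 16.13 | -0.06 | -2.14 | 4.92 | 4.63 | -0.36 | 3.66 |
| 10 dB | MSE | 0.02 | 0.04 | 0.24 | 0.30 | 0.13 | 0.14 | 0.24 | 0.15 |
| 10 dB | MOS | 4.82 | 4.93 | -0.86 | 2.08 | 3.47 | 3.60 | 4.98 | 3.24 |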

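Table 4: Performance Evaluations of the Proposed Speech Signal Enhancement Techniques for Babble Noise

| dB level | Metric | RLS | RLS-DTCWT | IBM | IBM-DTCWT | RLS-IBM-DTCWT | PSC | PSC-DTCWT | RLS-PSC-DTCWT |
|---|---|---|---|---|---|---|---|---|---|
| -10 dB | PESQ | 3.50 | 4.15 | 2.14 | 2.40 | 4.13 | 0.71 | 0.68 | 3.34 |
| -10 dB | LLR | 0.29 | 0.40 | 0.37 | 2.95 | 0.39 | 1.81 | 1.71 | 1.30 |
| -10 dB | WSS | 12.09 | 0.63 | 71.04 | 64.33 | 0.85 | 168.65 | 172.15 | 8.66 |
| -10 dB | SegSNR | 8.81 | 14.63 | -1.66 | -1.58 | 2.64 | -0.99 | -0.96 | -0.25 |
| -10 dB | Output SNR | 5.35 | 12.30 | -1.78 | -1.77 | -0.23 | -0.99 | -0.98 | 2.65 |
| -10 dB | MSE | 0.13 | 0.06 | 0.30 | 0.29 | 0.24 | 0.27 | 0.26 | 0.17 |
| -10 dB | MOS | 4.18 | 4.73 | 2.76 | 1.57 | 4.71 | -0.52 | 0.06 | 3.56 |
| -5 dB | PESQ | 3.71 | 4.27 | 2.39 | 2.39 | 4.23 | 0.86 | 0.82 | 3.28 |
| -5 dB | LLR | 0.10 | 0.38 | 0.40 | 2.85 | 0.23 | 1.47 | 1.40 | 1.24 |
| -5 dB | WSS | 5.89 | 0.17 | 61.80 | 62.29 | 0.28 | 143.74 | 145.92 | 7.85 |
| -5 dB | SegSNR | 13.01 | 24.94 | -1.52 | -1.67 | 3.01 | 0.73 | 0.77 | -0.26 |
| -5 dB | Output SNR | 9.10 | 15.81 | -1.69 | -1.89 | -0.24 | 0.88 | 0.91 | 3.02 |
| -5 dB | MSE | 0.09 | 0.04 | 0.30 | 0.29 | 0.24 | 0.22 | 0.21 | 0.17 |
| -5 dB | MOS | 4.49 | 4.84 | 3.01 | 1.63 | 4.89 | 0.15 | 0.52 | 3.54 |
| 0 dB | PESQ | 3.96 | 4.35 | 2.51 | 2.50 | 4.29 | 1.22 | 1.19 | 3.23 |
| 0 dB | LLR | 0.05 | 0.36 | 0.40 | 2.80 | 0.10 | 1.19 | 1.10 | 1.25 |
| 0 dB | WSS | 2.85 | 0.07 | 56.75 | 57.78 | 0.17 | 105.80 | 106.41 | 7.34 |
| 0 dB | SegSNR | 17.19 | 28.16 | -1.52 | -1.55 | 3.04 | 2.15 | 2.20 | -0.27 |
| 0 dB | Output SNR | 13.51 | 16.42 | -1.70 | -1.77 | -0.25 | 2.44 | 2.50 | 3.04 |
| 0 dB | MSE | 0.05 | 0.04 | 0.30 | 0.29 | 0.24 | 0.18 | 0.18 | 0.16 |
| 0 dB | MOS | 4.74 | 4.91 | 3.14 | 1.77 | 4.99 | 1.07 | 1.24 | 3.50 |
| 5 dB | PESQ | 4.18 | 4.42 | 1.57 | 2.55 | 4.31 | 1.83 | 1.80 | 3.08 |
| 5 dB | LLR | 0.03 | 0.36 | 0.81 | 2.73 | 0.07 | 0.92 | 0.84 | 1.25 |
| 5 dB | WSS | 1.22 | 0.03 | 123.44 | 53.88 | 0.13 | 67.26 | 67.99 | 7.78 |
| 5 dB | SegSNR | 21.14 | 29.31 | -1.05 | -1.60 | 3.36 | 3.42 | 3.51 | -0.31 |
| 5 dB | Output SNR | 17.82 | 16.60 | -1.15 | -1.81 | -0.28 | 3.65 | 3.79 | 3.32 |
| 5 dB | MSE | 0.03 | 0.03 | 0.28 | 0.29 | 0.24 | 0.16 | 0.15 | 0.16 |
| 5 dB | MOS | 4.93 | 4.97 | 1.52 | 1.87 | 5.02 | 2.21 | 2.13 | 3.38 |
| 10 dB | PESQ | 4.33 | 4.48 | 0.81 | 2.63 | 4.30 | 2.34 | 2.31 | 3.04 |
| 10 dB | LLR | 0.02 | 0.35 | 1.58 | 2.67 | 0.08 | 0.78 | 0.69 | 1.24 |
| 10 dB | WSS | 0.44 | 0.02 | 239.94 | 45.49 | 0.14 | 43.56 | 42.76 | 7.33 |
| 10 dB | SegSNR | 24.07 | 30.21 | -0.05 | -1.79 | 3.67 | 4.29 | 4.43 | -0.31 |
| 10 dB | Output SNR | 21.23 | 16.67 | -0.03 | -2.02 | -0.29 | 4.40 | 4.61 | 3.58 |
| 10 dB | MSE | 0.02 | 0.03 | 0.24 | 0.29 | 0.24 | 0.15 | 0.14 | 0.15 |
| 10 dB | MOS | 4.96 | 4.99 | -0.66 | 2.03 | 5.01 | 3.02 | 2.80 | 3.36 |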

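Table 5: Performance Evaluations of the Proposed Speech Signal Enhancement Techniques for Mall Noise

| dB level | Metric | RLS | RLS-DTCWT | IBM | IBM-DTCWT | RLS-IBM-DTCWT | PSC | PSC-DTCWT | RLS-PSC-DTCWT |
|---|---|---|---|---|---|---|---|---|---|
| -10 dB | PESQ | 3.26 | 4.29 | 2.26 | 2.40 | 4.27 | 0.89 | 0.88 | 3.29 |
| -10 dB | LLR | 0.37 | 0.60 | 0.36 | 2.91 | 0.59 | 2.13 | 1.81 | 1.28 |
| -10 dB | WSS | 14.23 | 0.77 | 66.56 | 57.64 | 1.18 | 183.17 | 183.70 | 7.47 |
| -10 dB | SegSNR | 4.47 | 10.41 | -2.04 | -1.78 | 2.26 | -2.09 | -1.65 | -0.20 |
| -10 dB | Output SNR | 3.03 | 8.10 | -2.15 | -2.04 | -0.17 | -2.15 | -1.73 | 2.21 |
| -10 dB | MSE | 0.17 | 0.09 | 0.31 | 0.30 | 0.24 | 0.31 | 0.29 | 0.18 |
| -10 dB | MOS | 3.93 | 4.73 | 2.90 | 1.63 | 4.72 | -0.78 | 0.09 | 3.53 |
| -5 dB | PESQ | 3.59 | 4.21 | 2.61 | 2.51 | 4.07 | 1.13 | 1.12 | 3.34 |
| -5 dB | LLR | 0.22 | 0.48 | 0.40 | 2.92 | 0.48 | 1.73 | 1.38 | 1.37 |
| -5 dB | WSS | 6.62 | 0.37 | 51.74 | 53.40 | 1.40 | 124.22 | 125.95 | 8.08 |
| -5 dB | SegSNR | 11.34 | 18.63 | -1.59 | -1.87 | 2.52 | 1.29 | 1.44 | -0.17 |
| -5 dB | Output SNR | 7.22 | 14.09 | -1.76 | -2.12 | -0.19 | 1.27 | 1.47 | 2.49 |
| -5 dB | MSE | 0.11 | 0.05 | 0.30 | 0.30 | 0.24 | 0.21 | 0.20 | 0.18 |
| -5 dB | MOS | 4.33 | 4.73 | 3.27 | 1.75 | 4.62 | 0.33 | 0.91 | 3.53 |
| 0 dB | PESQ | 4.18 | 4.44 | 2.63 | 2.61 | 4.28 | 1.28 | 1.26 | 3.20 |
| 0 dB | LLR | 0.09 | 0.40 | 0.39 | 2.83 | 0.16 | 1.43 | 1.20 | 1.30 |
| 0 dB | WSS | 2.31 | 0.04 | 43.23 | 42.87 | 0.16 | 104.77 | 106.06 | 7.10 |
| 0 dB | SegSNR | 17.89 | 25.84 | -1.79 | -1.89 | 3.51 | 2.29 | 2.49 | -0.27 |
| 0 dB | Output SNR | 15.30 | 16.11 | -2.03 | -2.18 | -0.26 | 2.42 | 2.64 | 3.36 |
| 0 dB | MSE | 0.04 | 0.04 | 0.31 | 0.30 | 0.24 | 0.18 | 0.17 | 0.16 |
| 0 dB | MOS | 4.89 | 4.96 | 3.36 | 1.95 | 4.96 | 0.91 | 1.25 | 3.46 |
| 5 dB | PESQ | 4.14 | 4.38 | 1.55 | 2.66 | 4.24 | 2.07 | 2.09 | 3.26 |
| 5 dB | LLR | 0.03 | 0.38 | 0.85 | 2.78 | 0.12 | 1.01 | 0.79 | 1.29 |
| 5 dB | WSS | 1.51 | 0.04 | 125.52 | 43.22 | 0.46 | 56.96 | 58.21 | 7.67 |
| 5 dB | SegSNR | 20.21 | 27.81 | -1.01 | -1.83 | 3.14 | 3.67 | 3.78 | -0.22 |
| 5 dB | Output SNR | 16.14 | 16.58 | -1.08 | -2.09 | -0.22 | 3.69 | 3.86 | 3.08 |
| 5 dB | MSE | 0.04 | 0.03 | 0.28 | 0.30 | 0.24 | 0.16 | 0.15 | 0.16 |
| 5 dB | MOS | 4.90 | 4.92 | 1.45 | 2.01 | 4.94 | 2.45 | 2.47 | 3.50 |
| 10 dB | PESQ | 4.33 | 4.44 | 0.70 | 2.65 | 4.29 | 2.57 | 2.53 | 3.17 |
| 10 dB | LLR | 0.02 | 0.37 | 1.86 | 2.70 | 0.08 | 0.79 | 0.62 | 1.30 |
| 10 dB | WSS | 0.74 | 0.02 | 231.57 | 41.37 | 0.33 | 40.37 | 39.92 | 7.29 |
| 10 dB | SegSNR | 23.35 | 29.41 | -0.12 | -1.90 | 3.45 | 4.28 | 4.41 | -0.29 |
| 10 dB | Output SNR | 20.10 | 16.79 | -0.07 | -2.15 | -0.28 | 4.27 | 4.49 | 3.35 |
| 10 dB | MSE | 0.02 | 0.03 | 0.25 | 0.30 | 0.24 | 0.15 | 0.14 | 0.16 |
| 10 dB | MOS | 4.97 | 4.98 | -0.90 | 2.06 | 5.00 | 3.25 | 3.04 | 3.43 |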

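Table 6: Performance Evaluations of the Proposed Speech Signal Enhancement Techniques for Car Noise

| dB level | Metric | RLS | RLS-DTCWT | IBM | IBM-DTCWT | RLS-IBM-DTCWT | PSC | PSC-DTCWT | RLS-PSC-DTCWT |
|---|---|---|---|---|---|---|---|---|---|
| -10 dB | PESQ | 3.61 | 3.74 | 2.17 | 2.34 | 3.68 | 1.63 | 1.63 | 2.93 |
| -10 dB | LLR | 0.05 | 0.28 | 0.38 | 2.77 | 0.40 | 0.72 | 0.62 | 1.37 |
| -10 dB | WSS | 8.08 | 1.12 | 68.65 | 56.13 | 1.60 | 153.95 | 153.91 | 8.83 |
| -10 dB | SegSNR | 7.87 | 9.26 | -1.59 | -1.71 | 1.76 | -0.46 | -0.57 | -0.35 |
| -10 dB | Output SNR | 4.05 | 8.68 | -1.75 | -1.90 | -0.31 | -0.31 | -0.43 | 1.80 |
| -10 dB | MSE | 0.15 | 0.09 | 0.30 | 0.29 | 0.24 | 0.25 | 0.25 | 0.19 |
| -10 dB | MOS | 4.41 | 4.45 | 2.79 | 1.67 | 4.34 | 1.43 | 1.51 | 3.19 |
| -5 dB | PESQ | 3.84 | 3.88 | 2.46 | 2.49 | 3.88 | 2.33 | 2.33 | 2.99 |
| -5 dB | LLR | 0.04 | 0.32 | 0.35 | 2.67 | 0.22 | 0.70 | 0.56 | 1.32 |
| -5 dB | WSS | 4.26 | 0.27 | 51.15 | 50.25 | 0.50 | 92.25 | 91.43 | 8.24 |
| -5 dB | SegSNR | 15.02 | 17.53 | -1.61 | -1.71 | 2.92 | 1.75 | 1.72 | -0.35 |
| -5 dB | Output SNR | 8.52 | 14.06 | -1.78 | -1.92 | -0.32 | 1.92 | 1.91 | 2.93 |
| -5 dB | MSE | 0.09 | 0.05 | 0.30 | 0.29 | 0.24 | 0.19 | 0.19 | 0.17 |
| -5 dB | MOS | 4.63 | 4.56 | 3.19 | 1.87 | 4.60 | 2.65 | 2.54 | 3.27 |
| 0 dB | PESQ | 4.04 | 4.16 | 2.57 | 2.58 | 3.85 | 2.80 | 2.77 | 3.02 |
| 0 dB | LLR | 0.02 | 0.34 | 0.34 | 2.60 | 0.14 | 0.71 | 0.55 | 1.28 |
| 0 dB | WSS | 2.07 | 0.09 | 46.61 | 46.48 | 0.27 | 63.56 | 63.50 | 8.32 |
| 0 dB | SegSNR | 20.28 | 24.93 | -1.61 | -1.71 | 3.55 | 3.37 | 3.43 | -0.40 |
| 0 dB | Output SNR | 13.29 | 16.07 | -1.78 | -1.92 | -0.37 | 3.55 | 3.66 | 3.50 |
| 0 dB | MSE | 0.05 | 0.04 | 0.30 | 0.29 | 0.24 | 0.16 | 0.15 | 0.16 |
| 0 dB | MOS | 4.82 | 4.77 | 3.33 | 2.02 | 4.62 | 3.32 | 3.10 | 3.32 |
| 5 dB | PESQ | 4.18 | 4.34 | 1.29 | 2.61 | 3.84 | 3.00 | 2.93 | 3.04 |
| 5 dB | LLR | 0.02 | 0.34 | 1.41 | 2.56 | 0.11 | 0.72 | 0.55 | 1.26 |
| 5 dB | WSS | 0.91 | 0.04 | 130.82 | 43.74 | 0.22 | 45.69 | 45.85 | 8.23 |
| 5 dB | SegSNR | 23.29 | 28.34 | -1.29 | -1.74 | 3.77 | 4.24 | 4.36 | -0.43 |
| 5 dB | Output SNR | 17.61 | 16.52 | -1.46 | -1.95 | -0.40 | 4.36 | 4.55 | 3.66 |
| 5 dB | MSE | 0.03 | 0.03 | 0.29 | 0.29 | 0.24 | 0.15 | 0.14 | 0.15 |
| 5 dB | MOS | 4.94 | 4.91 | 0.75 | 2.08 | 4.63 | 3.65 | 3.35 | 3.34 |
| 10 dB | PESQ | 4.31 | 4.42 | 0.38 | 2.62 | 3.83 | 3.14 | 3.07 | 2.99 |
| 10 dB | LLR | 0.02 | 0.34 | 2.13 | 2.54 | 0.11 | 0.73 | 0.56 | 1.26 |
| 10 dB | WSS | 0.42 | 0.02 | 221.13 | 41.94 | 0.20 | 34.31 | 33.90 | 7.52 |
| 10 dB | SegSNR | 24.99 | 29.70 | -0.05 | -1.86 | 3.86 | 4.64 | 4.81 | -0.42 |
| 10 dB | Output SNR | 21.01 | 16.65 | -0.02 | -2.07 | -0.39 | 4.70 | 4.94 | 3.73 |
| 10 dB | MSE | 0.02 | 0.03 | 0.24 | 0.30 | 0.24 | 0.14 | 0.13 | 0.15 |
| 10 dB | MOS | 5.05 | 4.98 | -1.30 | 2.11 | 4.62 | 3.87 | 3.54 | 3.31 |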
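The discussion that follows summarises these tables as average improvements over the dB levels. As a minimal sketch of one plausible way to obtain such a figure, the snippet below averages the per-level PESQ difference between RLS and RLS-DTCWT under car noise, taking the PESQ entries of Table 6 as 3.61, 3.84, 4.04, 4.18, 4.31 for RLS and 3.74, 3.88, 4.16, 4.34, 4.42 for RLS-DTCWT. The function name `mean_difference` is hypothetical, and it is an assumption that the paper's averages are computed this way.

```python
# PESQ under car noise at -10, -5, 0, 5 and 10 dB (values as read from Table 6)
pesq_rls = [3.61, 3.84, 4.04, 4.18, 4.31]
pesq_rls_dtcwt = [3.74, 3.88, 4.16, 4.34, 4.42]


def mean_difference(baseline, proposed):
    """Average of (proposed - baseline) over matching dB levels."""
    if len(baseline) != len(proposed):
        raise ValueError("both lists must cover the same dB levels")
    diffs = [p - b for b, p in zip(baseline, proposed)]
    return sum(diffs) / len(diffs)


improvement = mean_difference(pesq_rls, pesq_rls_dtcwt)
print(f"Average PESQ improvement (car noise): {improvement:.3f}")
```

The same helper can be applied to any metric row and any pair of techniques. For measures where lower is better (LLR, WSS, MSE), a negative result indicates an improvement.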

7. DISCUSSIONS

The experimental results show that RLS-DTCWT has produced the maximum PESQ, MOS, SegSNR, LLR and Output SNR values. Moreover, the proposed technique has reduced the WSS and MSE values when compared to the existing RLS adaptive filtering technique. The IBM and PSC techniques also provided significant performance improvements in terms of PESQ, MOS, SegSNR and Output SNR values, and very good results were achieved in reducing WSS and MSE as well. To give a clear idea of the improvements achieved by the proposed techniques (shown in Tables 3, 4, 5 and 6), Table 7 illustrates the difference obtained by the proposed techniques based on both subjective and objective speech quality measures. The average values of improvement obtained over the five dB levels for the four types of noise are presented in the table, and the highest difference achieved by the proposed techniques for each speech quality measure is highlighted.

It is evident from Table 7 that maximum differences of 3.59 and 2.02 are achieved for PESQ and LLR, respectively. The SegSNR value has been increased by up to 11.93, the Output SNR value has been increased by up to 6.95 and the MOS value has been improved by up to 5.9. Above all, the WSS value has been reduced from 231.57 to 3.33. Therefore, it is clear from these results that RLS-DTCWT, IBM-RLS-DTCWT and PSC-RLS-DTCWT have obtained very good results. These techniques were found to be better for speech signal enhancement and have improved the quality and intelligibility of the enhanced signals.

8. CONCLUSION

The main objective of this research work is to implement efficient speech signal enhancement techniques that are suitable for different noisy conditions. In this research work, apart from noise reduction, the focus is on improving the quality and intelligibility of the enhanced signal. Three types of speech signal enhancement techniques were introduced by combining RLS adaptive filtering with the DTCWT. The proposed
techniques are evaluated based on subjective and objective speech quality measures. All three techniques developed in this paper were found to perform well and have achieved better speech signal enhancement. The proposed reconstruction methodology yields a significant improvement in both the speech quality and the intelligibility of the enhanced signals. These three methods can be used for speech signal enhancement, or they can be applied as a front-end processor for noisy speech recognition. Future work is to evaluate the proposed techniques for noisy Tamil speech recognition.

References

[1] Lawrence R. Rabiner, B. H. Juang, and Yegnanarayana, "Fundamentals of Speech Recognition", Pearson Education India, 2008, ISBN: 9788-1775-85605.
[2] Reshad Hosseini and Mansur Vafadust, "Almost Perfect Reconstruction Filter Bank for Non-redundant, Approximately Shift-Invariant, Complex Wavelet Transforms", Journal of Wavelet Theory and Applications, Vol. 2(1), pp. 1-14, 2008, ISSN: 0973-6336.
[3] Slavy G. Mihov, Ratcho M. Ivanov and Angel N. Popov, "Denoising Speech Signals by Wavelet Transform", Annual Journal of Electronics, 2009, ISSN: 1313-1842.
[4] Rajeev Aggarwal, Sanjay Rathore, Jai Karan Singh, Mukesh Tiwari, Vijay Kumar Gupta and Anubhuti Khare, "Noise Reduction of Speech Signal using Wavelet Transform with Modified Universal Threshold", International Journal of Computer Applications, Vol. 20(5), pp. 14-19, 2011, ISSN: 0975-8887.
[5] B. Jai Shankar and K. Duraiswamy, "Audio Denoising using Wavelet Transform", International Journal of Advances in Engineering & Technology (IJAET), Vol. 2(1), pp. 419-425, 2012, ISSN: 2231-1963.
[6] D. Yugandhar and S. K. Nayak, "A Heuristic Speech De-noising with the aid of Dual Tree Complex Wavelet Transform using Teaching-Learning Based Optimization", Vol. 8(5), pp. 1967-1980, ISSN: 0975-4024 (Online), 2319-8613 (Print).
[7] Pengfei Sun and Jun Qin, "Speech Enhancement via Two-Stage Dual Tree Complex Wavelet Packet Transform with a Speech Presence Probability Estimator", The Journal of the Acoustical Society of America, 141, 808 (2017).
[8] https://www.mathworks.com/help/dsp/ug/rlsadaptive-filters.html
[9] http://www.itu.dk/stud/speciale/segmentering/Matlab6p5/help/toolbox/filterdesign/adaptiv2.html
[10] C. Vimala and V. Radha, "Optimal Adaptive Filtering Technique for Tamil Speech Enhancement", International Journal of Computer Applications (IJCA), Vol. 41(17), pp. 23-29, 2012, ISSN: 0975-8887.
[11] Hua Ye, Guang Deng, Stefan J. Mauger, Adam A. Hersbach, Pam W. Dawson and John M. Heasman, "A Wavelet-Based Noise Reduction Algorithm and Its Clinical Evaluation in Cochlear Implants", PLoS ONE, Vol. 8(9), pp. 1-10, 2013, DOI: 10.1371/journal.pone.0075662.
[12] N. G. Kingsbury, "Complex wavelets for shift invariant analysis and filtering of signals", Journal of Applied and Computational Harmonic Analysis, Vol. 10(3), pp. 234-253, 2001, ISSN: 1063-5203.
[13] Ivan W. Selesnick, Richard G. Baraniuk, and Nick G. Kingsbury, "The Dual-Tree Complex Wavelet Transform", IEEE Signal Processing Magazine, Vol. 22(6), pp. 123-151, 2005, DOI: 10.1109/MSP.2005.1550194.
[14] Hosseini and Mansur Vafadust, "Almost Perfect Reconstruction Filter Bank for Non-redundant, Approximately Shift-Invariant, Complex Wavelet Transforms", Journal of Wavelet Theory and Applications, Vol. 2(1), pp. 1-14, 2008, ISSN: 0973-6336.
[15] Yi Hu and Philipos C. Loizou, "Subjective comparison and evaluation of speech enhancement algorithms", Speech Communication, Vol. 49(7), pp. 588-601, 2007.

AUTHOR

Dr. Vimala C. completed her PhD in the Department of Computer Science, Avinashilingam Institute for Home Science and Higher Education for Women. She has more than 2 years of teaching experience and 3 years of research experience. She worked as a Project Fellow for the UGC Major Research project. Her areas of specialization include Speech Recognition, Speech Synthesis and Speech Signal Enhancement. She has 17 publications in national and international conferences and journals.

Dr. V. Radha is Professor and Head in Computer Science, Avinashilingam Institute for Home Science and Higher Education for Women, Coimbatore, Tamil Nadu, India. She has more than 26 years of teaching experience and 16 years of research experience. Her areas of specialization include Image Processing, Optimization Techniques, Speech Signal Processing, Data Mining & Data Warehousing and RDBMS. She has authored more than 100 papers published in refereed international journals and conferences. She has obtained funded projects from UGC-MRP in the field of speech signal processing. She is a Reviewer of American Journal of Operational Research and American Journal of Signal Processing. She is an Editor in Chief of International Journal of Computational Science and Information Technology (IJCSITY). She is a member of many international bodies such as IAENG, AIRCC, IACSIT etc. She has visited countries such as the USA and Singapore.