Combination of Bone-Conducted Speech with Air-Conducted Speech Changing Cut-Off Frequency

Tetsuya Shimamura and Fumiya Kato
Graduate School of Science and Engineering, Saitama University
255 Shimo-Okubo, Sakura-Ku, Saitama, 338-8570 JAPAN
shima@sie.ics.saitama-u.ac.jp
http://www.sie.ics.saitama-u.ac.jp

Abstract: - In this paper, we combine bone-conducted speech with air-conducted speech to improve the quality of speech in noisy environments. The bone-conducted speech is low-pass filtered and the air-conducted speech is high-pass filtered, with the cut-off frequency varied within a common transition band. Through experiments, the performance of the combination filter is investigated.

Key Words: - Bone-conducted speech, air-conducted speech, cut-off frequency, combination filter

1 Introduction

In a variety of speech applications, the quality of speech degrades due to additive noise. Once speech and noise are mixed, however, it is not easy to separate them again. While many approaches exist to reduce additive noise in noisy speech [1][2], none reliably provides the resulting speech with sufficient quality.

Recently, a great deal of attention has been paid to picking up bone vibrations in noisy environments [3][4]. The transmission of the voice through bone is referred to as bone conduction [5]. The voice waveforms transmitted from the voice source (vocal cords) through the vocal tract wall and skull are not directly exposed to ambient noise, which is why bone-conducted speech is effectively utilized in noisy environments. However, the intelligibility of bone-conducted speech is low. Even in a noiseless environment, the intelligibility of bone-conducted speech is lower than that of the corresponding air-conducted speech, which is the speech transmitted through the air.
This is caused by the fact that the high-frequency components of bone-conducted speech are attenuated due to the impedance mismatch between bone and skin. In [4], a reconstruction filter that emphasizes the high-frequency components of bone-conducted speech is designed to improve quality. To design the reconstruction filter of [4], the air-conducted speech is utilized together with the bone-conducted speech. However, the two signals need not be obtained from an air-conducted microphone and a bone-conducted microphone simultaneously; that is, a common sentence does not have to be pronounced for both the air-conducted and bone-conducted recordings.

In this paper, we design a digital filter that combines the bone-conducted speech with the air-conducted speech, where a common sentence is recorded by both microphones simultaneously. Two digital filters, a low-pass filter and a high-pass filter, are used, and their outputs are added to produce a speech signal with improved quality. The bone-conducted speech is used as the input to the low-pass filter, and the air-conducted speech as the input to the high-pass filter. Both digital filters share a common cut-off frequency in the transition band. Experiments are conducted, and the dependence of the combination filter's performance on the cut-off frequency is investigated.

2 Bone-Conducted Speech and Air-Conducted Speech

In this section, bone-conducted speech and air-conducted speech are compared. Figure 1 shows a waveform comparison of the two speech signals. The pronunciation is the vowel sequence /a/ /i/ /u/ /e/ /o/ in both cases. Both waveforms were recorded from one speaker using a bone-conducted microphone and an air-conducted (normal) microphone simultaneously. It is observed that the relative amplitude of the bone-conducted speech differs from that of the air-conducted speech. Each vowel segment was detected and its power spectrum was calculated.
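As a rough illustration of this comparison, the per-segment power spectrum calculation can be sketched as follows. The signals here are synthetic stand-ins for the two microphone recordings (the function name, FFT size, and the 2.4 kHz test tone are our own assumptions, not values from the paper):

```python
import numpy as np

def power_spectrum_db(frame, n_fft=512):
    """dB power spectrum of one Hanning-windowed frame."""
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed, n_fft)) ** 2
    return 10.0 * np.log10(power + 1e-12)

fs = 8000                      # sampling frequency used in the paper
t = np.arange(fs) / fs

# Synthetic stand-ins for one vowel: the same low-frequency content, but the
# bone-conducted version has its high-frequency component strongly attenuated.
air_vowel = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 2400 * t)
bone_vowel = np.sin(2 * np.pi * 200 * t) + 0.05 * np.sin(2 * np.pi * 2400 * t)

air_db = power_spectrum_db(air_vowel[:512])
bone_db = power_spectrum_db(bone_vowel[:512])

# With n_fft = 512 at 8 kHz, the bin spacing is 15.625 Hz; bins above ~150
# cover the band around 2.4 kHz, where bone conduction loses energy.
high_band = slice(150, 257)
gap_db = air_db[high_band].mean() - bone_db[high_band].mean()
```

With real recordings, the same high-band gap is what Figures 2-6 make visible for each vowel.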
Figures 2-6 show comparisons of the power spectra of the bone-conducted speech vowels and the corresponding air-conducted speech vowels, respectively. We see that the high-frequency components of the bone-conducted vowels are commonly attenuated.

Figure 1: Waveforms of bone-conducted and air-conducted speech signals
Figure 2: Power spectrum of /a/
Figure 3: Power spectrum of /i/
Figure 4: Power spectrum of /u/
Figure 5: Power spectrum of /e/
Figure 6: Power spectrum of /o/

ISBN: 978-1-61804-315-3

3 Combination Filter Design

Figure 7 shows a block diagram of the combination filter proposed in this paper. We set out to combine the bone-conducted speech with the air-conducted speech. As shown in Fig. 7, two digital filters, a low-pass filter and a high-pass filter, are used, and their outputs are added to produce a speech signal with improved quality. In practice, both filters are constructed as infinite impulse response (IIR) filters. More specifically, a 4th-order Butterworth filter is designed by setting the cut-off frequency in the transition band. One common cut-off frequency is used to design both the low-pass and high-pass filters. The bone-conducted speech and the air-conducted speech are input to the designed low-pass filter and high-pass filter, respectively.

Figure 7: Block diagram

Figure 8 shows an example of the spectrogram of the resulting speech signal. It is observed that the high-frequency components of the output of the combination filter are adequately compensated.

Figure 8: Spectrogram of air-conducted speech (upper), bone-conducted speech (center) and produced speech (bottom)

4 Experiments

In this section, the performance of the combination filter is investigated through experiments.

4.1 Speech Data

The speech data used in the experiments are a set of bone-conducted and air-conducted recordings of two Japanese males and two Japanese females, sampled at 8 kHz. Both the bone-conducted and air-conducted speech data were pronounced by a common speaker and recorded simultaneously. We used a Temco HG-17 and a SHURE SM58 as the bone-conducted and air-conducted microphones, respectively. The noises used are a car noise, an exhibition noise and a factory noise, also sampled at 8 kHz. First, white noise was generated from a loudspeaker in a sound-isolated room, and two noise recordings were measured through a bone-conducted microphone and an air-conducted (normal) microphone, respectively. The ratio of the bone-conducted noise to the air-conducted noise was then calculated. According to the calculated ratio, the noises were added to the bone-conducted and air-conducted speech data, and a set of noisy bone-conducted and noisy air-conducted speech data was prepared.
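The noise-addition procedure just described can be sketched as follows. This is a minimal illustration under stated assumptions: the signals are synthetic stand-ins for the recordings, and deriving one gain from the target SNR and reusing it for both channels is our reading of how the measured bone/air noise ratio is preserved, not code from the paper:

```python
import numpy as np

def rms(x):
    """Root-mean-square level of a signal."""
    return np.sqrt(np.mean(x ** 2))

rng = np.random.default_rng(0)

# Hypothetical stand-ins: the same sentence captured by both microphones.
air_speech = rng.standard_normal(8000)
bone_speech = 0.8 * air_speech

# Loudspeaker white noise as picked up by each microphone; the bone-conducted
# microphone picks up far less airborne noise than the air-conducted one.
air_noise = rng.standard_normal(8000)
bone_noise = 0.3 * rng.standard_normal(8000)

# Scale the air-conducted noise to reach the target SNR, then apply the same
# gain to the bone-conducted noise, so the measured ratio of bone-conducted
# to air-conducted noise is preserved in the noisy data set.
snr_db = 10.0
gain = rms(air_speech) / (rms(air_noise) * 10.0 ** (snr_db / 20.0))
noisy_air = air_speech + gain * air_noise
noisy_bone = bone_speech + gain * bone_noise
```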
Table 1 summarizes the speech data used for the experiments.

Table 1: Speech data used for the experiments

  Speakers             two males and two females
  Speech               Japanese sentence (10 sec)
  Sampling frequency   8 kHz
  Noise                car, exhibition and factory
  SNR                  0 dB, 10 dB
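Applied to such a pair of recordings, the combination filter of Section 3 (Fig. 7) can be sketched with SciPy's Butterworth design. The helper name and toy signals below are our own; the paper does not specify an implementation, and in practice a matched-delay or zero-phase variant of the two filters may be preferable:

```python
import numpy as np
from scipy.signal import butter, lfilter

def combination_filter(bone, air, fs=8000, cutoff=1600.0, order=4):
    """Sum a low-pass-filtered bone-conducted signal and a high-pass-filtered
    air-conducted signal, using one common cut-off frequency for both."""
    nyq = fs / 2.0
    b_lp, a_lp = butter(order, cutoff / nyq, btype="low")
    b_hp, a_hp = butter(order, cutoff / nyq, btype="high")
    return lfilter(b_lp, a_lp, bone) + lfilter(b_hp, a_hp, air)

# Toy inputs: the bone-conducted channel has a clean low band; the
# air-conducted channel carries the high band plus wideband noise.
fs = 8000
t = np.arange(fs) / fs
low = np.sin(2 * np.pi * 300 * t)
high = 0.5 * np.sin(2 * np.pi * 2500 * t)
rng = np.random.default_rng(1)
bone = low
air = low + high + 0.1 * rng.standard_normal(fs)

out = combination_filter(bone, air, fs=fs, cutoff=1600.0)
```

Below the cut-off the output follows the (noise-free) bone-conducted channel, above it the air-conducted channel, which is the intended behavior of the combination filter.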

4.2 Preliminary Experiment

As a preliminary experiment, the output of the proposed combination filter was investigated while changing the cut-off frequency used in the filter design. The weighted spectral slope measure (WSSM) [6] was used for speech quality assessment of the resulting speech. Filter banks are used to evaluate the WSSM. For the i-th frame, the WSSM is obtained as

WSSM(i) = M_spl (M - M̂) + Σ_{m=1}^{25} w_a(m) {P(m) - P̂(m)}²    (1)

where M, M̂ and M_spl are parameters used to evaluate the WSSM, and w_a(m) is the weighting coefficient of each filter bank. The slope of the power spectrum of the original speech, P(m), is compared with that of the processed speech, P̂(m), in the m-th band. The parameters and coefficients of the WSSM were set to the values given in [7]. The final WSSM is evaluated by averaging the per-frame WSSM over L frames as

WSSM = (1/L) Σ_{l=1}^{L} WSSM(l),    (2)

and the improvement in the resulting WSSM was used for assessment:

WSSM_imp = WSSM_input - WSSM_output    (3)

where WSSM_input and WSSM_output correspond to the WSSMs of the input speech and output speech, respectively.

Figure 9 shows the improvement in WSSM as the cut-off frequency is changed. Figure 9 suggests that the best performance of the combination filter is obtained with a different setting of the cut-off frequency for each noise. This is because the frequency characteristics of each noise are different. Figures 10-12 show the power spectra of each noise. In Figs. 10-12, the power spectrum of the noise picked up by the bone-conducted microphone is compared with that picked up by the air-conducted microphone over 10 consecutive frames.

4.3 Comparison Experiment

To validate the effectiveness of the combination filter, it was compared with the reconstruction filter derived by Tamiya et al. [4]. Additionally, the iterative spectral subtraction method [8] was implemented as post-processing of the reconstruction filter, so that the noise remaining after the reconstruction filter was further suppressed.
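The WSSM evaluation of Eqs. (1)-(3) in Section 4.2 can be sketched in code as follows. This is a deliberately simplified illustration: the band slopes are taken as first differences of per-band log power, the level term uses the band maxima for M and M̂, and uniform weights stand in for the w_a(m) values of [7]:

```python
import numpy as np

def wssm_frame(p_db, p_hat_db, w_a, m_spl=1.0):
    """Simplified per-frame WSSM (Eq. (1)): a level term plus weighted
    squared differences of spectral slopes across 25 critical bands."""
    slope = np.diff(p_db)            # slope of the original spectrum
    slope_hat = np.diff(p_hat_db)    # slope of the processed spectrum
    level_term = m_spl * (p_db.max() - p_hat_db.max())
    return level_term + np.sum(w_a[: len(slope)] * (slope - slope_hat) ** 2)

def wssm(frames_db, frames_hat_db, w_a):
    """Average the per-frame measure over L frames (Eq. (2))."""
    return np.mean([wssm_frame(p, p_hat, w_a)
                    for p, p_hat in zip(frames_db, frames_hat_db)])

# Toy check: identical spectra give zero distance; distorted spectra give a
# positive distance, and Eq. (3) is the input WSSM minus the output WSSM.
rng = np.random.default_rng(2)
ref = rng.standard_normal((10, 25))       # 10 frames, 25 bands (log power)
w_a = np.ones(25)                         # stand-in for the weights of [7]
wssm_clean = wssm(ref, ref, w_a)          # perfectly processed output
wssm_noisy = wssm(ref, ref + rng.standard_normal((10, 25)), w_a)
wssm_imp = wssm_noisy - wssm_clean        # improvement as in Eq. (3)
```

A lower WSSM means the processed spectrum is closer to the original, so a larger WSSM_imp indicates a larger quality improvement.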
Figure 10: Spectra of car noise included in air-conducted and bone-conducted microphones
Figure 11: Spectra of exhibition noise included in air-conducted and bone-conducted microphones
Figure 12: Spectra of factory noise included in air-conducted and bone-conducted microphones

Figure 9: Comparison of WSSM improvement as the cut-off frequency is changed

Figures 13-15 show spectrograms of the outputs in each noise for a signal-to-noise ratio of 0 dB, where the iterative spectral subtraction (iterative SS) [8], the Tamiya method (tamiya) [4], the Tamiya method followed by iterative spectral subtraction (tamiya and iterative-SS), and the proposed combination filter (air-bone) are compared. The cut-off frequency used for the combination filter is 1600 Hz in all cases. It is observed that the output of the proposed combination filter comes closest to the original clean speech, both by suppressing the noise and by preserving the harmonics of the speech.

Figure 13: Spectrogram in car noise
Figure 14: Spectrogram in exhibition noise
Figure 15: Spectrogram in factory noise

5 Conclusions

In this paper, we have combined bone-conducted speech with air-conducted speech, changing the cut-off frequency, to improve the quality of speech in noisy environments. The resulting combination filter provides better speech quality than the conventional reconstruction filter for bone-conducted speech.

References:

[1] J. R. Deller, J. G. Proakis and J. H. L. Hansen, Discrete-Time Processing of Speech Signals, Macmillan, 1993.
[2] P. C. Loizou, Speech Enhancement: Theory and Practice, CRC Press, 2013.
[3] M. Ishihara and J. Shirataki, Applying multi-level sliced speech signals to bone-conducted communication, Proc. IEEE International Symposium on Circuits and Systems, pp. 2735-2738, 2005.
[4] T. Tamiya and T. Shimamura, Reconstruction filter design for bone-conducted speech, Proc. International Conference on Spoken Language Processing, 2004.
[5] H. M. Moser and H. J. Oyer, Relative intensities of sounds at various anatomical locations of the head and neck during phonation of the vowels, Journal of the Acoustical Society of America, vol. 30, no. 4, pp. 275-277, 1958.
[6] D. H. Klatt, Prediction of perceived phonetic distance from critical-band spectra: a first step, Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 1278-1281, 1982.
[7] M. Ikehara, T. Shimamura and Y. Sanada, MATLAB Multimedia Signal Processing: Speech, Image and Communications, Baifukan, 2004.
[8] K. Yamashita, S. Ogata and T. Shimamura, Spectral subtraction iterated with weighting factors, Proc. IEEE Workshop on Speech Coding, pp. 138-140, 2002.