Genesis of wearable DSP structures for selective speech enhancement and replacement to compensate severe hearing deficits

Axel PLINGE, Dieter BAUER
Leibniz Institute for Occupational Health, Dortmund, Germany

Abstract: Conventional hearing instruments often cannot compensate hearing losses to a sufficient degree. Digital signal processing offers new opportunities to provide speech enhancement and replacement functionality in a wearable device. Targeting such a device, a working laboratory prototype based on DSPs was created.

Keywords: hearing impairment, frication, transposition, selectivity, DSP, wearable, classification

Introduction

In cases of severe sensory hearing deficits, conventional hearing instruments are often insufficient to compensate the hearing loss well enough to enable proper communication. Too many sounds are too weak or inaudible, leading to misclassification and confusion. Even worse, the residual speech recognition abilities are further disturbed by environmental noise and competing speakers. Thus, beyond very good noise reduction, the users in our target group urgently need support in the form of enhancing or replacing otherwise inaudible speech features.

Digital signal processing offers new opportunities for delivering such functionality. Modern low-power digital processors are increasingly powerful and can serve as an inexpensive basis (as opposed to custom-made circuitry) for implementing more and more functionality in wearable equipment powered by small rechargeable batteries.

Figure 1. Basic structure (block labels: front end; DSP 3/4 or radio link transmitter; DSP1 with multi-band speech enhancement and delay for temporal alignment; DSP2 with phonetic spotter and stimulus generation; Σ)

1. System Genesis

1.1. Basic Structure

Many simulations and evaluations of available low-power digital integrated circuits have led to the basic processing structure shown in figure 1. The processing core consists of two coupled subunits, each with one DSP at its centre. It provides baseband processing with controlled compression as well as enhancement and replacement of speech features controlled by a phoneme spotter. This unit can be coupled to either of two possible front ends that provide substantial noise reduction. One solution would be radio-link microphones worn by the communication partner. This requires the willingness and acceptance of the partner, which cannot always be taken for granted. The second solution is an intelligent microphone array that adapts to the noise field characteristics. We expect such a solution to be feasible with two further DSPs.

1.2. Laboratory Prototype

The laboratory prototype consists of two coupled ADSP-2189M processors with surrounding interface hardware (figure 2). The functional blocks for one of the DSPs are sketched in figure 3. Each evaluation board can be equipped with both RAM and flash memory. The flash memory is needed to host the DSP software with its boot loader when no PC is connected. An RS232 connection to a PC is used to upload the DSP's program into RAM or flash. The interface is also used to communicate with the DSPs while the program is running, thus allowing on-line monitoring and modification of the processing parameters.

Figure 2. Laboratory prototype
Figure 3. Blocks of one DSP module
(Block labels: parallel data bus; power supply 5.0 V / 3.0 V / 1.8 V etc.; microcontroller PIC16F877; bus multiplexer; control interface; diagnostics; RAM 128 kByte; flash memory 512 kByte; PLD; RS232 interface; PC; second DSP; serial data exchange; DSP module ADSP-2189M, 75 MIPS; I/O modules; codec modules)

1.3. Wearable Solution

The wearable device will be based on the current laboratory prototype, but stripped of many components. Essentially, the processor, the flash memory and one codec will remain. The RS232 interface will be replaced by IrDA circuitry for wireless coupling to the PC (marked grey in figure 3).

2. Current Implementation

The first aim of this implementation was to demonstrate that all functionality of the rather complex design (previously evaluated in simulation) for spotter-controlled transposition of /s, z/, /C/ and /t/ is feasible using selected low-power circuitry that can easily be transformed into a wearable design powered by lithium-ion batteries [3, 4]. The second aim was to implement baseband processing that surpasses the previous design [1] through intelligent control of the compression.

2.1. Baseband

Figure 3 shows a simplified block diagram of the baseband processing. The input filter bank consists of three linear (finite impulse response) filters with individually different pass-bands, which allows a speech-mode pre-equalization. The subsequent multi-band compression uses three different temporal characteristics, which again may differ between bands. Within the higher second-formant range (the third channel), additional processing of the temporal envelope may be introduced as a novel, speech-specific enhancement of second-formant features (SEF).

A further novelty is the pair of external control lines. The spotter control can be used to introduce phoneme-dependent compression gain or specific SEF; all processing parameters can be modified selectively according to the spotted phoneme class. The spatial control may be used when the microphone array processor serves as the front end: a reliable speaker identification is transmitted to modulate the compression of ambient noise to a predetermined, non-masking level.

Table 1 gives an account of the processing power used by the current implementation. We can conclude that all functionality whose salience was pre-established in simulations fits well into one 75 MIPS (million instructions per second) DSP.

Figure 3. Baseband processing (block labels: spatial control; controlled multi-band compression with 3 temporal characteristics, look-ahead and band coupling; spotter control)

Table 1. Baseband MIPS
post Σ 20 Compression 15 Control 10 Communication 15 Management 60
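The compression stage of section 2.1 can be illustrated with a minimal Python sketch. It is not the prototype's DSP code: it assumes the three band signals have already been produced by the input filter bank, uses a simple attack/release envelope follower as a stand-in for the "temporal characteristics", and leaves out look-ahead, band coupling and the external control lines. All function names, parameter names and numeric settings are hypothetical.

```python
import math


def envelope(x, fs, attack_ms, release_ms):
    """One-pole envelope follower with separate attack and release times."""
    a_att = math.exp(-1.0 / (fs * attack_ms / 1000.0))
    a_rel = math.exp(-1.0 / (fs * release_ms / 1000.0))
    env, out = 0.0, []
    for sample in x:
        mag = abs(sample)
        a = a_att if mag > env else a_rel  # rise quickly, fall slowly
        env = a * env + (1.0 - a) * mag
        out.append(env)
    return out


def gain_db(level_db, threshold_db, ratio):
    """Static curve: unity gain below threshold, `ratio`:1 compression above."""
    if level_db <= threshold_db:
        return 0.0
    return (threshold_db - level_db) * (1.0 - 1.0 / ratio)


def compress_band(x, fs, attack_ms, release_ms, threshold_db, ratio):
    """Compress one band; each band may use its own (assumed) settings."""
    out = []
    for sample, env in zip(x, envelope(x, fs, attack_ms, release_ms)):
        level_db = 20.0 * math.log10(max(env, 1e-9))  # floor avoids log(0)
        out.append(sample * 10.0 ** (gain_db(level_db, threshold_db, ratio) / 20.0))
    return out


def compress_multiband(bands, fs, params):
    """Compress every band with its own parameter set, then sum the bands."""
    processed = [compress_band(b, fs, **p) for b, p in zip(bands, params)]
    return [sum(samples) for samples in zip(*processed)]
```

Giving each band its own `attack_ms` and `release_ms` mirrors the statement that the temporal characteristics may differ between bands; a spotter or spatial control line could, under the same assumptions, switch the parameter set passed to `compress_band` at run time.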

2.2. Transposer

The transposition unit can be roughly divided into three main functional blocks, as shown in figure 4. The phoneme spotter extracts a set of speech features in the feature extraction block and then classifies the feature vector as one (or none) of the predefined phoneme classes (classifier). The detection of a speech feature or phoneme to be replaced then triggers the generation of replacement stimuli in the third block (and modifies processing parameters of the baseband processing). From the first simulations to the present working prototype, the whole design was made under severe constraints on processing power (since a wearable device is targeted) and on time (to allow perceptual integration).

2.2.1. Feature Extraction

Under the aforementioned constraints, only a small number of features can be used to detect the fricatives and plosives in question reliably. Dedicated evaluation led to a refinement of the spectral features: two ratios of three band energies for the best separation of /S/ and /C/, and a comparison of the energy values with an /s/ band situated above 4.5 kHz [3, 4]. To avoid temporal asynchrony, four linear-phase filters are used in conjunction with moving averages and 16-bit division. This branch requires 24 MIPS of computation.

To separate voiced from unvoiced speech, the maximum value of the normalized cross-correlation is used [6]. To make this very salient feature usable, the calculation was hand-optimised in assembler code, under constant control of the resulting quality, from about 100 MIPS down to 6 MIPS.

Since the need for special treatment of /t/ became evident, plosive features had to be added [5]. For plosion burst detection, several energy deviation measures with different bands were tested. A single ROR (rate-of-rise) feature with just one pre-filter was found to yield good significance. A pause detector was added to account for the plosive closure.

2.2.2. Classification

To classify the feature vector derived in this way, a threefold phoneme recognition scheme was devised [3, 4]. A Gaussian distance measure is evaluated using prototypes that are calculated using the PC simulation of the classifier and hundreds of labelled speech samples. After omitting covariance, the distance function can be reduced to equation 1, requiring just 11 MIPS for 6 features and 6 classes.

$$ d_k(x) := -2\log p_k + 2\log\sigma_k + \sum_i \left(\frac{x_i - \mu_{ki}}{\sigma_{ki}}\right)^2 = C_k + \sum_i \big((x_i - \mu_{ki})\,S_{ki}\big)^2 \qquad (1) $$

Here $C_k$ collects the logarithmic terms of class $k$ and $S_{ki} = 1/\sigma_{ki}$.

To accommodate asymmetric deviations and exclude unwanted phonemes, a range check of the feature vector was introduced. It may also be utilized to adjust the transposer's selectivity, as discussed in [8]. For temporal smoothing of the recognition result, a post correction is added.

Figure 4. Transposer (block labels: >4.5 kHz, 2.4-3.8 kHz, 1.2-2.4 kHz, 0.6-1.2 kHz, downsampling, average, ratio, NCCF, prefilter, prefilter, ROR, pause, range & post, #ZC, sin, s', C', t', modulator, soft switch; features (36 MIPS), µ distance classifier (20 MIPS), stimulus generation (10 MIPS))

2.2.3. Replacement Stimulus Generation

After careful evaluation to find optimised replacement stimuli [7, 8], a high-quality but low-cost generation was implemented: no more than 10 MIPS are needed for concatenating stored data, sine modulation and zero-crossing rate measurement.

3. Conclusion

Given the successful implementation within the laboratory prototype, the construction of a wearable test device that provides high-quality assistance in speech understanding can be considered feasible.

Acknowledgements: We would like to thank W.H. Ehrenstein for revising the English text.

References

[1] A. Plinge, D. Bauer, M. Finke (2001): Intelligibility enhancement of human speech for severely hearing impaired persons by dedicated digital processing. In: Crt Marincek et al. (eds.): Assistive Technology - Added Value to the Quality of Life. IOS Press
[2] L. Arslan and J. H. L. Hansen (1994): Minimum cost based phoneme class detection for improved iterative speech enhancement. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, Vol. 2, pp. 45-48
[3] D. Bauer, A. Plinge, M. Finke (2002): Selective Phoneme Spotting for Realization of an /s, z, C, t/ Transposer. In: Miesenberger et al. (eds.): Computers Helping People with Special Needs, 8th ICCHP Proceedings, Lecture Notes in Computer Science 2398. Springer, Heidelberg
[4] A. Plinge, D. Bauer (2003): Introducing Restoration of Selectivity in Hearing Instrument Design through Phoneme Spotting. In: G. M. Craddock et al. (eds.): Assistive Technology - Shaping the Future. IOS Press
[5] B. Plannerer et al. (1996): A continuous speech recognition system integrating additional acoustic knowledge sources. Technical report, TU München
[6] D. Talkin (1995): A Robust Algorithm for Pitch Tracking. In: W.B. Kleijn and K.K. Paliwal (eds.): Speech Coding and Synthesis. Elsevier Science
[7] D. Bauer, A. Plinge and W.H. Ehrenstein (2003): Compensation of Severe Sensory Hearing Deficits. Two Different Approaches to Replace Inaudible Speech Elements: Re-Sampling Versus Re-Synthesis. In: G. M. Craddock et al. (eds.): Assistive Technology - Shaping the Future. IOS Press
[8] D. Bauer, A. Plinge (2005): Tools and Strategies for Fitting a Wearable Frication Transposer to the Needs of Severely Hearing Impaired People (this volume)
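
As a supplementary illustration of the distance classifier in section 2.2.2, the following Python sketch evaluates a diagonal Gaussian distance with precomputed class constants and scale factors, followed by a per-class range check. It is a sketch under stated assumptions, not the prototype's implementation: it takes the class term 2 log σ_k to mean twice the sum of the logarithms of the per-feature deviations, treats the range check as simple lower and upper bounds per feature, and uses hypothetical function names and toy prototypes.

```python
import math


def precompute(priors, sigmas):
    """Fold the logarithmic terms into C_k and invert the sigmas into S_ki.

    Assumption: 2*log(sigma_k) is read as 2*sum_i log(sigma_ki).
    """
    consts, scales = [], []
    for p, sig in zip(priors, sigmas):
        consts.append(-2.0 * math.log(p) + 2.0 * sum(math.log(s) for s in sig))
        scales.append([1.0 / s for s in sig])
    return consts, scales


def distance(x, mean, const, scale):
    """d_k(x) = C_k + sum_i ((x_i - mu_ki) * S_ki)^2, no division at run time."""
    return const + sum(((xi - mi) * si) ** 2 for xi, mi, si in zip(x, mean, scale))


def classify(x, means, consts, scales, ranges=None):
    """Return (class index, distance) of the nearest prototype.

    `ranges` is a hypothetical per-class list of (low, high) bounds per
    feature; a class whose bounds are violated is skipped. The index is None
    when every class is rejected.
    """
    best_k, best_d = None, math.inf
    for k, (mean, c, s) in enumerate(zip(means, consts, scales)):
        if ranges is not None and not all(
            lo <= xi <= hi for xi, (lo, hi) in zip(x, ranges[k])
        ):
            continue
        d = distance(x, mean, c, s)
        if d < best_d:
            best_k, best_d = k, d
    return best_k, best_d
```

Because the constants and scale factors are computed offline, the run-time cost per class and feature is one subtraction and two multiplications, which is consistent with the low MIPS figure quoted for the distance function. Tightening the bounds passed as `ranges` makes the sketch reject more frames, which is one way to picture the selectivity adjustment mentioned in the text.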