Ambiguity in the recognition of phonetic vowels when using a bone conduction microphone

Similar documents
The effect of wearing conventional and level-dependent hearing protectors on speech production in noise and quiet

Speech Intelligibility Measurements in Auditorium

SPEECH PERCEPTION IN A 3-D WORLD

Combination of Bone-Conducted Speech with Air-Conducted Speech Changing Cut-Off Frequency

IMPACT OF ACOUSTIC CONDITIONS IN CLASSROOM ON LEARNING OF FOREIGN LANGUAGE PRONUNCIATION

Frequency refers to how often something happens. Period refers to the time it takes something to happen.

Assistive Listening Technology: in the workplace and on campus

Communication quality for students with a hearing impairment: An experiment evaluating speech intelligibility and annoyance

HCS 7367 Speech Perception

Organisation Internationale de Métrologie Légale

Issues faced by people with a Sensorineural Hearing Loss

Methods of validation of occupational noise exposure measurement with multi aspect personal sound exposure meter

Hearing Lectures. Acoustics of Speech and Hearing. Auditory Lighthouse. Facts about Timbre. Analysis of Complex Sounds

Computational Perception /785. Auditory Scene Analysis

Effects of speaker's and listener's environments on speech intelligibili annoyance. Author(s)Kubo, Rieko; Morikawa, Daisuke; Akag

Consonant Perception test

Lindsay De Souza M.Cl.Sc AUD Candidate University of Western Ontario: School of Communication Sciences and Disorders

ABSTRACT INTRODUCTION

Hearing. and other senses

whether or not the fundamental is actually present.

SLHS 1301 The Physics and Biology of Spoken Language. Practice Exam 2. b) 2 32

CHAPTER 1 INTRODUCTION

Evaluating the role of spectral and envelope characteristics in the intelligibility advantage of clear speech

OIML R 122 Annex C RECOMMENDATION. Edition 1999 (E) ORGANISATION INTERNATIONALE INTERNATIONAL ORGANIZATION

The effect of binaural processing techniques on speech quality ratings of assistive listening devices in different room acoustics conditions

TOPICS IN AMPLIFICATION

HOW TO USE THE SHURE MXA910 CEILING ARRAY MICROPHONE FOR VOICE LIFT

FREQUENCY COMPRESSION AND FREQUENCY SHIFTING FOR THE HEARING IMPAIRED

Topics in Linguistic Theory: Laboratory Phonology Spring 2007

Linguistic Phonetics. Basic Audition. Diagram of the inner ear removed due to copyright restrictions.

Acoustics, signals & systems for audiology. Psychoacoustics of hearing impairment

CLASSROOM AMPLIFICATION: WHO CARES? AND WHY SHOULD WE? James Blair and Jeffery Larsen Utah State University ASHA, San Diego, 2011

Communication with low-cost hearing protectors: hear, see and believe

Providing Effective Communication Access

Linguistic Phonetics Fall 2005

Effects of noise and filtering on the intelligibility of speech produced during simultaneous communication

Roger TM. Learning without limits. SoundField for education

EEL 6586, Project - Hearing Aids algorithms

Wolfgang Probst DataKustik GmbH, Gilching, Germany. Michael Böhm DataKustik GmbH, Gilching, Germany.

ACOUSTIC AND PERCEPTUAL PROPERTIES OF ENGLISH FRICATIVES

Best Practice Protocols

Perceptual Effects of Nasal Cue Modification

Two Modified IEC Ear Simulators for Extended Dynamic Range

General about Calibration and Service on Audiometers and Impedance Instruments

A new era in classroom amplification

BINAURAL DICHOTIC PRESENTATION FOR MODERATE BILATERAL SENSORINEURAL HEARING-IMPAIRED

April 23, Roger Dynamic SoundField & Roger Focus. Can You Decipher This? The challenges of understanding

Introduction to Audio Forensics 8 th MyCERT SIG 25/04/2006

HCS 7367 Speech Perception

Spectrograms (revisited)

Classroom acoustics for vocal health of elementary school teachers

PC BASED AUDIOMETER GENERATING AUDIOGRAM TO ASSESS ACOUSTIC THRESHOLD

Voice Low Tone to High Tone Ratio - A New Index for Nasal Airway Assessment

Sonic Spotlight. SmartCompress. Advancing compression technology into the future

Speech Recognition Experiment in Natural Quiet Background Noise

Impact of the ambient sound level on the system's measurements CAPA

The Benefits and Challenges of Amplification in Classrooms.

Hearing Evaluation: Diagnostic Approach

Hearing Protection Systems

DESIGN CONSIDERATIONS FOR IMPROVING THE EFFECTIVENESS OF MULTITALKER SPEECH DISPLAYS

Hearing Conservation Program

WIDEXPRESS. no.30. Background

In-Ear Microphone Equalization Exploiting an Active Noise Control. Abstract

Proceedings of Meetings on Acoustics

Speech Spectra and Spectrograms

GSI AUDIOSTAR PRO CLINICAL TWO-CHANNEL AUDIOMETER

THE MECHANICS OF HEARING

How Does Speaking Clearly Influence Acoustic Measures? A Speech Clarity Study Using Long-term Average Speech Spectra in Korean Language

Roger TM at work. Focus on work rather than on hearing

A. SEK, E. SKRODZKA, E. OZIMEK and A. WICHER

Two ears are better than one.

Wireless Technology - Improving Signal to Noise Ratio for Children in Challenging Situations

Evidence base for hearing aid features:

Role of F0 differences in source segregation

Advanced Audio Interface for Phonetic Speech. Recognition in a High Noise Environment

3M Center for Hearing Conservation

Hearing Conservation Program

Speech intelligibility in simulated acoustic conditions for normal hearing and hearing-impaired listeners

An active unpleasantness control system for indoor noise based on auditory masking

Validation Studies. How well does this work??? Speech perception (e.g., Erber & Witt 1977) Early Development... History of the DSL Method

M. Shahidur Rahman and Tetsuya Shimamura

Sound localization psychophysics

SoundRecover2 the first adaptive frequency compression algorithm More audibility of high frequency sounds

Time Varying Comb Filters to Reduce Spectral and Temporal Masking in Sensorineural Hearing Impairment

The role of periodicity in the perception of masked speech with simulated and real cochlear implants

Child Profile: Hearing

Roger Dynamic SoundField A new era in classroom amplification

ACOUSTIC ANALYSIS AND PERCEPTION OF CANTONESE VOWELS PRODUCED BY PROFOUNDLY HEARING IMPAIRED ADOLESCENTS

Binaural hearing and future hearing-aids technology

The Ear. The ear can be divided into three major parts: the outer ear, the middle ear and the inner ear.

Quarterly Progress and Status Report. Masking effects of one s own voice

COMPARISON OF SUBJECTIVE AND OBJECTIVE ASSESSMENT OF ACOUSTIC COMFORT IN A RESTAURANT

Welcome to the LISTEN G.R.A.S. Headphone and Headset Measurement Seminar The challenge of testing today s headphones USA

You and Your Student with a Hearing Impairment

Simulation of an electro-acoustic implant (EAS) with a hybrid vocoder

Phoneme Perception Test 3.0

Topics in Amplification CONNECTIVITY COMMUNICATION WITHOUT LIMITS

Transcription:

Acoustics 8 Paris Ambiguity in the recognition of phonetic vowels when using a bone conduction microphone V. Zimpfer a and K. Buck b a ISL, 5 rue du Général Cassagnou BP 734, 6831 Saint Louis, France b French German Institut of Saint Louis (ISL), 5 rue du Général Cassagnou, 6831 Saint-Louis, France veronique.zimpfer@isl.eu 489

Acoustics 8 Paris When speaking, not only air conducted noise is generated, but also vibrations can be recorded at different places on the head using accelerometers. Bone conduction microphones are less sensitive to noise than regular acoustical microphones, they can be used in harsh environments, and they are compatible with head equipment like NBC protection devices. This paper reports the first results of a study designed to evaluate the differences in perception between speech recorded with an acoustic microphone and speech recorded using bone conduction. These differences may be the cause of bad intelligibility of communication based on bone conduction microphones, even if recorded in an undisturbed environment. We studied the recognition of ten French phonetic vowels recorded with a bone conduction microphone. A listening test has been designed to show the confusions of phonetic vowels when listening to speech picked up by air or bone conduction microphones. The tests show confusion between the vowels [i] [y] [u]. All of these vowels have the frequency of their first formant in common. The spectrograms of these vowels, which are distinctly different when recorded with an aero-acoustic microphone, become almost identical when the bone conducted speech is analyzed. Moreover, the confusions of the phonetic vowels depend on the speaker. 1 Introduction For soldiers, it is necessary to be able to communicate even in a noisy environment. Communication systems based on bone conduction have been developed for some years. Vibrators, initially developed for hearing impaired people, are also used to allow communication when wearing ear plugs. Moreover, the production of a speech signal produces vibration in the bony structure of the head which may be recorded using accelerometers [1]. The main advantages of use of bone conduction microphones are: Bone conduction microphones are less sensitive to environmental noise than classical acoustic microphones (especially in the low frequency domain). The Signal Noise Ratio (SNR) is better when using the bone conduction microphone than with a simple differential microphone. Bone conduction microphones are compatible with head equipment like NBC masks.,2,1 -,1,2 -,2 1 1 2 Speech signal 2 3 Speech signal 3 4 Fig.1 Speech signal of the same speaker with the standard microphone (upper plot) and with the bone conduction microphone (lower plot) in a noisy environment (pink noise at 87 dba) The figure 1 represents the different speech signals recorded with the standard microphone (upper curve) and with the bone conduction microphone (lower curve). The 4 5 5 6 6 7 records have been realized in a reverberant room with pink noise at 87 dba. Using the standard microphone the SNR is equal to 5 db. On the other hand when using the bone conduction microphone the SNR is equal to 18 db. Due to the fact that it is less sensitive to the noise, intelligibility in noisy environments is better when using a bone conduction microphone than with a standard microphone. On the other hand in the absence of noise the intelligibility of the bone conduction microphones becomes worse. This is due to the fact that the bone conduction microphone distorts the speech signal. One of the goals of our study is to check if it is possible to use the microphones for automatic vocal recognition. In order to evaluate the influence of the bone conduction microphone on speech recognition, a subjective and objective analysis on 1 French phonetic vowels was carried out. 2 Experimentation Three speakers read 1 French vowels in an audiometric cabin. The speech signal was recorded simultaneously with a standard air microphone and two bone conduction microphones. The first bone conduction microphone is placed at the top of the head and the second at high right cheek. Each speaker was equipped with both bone conduction microphones and read a list of 1 vowels at a distance of 6 cm to a reference microphone (B&K ½ ). The three speech signals were recorded simultaneously with a sampling rate of 24 khz. Each speaker read a different list of the same vowels: List 1 (speaker 1): [a] [S] [o] [u] [e] [i] [T] [y] [R] [V], List 2 (speaker 2): [R] [o] [e] [i] [T] [y] [a] [u] [V] [S], List 3 (speaker 3): [u] [V] [i] [e] [a] [y] [R] [S] [T] [o]. 3 Results of subjective test and discussion In order to estimate the intelligibility of the vowels, a subjective test was carried out for one of the two bone conduction microphones. 49

Acoustics 8 Paris Recognized phonetic vowels Spoken phonetic vowels [a] [V] [i] [e] [y] [o] [S] [u] [R] [T] [a] 98% 19% 14% 4% [V] 86% 2% 14% 12% 4% 2% 5% [i] 35% 2% 26% [e] 7% 67% 21% 4% [y] 49% 4% 95% 7% 37% 5% [o] 2% 47% 5% [S] 2% 6% 5% 5% [u] 7% 12% 33% 2% [R] 2% 2% 4% 18% 2% 74% 25% [T] 2% 14% 2% 49% [D] 4% 5% 2% Table 1 confusion matrix of the subjective test The test has been realized with the first bone conduction microphone placed on the high right cheek. About thirty listeners listened to the three series of vowels and noted the vowels which they recognized. This experiment shows that the rate of recognition of the ten French vowels depends on speaker. The rate of confusion can be compared to a CVC score for which the relationship to speech intelligibility is described by Steeneken and Houtgast [2]. The rates depending on the speaker are: 83 % of recognition for speaker 1, rate which would be qualified good in a CVC (Consonant Vowel Consonant) test 63 % of recognition for speaker 2, rate which would be qualified fair in a CVC test < 5 % of recognition for speaker 3, rate which would be qualified poor in a CVC test For speaker 1 the main confusions are: [i] confused with [y] [u] confused with [y] [S] confused with [R] For speaker 2 the main confusions are: [i] confused with [y] or [u] [u] confused with [y] [e] confused with [T] For speaker 2 the main confusions are: [i] confused with [y] [u] confused with [y] or [i] [o] confused with [e] or [V] [V] confused with [e] Confusion between the nasal vowels 3.1 Confusion matrix Table 1 corresponds to the matrix of confusion for all speakers. The rate of general recognition for these 1 vowels is 64%, rate which would be qualified fair in a CVC test. This table shows that the three highest rates of recognition (higher than 86 %) are found for the three central vowels [ a ] [ V ] [ y ] in the acoustic triangle, representation of oral vowels in the F1-F2 plan, where F1 represents the frequency of the first formant and F2 the frequency of the second formant (cf figure 2). Confusion between the three vowels [u] [y] [i] (highlighted in green in table 1) are frequent. The common point of these three vowels is the frequency of the first formant (approximately 25 Hz) as it is displayed in the acoustic triangle of the vowels (cf figure 2). It is the same for the confusions between the vowels [o] [V] [e] (highlighted in blue in table 1). In this case the three vowels have, at a frequency of 38 Hz, the first formant in common. It can be concluded that the recognition of a vowel recorded with a bone conduction microphone depends on: The speaker: indeed for speaker 1 (series 1) few confusions appeared (rate of recognition of 83%), on the other hand for speaker 3 (series 3) many confusions were observed (rate of recognition < 5 %). The vowel: the vowel [i], independently of the speaker, is generally confused with [y] or [u] (rate of recognition < 4%). On the other hand, the vowel [a] is recognized up to 98 %. This high recognition rate can be explained by the fact that the vowel [a] corresponds to the tip of the vowel triangle (cf fig 2). Moreover the vowel [a] is the vowel with the highest frequency for the first formant. 491

Acoustics 8 Paris spectrogram of french vowels : [u] [y] [i] 525 45 3 3 225 15 75,,3,5,8 1,1 1,3 1,6 1,9 2,1 2,4 2,7 2,9 Fig.2 representation of French vowels represented in the plane of the first 2 formants (F1, F2) Fig.3 Spectrogram of the vowels [u] [y] [i] pronounced by speaker 1: the first three spectrograms correspond to a three are recorded with a bone conduction microphone 3.2 Time-frequency analyzes For a better understanding of the results of listening tests we determined for each vowel a wide band spectrogram ( f = 187.5 Hz). When using wide band analyzes it is possible not to take into account the fundamental of a vowel. For the groups of vowels where confusion has been detected ([u] [y] [i] and [o] [V] [e]) the spectrograms recorded with a bone conduction microphone have been compared to those recorded with the air microphone. 3.3 Comparison [u] [y] [i] Figures 3 and 4 represent the spectrograms of the three vowels [u] [y] [i] for two speakers (figure 3 for speaker 1, figure 4 for speaker3). On each figure, the first three spectrograms correspond to the vowels recorded with the reference acoustic microphone. The three others are recorded with the bone conduction microphone. These figures (especially figure 4) allow to show that there is an amplification of the frequencies between 2 khz and 2,5 khz on the spectrograms of the speech signals recorded with the bone conduction microphone which is not visible on the signals obtained with the reference microphone. Especially on figure 4, the spectrograms of the vowels [u] [y] [i] recorded by bone conduction microphone are almost identical and resemble the spectrogram of [y] recorded with the reference microphone. In this case, it is difficult to identify the three vowels. With the automatic speech recognition, it is only possible to recognize the central vowel [y] when one of the three vowels [u] [y] [i] is pronounced. 525 45 3 3 225 15 75 spectrogram of french vowels : [u] [y] [i],,3,5,8 1,1 1,3 1,6 1,9 2,1 2,4 2,7 Fig.4 Spectrogram of the vowels [u] [y] [i] pronounced by speaker 2: the first three spectrograms correspond to a three are recoded with a bone conduction microphone 3.4 Comparison [o] [V] [e] Figures 5 and 6 represent the spectrograms of the three vowels [o] [V] [e] for two speakers. On each figure, the first three spectrograms correspond to the vowels recorded with the reference microphone and the three others to the recordings with bone conduction microphone. As for the previous case ([u] [y] [i]), these figures allow to show (figure 6) that there is an amplification of the frequencies between 2 khz and 2,5 khz on the spectrograms of the vowels recorded with the bone conduction microphone. Especially on figure 6, the spectrograms of the vowels [o] [V] [e] recorded with the bone conduction microphone are almost identical, and resemble very much to the spectrogram of [e] recorded with the air microphone. 492

Acoustics 8 Paris 525 45 3 3 225 15 75 spectrogram of french vowels : [o] [V] [e],,3,5,8 1,1 1,3 1,6 1,9 2,1 2,4 2,7 2,9 show that spectrograms recorded with the second bone conduction microphone differ from those recorded with the first bone microphone. The common point between these two bone conduction microphones is that the high frequency components (> 2,5 khz) are attenuated. On the other hand the second bone conduction microphone doesn t amplify the components between 2 khz and 2,5 khz. But even with the second bone conduction microphone the [i] is perceived as [y]. It would be necessary to look further into the study in order to determine if the location of the bone conduction microphone on the head has an influence on intelligibility. Fig.5 Spectrogram of the vowels [o] [V] [e] pronounced by speaker 1: the first three spectrograms correspond to a three are recorded with a bone conduction microphone 525 45 3 3 225 15 75 spectrogram of french vowels : [o] [V] [e],,3,5,8 1,1 1,3 1,6 1,9 2,1 2,4 Fig.6 Spectrogram of the vowels [o] [V] [e] pronounced by speaker 3: the first three spectrograms correspond to a three are recorded with a bone conduction microphone 4 Comparison of two bone conduction microphones Figure 7 represents the spectrograms of the vowel [i] for two speakers recorded with three different microphones. Figure 8 represents the spectrograms of the vowel [e] for two speakers recorded by three different microphones. On each figure, the first three spectrograms correspond to the vowel pronounced by speaker 1 and the three others to the vowel pronounced by speaker 3. For each speaker the first spectrogram corresponds to speech signal recorded with the reference microphone the second spectrogram corresponds to speech signal recorded with the bone conduction microphone placed on the upper right cheek the third spectrogram corresponds to speech signal recorded by the bone conduction microphone placed on the forehead. This figures show that for a vowel, the spectrogram is different in function of the used microphone. Besides, they spectrogram of french vowel : [i] 525 45 3 3 225 15 75,,3,5,8 1,1 1,3 1,6 1,9 2,1 2,4 2,7 Fig.7 Spectrogram of the vowel [i] pronounced by speaker 1 (first three spectrograms) and speaker 3 (last three spectrogram) recorded with the reference microphone and the two bone conduction microphones. 525 45 3 3 225 15 75 spectrogram of french vowel : [e],,3,5,8 1,1 1,3 1,6 1,9 2,1 2,4 2,7 Fig.8 Spectrogram of the vowel [e] pronounced by speaker 1 (first three spectrograms) and speaker 3 (last three spectrogram) recorded with the reference microphone and the two bone conduction microphones. 5 Conclusion During the present study we have realized that recognition rate of vowels recorded with a bone conduction microphone depends on the speaker and on the vowel. It has been shown that for certain speakers the recognition rate is quite bad, whereas for others it may be satisfactory. It has been noticed that in the triangle of French vowels (figure 2) only the central vowel shows for all speakers a satisfactory recognition rate. 493

Acoustics 8 Paris When analyzing the vowels with large band spectrograms it has been seen that the frequencies in the range between 2 khz and 2.5 khz are amplified when compared with the spectrogram calculated from signals recorded with aeroacoustic microphones. This amplification may be the major reason for the confusions between the vowels [o] [V] [e] and between the vowels [u] [y] [i]. To answer the question about the feasibility of using a bone conduction microphone for future functions such as automatic speech recognition, In one hand: Bone conduction microphones are less sensitive to environmental noise, the system will have a greater performances in strong environments. They are compatible with head equipments such as NBC masks. In an other hand, It is certain that in the case of speaker 3, the system will have difficulties to distinguish between an [i] or an [u] and an [y]. A speech recognition engine, being based on the recognition of the vowels, will be able to distinguish in certain cases only three families of vowels: o [a ], o [V] for the vowels [o] [V] [e] (and even for [C] [Z] [D]), o [y] for the vowels [u] [y] [i]. To be able to use the bone conduction microphones it is necessary to choose the vocabulary in a way that there are no ambiguities, i.e. not to choose words whose continuation of the vowels is the same one when considering that only 3 vowels ([a] [V] [y]) are recognized. References [1] Paolo E. Giua, "Voice transmission through vibration pickups", CNR Istituto di Acoustica, via Cassia 1216, Roma, Italy, [2] Steeneken H.J.M., Houtgast T., Basics of the STI Measurement Method, in Past, Present and Future of the Speech Transmission Index, Edited by Van Wijngarden S.J., published by TNO Human Factors, ISBN 9-7672-2-, 22 Future investigations will be realized to understand, the mechanisms leading to different spectrograms of the speech signals when recorded by bone or by air microphones the importance of the position of the accelerometer on the head. Acknowledgments The acknowledgments go to SAGEM DS which contributed to the study and which provided us bone conduction microphones. 494