Head shadow enhancement with fixed beamformers improves sound localization based on interaural level differences

arXiv:171.194v1 [eess.AS] 5 Oct 2017

Benjamin Dieudonné, Tom Francart
KU Leuven - University of Leuven, Department of Neurosciences, Experimental Oto-rhino-laryngology, Herestraat 49 bus 721, B-3000 Leuven, Belgium.
benjamin.dieudonne@med.kuleuven.be, tom.francart@med.kuleuven.be

August 11, 2017

Abstract

A new method to enhance head shadow in low frequencies is presented, resulting in interaural level differences that can be used to unambiguously localize sounds. Enhancement is achieved with a fixed beamformer with ipsilateral directionality in each ear. The microphone array consists of one microphone per device. The method naturally handles multiple sources without sound location estimations. In a localization experiment with simulated bimodal listeners, performance improved from a root-mean-square error of 51° to 28° compared with standard omni-directional microphones. The method is also promising for bilateral cochlear implant or hearing aid users and for improved speech perception in multi-talker environments.

Keywords: head shadow enhancement, enhancement of interaural level differences, sound localization

1 Introduction

The human ear acts as an acoustic beamformer, as it naturally focuses on sounds that arrive from its ipsilateral side due to the head shadow. This beamforming behavior is very useful. On the one hand, it introduces interaural level differences (ILDs), which can be used to localize sounds in the horizontal plane (Rayleigh, 1907). On the other hand, it can attenuate noise sources and thereby improve speech understanding (Bronkhorst and Plomp, 1988). Unfortunately, the ear is not an ideal beamformer. Firstly, the beamforming only applies to higher frequencies: for low-frequency sounds, the head is invisible. This is especially problematic for people with high-frequency hearing loss.
Secondly, the induced ILDs cannot be used to localize sounds unambiguously for all directions: for large angles, the natural ILD-versus-angle function becomes non-monotonic (Shaw, 1974). This is especially problematic for people with poor perception of interaural time differences (ITDs), who mostly rely on ILDs for localization.

Several authors have tried to enhance this behavior and have shown advantages in sound localization and speech intelligibility for different populations. In 2009, Francart et al. filed a patent to optimize localization cues in a bilateral hearing aid system, giving the example of transposing ILDs to a lower frequency (Francart et al., 2013). By adapting the level in the hearing aid to obtain the same broadband ILD as a normal-hearing listener, Francart et al. (2009) have shown improved sound localization in an acoustic simulation of bimodal cochlear implant (CI) listeners (CI users with a contralateral hearing aid). Lopez-Poveda et al. (2016) implemented a similar strategy for bilateral cochlear implant users, inspired by the contralateral medial olivocochlear reflex. Their strategy attenuated sounds in frequency regions with larger amplitude on the contralateral side, resulting in an increase in speech intelligibility for spatially separated speech and noise. Since both strategies are solely based on level cues that are present in the acoustic signal, they cannot solve the problem of the non-monotonic ILD-versus-angle function. Moreover, they can only handle multiple sound sources if they are temporally or spectro-temporally separated. Francart et al. (2011) adapted their strategy by applying an artificial ILD based on the angle of incidence, to obtain a monotonic ILD-versus-angle function. They found improved sound localization for real bimodal listeners. However, this strategy relied on a priori information about the angle of incidence of the incoming sound, and could not optimally handle multiple sound sources either. Brown (2014) extended the strategy by estimating the angle of incidence in different frequency regions based on ITDs, resulting in improved speech intelligibility for bilateral cochlear implant users. Moore et al. (2016) evaluated a similar algorithm for bilateral hearing aid users, and found improved sound localization while speech perception was not significantly affected.

Although the head shadow is essentially a monaural effect, all above-mentioned strategies start from a binaural perspective. This results in complex sound coding strategies that may be suboptimal or difficult to implement in current hearing aid or cochlear implant processors. Recently, Veugen et al. (2017) evaluated a monaural strategy to improve access to high-frequency ILDs for bimodal listeners by applying frequency compression in the hearing aid. However, they did not find a significant improvement in sound localization. Moreover, frequency compression might result in undesired side effects on speech intelligibility and sound quality (Simpson, 2009).

Once one recognizes the natural beamforming characteristic of an ear, an elegant way to optimize its behavior is the superposition of an electronic beamformer. We present a new method to artificially enhance the head shadow effect by supplying each ear with an electronic beamformer, focusing on sounds coming from the ipsilateral side. The method is able to enhance low-frequency ILDs and resolve non-monotonic ILD-versus-angle functions. Therefore, we expect advantages in sound localization and speech intelligibility similar to those observed in the above-cited papers for different hearing-impaired populations, while our method does not require complex sound processing and naturally handles multiple sound sources.

Since bimodal listeners have poor ITD perception and their residual hearing often has limited access to high frequencies, they have very poor localization performance (Francart and McDermott, 2013). Therefore, we validated the head shadow enhancement method in a localization experiment with simulated bimodal listeners.
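As a rough illustration of what an ipsilaterally steered beamformer in each ear does to low-frequency ILDs, the following Python sketch models one common fixed design, a two-microphone delay-and-subtract pair, in the free field. It is a minimal sketch under stated assumptions, not the implementation used in this study: the acoustic shadow of the head itself is ignored, the microphone spacing (0.2 m, roughly the distance between the ears) and the speed of sound (343 m/s) are assumed values, and all function names are hypothetical.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s (assumed)
MIC_SPACING = 0.20      # m, assumed ear-to-ear microphone distance


def beamformer_gain_db(freq_hz, azimuth_deg, ipsilateral_sign=+1):
    """Gain in dB (re. a single omni microphone) of a delay-and-subtract pair.

    azimuth_deg: 0 = front, +90 = right, -90 = left.
    ipsilateral_sign: +1 for the right-ear device, -1 for the left-ear device.
    Free-field model; the acoustic shadow of the head itself is ignored.
    """
    omega = 2.0 * math.pi * freq_hz
    internal_delay = MIC_SPACING / SPEED_OF_SOUND
    # Extra travel time from the ipsilateral to the contralateral microphone.
    external_delay = (ipsilateral_sign * MIC_SPACING
                      * math.sin(math.radians(azimuth_deg)) / SPEED_OF_SOUND)
    # Ipsilateral signal minus the delayed contralateral signal.
    response = 1.0 - cmath.exp(-1j * omega * (external_delay + internal_delay))
    return 20.0 * math.log10(max(abs(response), 1e-12))  # floor avoids log(0)


def enhanced_ild_db(freq_hz, azimuth_deg):
    """ILD (right minus left, in dB) produced by the two mirrored beamformers."""
    right = beamformer_gain_db(freq_hz, azimuth_deg, ipsilateral_sign=+1)
    left = beamformer_gain_db(freq_hz, azimuth_deg, ipsilateral_sign=-1)
    return right - left


if __name__ == "__main__":
    for angle in range(-90, 91, 15):
        print(f"{angle:+4d} deg: ILD = {enhanced_ild_db(250.0, angle):7.1f} dB")
```

In this idealized model, the ILD at 250 Hz increases monotonically from the far left to the far right, whereas two omni-directional microphones in the free field would give an ILD of 0 dB at every angle. With the assumed spacing, the response towards the ipsilateral side peaks near 430 Hz and has its first null near 860 Hz, which is why such a widely spaced subtractive pair is only useful at low frequencies.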
2 Methods

2.1 Head shadow enhancement

We enhanced the directivity pattern of the ear with an end-fire delay-and-subtract directional microphone applied in low frequencies, as illustrated in Fig. 1(a). To achieve beamforming behavior towards the ipsilateral side in each ear, a linear microphone array in the lateral direction was realized with a link between the two devices. In this way, the microphone spacing equals the distance between the ears, approximately 20 cm. A large microphone spacing yields good sensitivity of the directional microphone at low frequencies (note that a frontal directional microphone in a behind-the-ear (BTE) device is usually not active at low frequencies because of its strong high-pass characteristic (Ricketts and Henry, 2002)). On the other hand, this large spacing decreases the sensitivity at frequencies above approximately 800 Hz due to the comb filter behavior of a subtractive array (Dillon, 2001, Chapter 7); this was not an issue, since we only enhanced the head shadow for low frequencies. In Fig. 1(b) it can be seen that the method results in a cardioid-like directivity pattern for low frequencies, while the natural directivity pattern of the ear remains unchanged for frequencies above 800 Hz.

Figure 1: (a) Low frequencies of the right ear signal are sent to the left ear device, followed by delay-and-subtract to obtain low-frequency ipsilateral directionality. The same method is applied in the right ear device (not shown in the figure). (b) The method results in a cardioid-like directivity pattern for low frequencies, while the natural directivity pattern of the ear remains unchanged for frequencies above 800 Hz.

2.2 Simulations of spatial hearing

Spatial hearing was simulated with head-related transfer functions (HRTFs). We measured the response of an omni-directional microphone in a BTE piece placed on a CORTEX MK2 human-like acoustical manikin.
The manikin was positioned in the center of a localization arc with a radius of approximately 1 m, with 13 loudspeakers positioned at angles between -90° (left) and +90° (right) in steps of 15°. Head shadow enhancement was done by applying the method to the omni-directional HRTFs to obtain enhanced HRTFs.

2.3 Simulation of bimodal hearing

We simulated bimodal hearing according to the methods of Francart et al. (2009). CI listening was simulated in the left ear with an eight-channel noise band vocoder to mimic the behavior of a CI processor: the input signal was sent through an eight-channel filter bank; within each channel, the envelope was detected with half-wave rectification followed by a 50 Hz low-pass filter; this envelope was used to modulate a noise band whose spectrum corresponded to the respective filter; the outputs of all channels were summed to obtain a single acoustic signal. Severe hearing loss was simulated in the right ear with a sixth-order low-pass Butterworth filter with a cutoff frequency of 500 Hz, such that the response rolled off at 36 dB per octave. This corresponds to the ski-slope audiogram of a typical bimodal listener. In this simulation, no ITD cues could be used to localize sounds, which made our participants rely solely on ILD cues (Francart and McDermott, 2013).

2.4 Participants

We recruited 6 normal-hearing participants, aged between 24 and 26 years. The study was approved by the Medical Ethical Committee of the University Hospital Leuven (S5897).

2.5 Experimental set-up

The participant was seated in the same localization arc in which the HRTFs were measured. The loudspeakers were labeled with numbers 1 to 13, corresponding to angles between -90° (left) and +90° (right) in steps of 15°. The loudspeakers served solely as a visual cue. The stimuli were presented through Sennheiser HDA200 over-the-ear headphones via an RME Hammerfall DSP Multiface soundcard, using the software platform APEX 3 (Francart et al., 2008).

2.6 Stimuli

We presented the Dutch word "zoem" [ˈzum] from the Lilliput speech material (van Wieringen, 2013), uttered by a female talker and completely voiced.
To limit the use of on-/offset ITD cues, we ramped the on- and offset with a 5 ms cosine window.

2.7 Procedure

Localization performance was measured separately in a condition with head shadow enhancement and a condition without head shadow enhancement; the order of conditions was randomized across subjects. Each condition consisted of a block of 7 runs. The first 4 runs served as training to get used to the simulation; only the last 3 runs were considered in our analysis. Each run consisted of 3 trials per angle, resulting in 39 trials in total per run; the order of trials was randomized in each run. The participant was instructed to look straight ahead during stimulus presentation. Feedback was always given after the response by turning on a light-emitting diode above the correct speaker for 2 s.

The stimulus was calibrated separately for each condition, such that a signal from the front (0°) was presented at 65 dBA in each ear. To avoid the use of monaural level cues for localization, the overall level was randomly roved by ±1 dB.

3 Results

The broadband ILDs of the stimulus after bimodal simulation for angles between -90° and +90°, with and without head shadow enhancement, are shown in Fig. 2(a). Head shadow enhancement resulted in a steeper and monotonic ILD-versus-angle function.

The results of our psychophysical experiment are shown in Fig. 2(b) and (c). In Fig. 2(b), the mean response averaged across trials is plotted as a function of the presentation angle, per condition. The mean response is a measure of the bias in the response for a certain condition: the closer the mean is to the diagonal, the smaller the bias in the response. In Fig. 2(c), the standard deviation (s.d.) of the response across trials is plotted as a function of the presentation angle, per condition. The s.d. is a measure of the variability in the response for a certain condition: the lower the s.d., the smaller the variability in the response, and thus the more certain the participants were about their response. Error bars always represent the standard deviation across subjects. Both the bias and the variability contribute to the total root-mean-square (RMS) error per condition. It can be seen that head shadow enhancement reduces both the bias and the variability in the response.
For both the bias and the variability, the largest improvement occurs at large angles, which corresponds well with the ILD curves of Fig. 2(a).
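The way bias and variability combine into the RMS error can be made concrete with a small Python sketch. It assumes that, for a single presentation angle, the bias is the mean response minus the stimulus angle and the variability is the population standard deviation across trials; under these assumptions the squared RMS error is exactly the sum of the squared bias and the squared s.d. The function name and the example responses are hypothetical and are not data from this study, and the exact averaging used for the values reported in Fig. 2 may differ.

```python
import math
import statistics


def localization_stats(stimulus_deg, responses_deg):
    """Return (bias, s.d., RMS error) in degrees for one presentation angle."""
    # Bias: how far the mean response lies from the presented angle.
    bias = statistics.fmean(responses_deg) - stimulus_deg
    # Variability: population standard deviation of the responses across trials.
    sd = statistics.pstdev(responses_deg)
    # RMS error: root of the mean squared deviation from the presented angle.
    squared_errors = [(r - stimulus_deg) ** 2 for r in responses_deg]
    rms = math.sqrt(statistics.fmean(squared_errors))
    return bias, sd, rms


if __name__ == "__main__":
    # Hypothetical responses to a stimulus at +90 degrees (not study data).
    bias, sd, rms = localization_stats(90.0, [60.0, 75.0, 75.0])
    print(f"bias = {bias:.1f}, s.d. = {sd:.1f}, RMS = {rms:.1f}")
    print(f"sqrt(bias^2 + s.d.^2) = {math.hypot(bias, sd):.1f}")
```

In the hypothetical example, a listener who consistently underestimates a far-lateral source accumulates most of the RMS error through bias, while scattered responses around the correct angle would show up in the s.d. instead.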
[Figure 2: two rows of panels, one for omni-directional microphones and one for head shadow enhancement. (a) Broadband ILD [dBA] versus stimulus angle [°]. (b) Mean response [°] versus stimulus angle [°]; mean RMS bias = 32.7° (omni-directional microphones) and 14.3° (head shadow enhancement). (c) Standard deviation of the response [°] versus stimulus angle [°]; mean RMS s.d. = 40.9° (omni-directional microphones) and 25.1° (head shadow enhancement).]

Figure 2: Due to optimized interaural level differences (ILDs), head shadow enhancement significantly improved localization performance by 22.8° in RMS error. The total RMS error results from both the bias and the uncertainty in the responses. (a) Head shadow enhancement resulted in a steeper and monotonic ILD-versus-angle curve for the speech stimulus "zoem" in a bimodal simulation. (b) The mean response (averaged across trials) is a measure of the bias in the response: the closer to the diagonal (dashed line), the better the response. Head shadow enhancement decreases the bias especially for large angles, as can be expected from the ILD curves for our stimuli. Error bars represent the inter-subject standard deviation. (c) The standard deviation (s.d.) of the response (across trials) is a measure of how certain the listener is of his or her response: less uncertainty results in a smaller s.d. Head shadow enhancement decreases the uncertainty for all angles, but especially for large angles. Error bars represent the inter-subject standard deviation.

A Wilcoxon signed-rank test was performed to compare the RMS error averaged across all angles, with condition as the independent variable. Head shadow enhancement significantly improved localization performance from a mean RMS error of 51.1° to a mean RMS error of 28.3°, i.e., a mean improvement of 22.8° in RMS error (V = 21, p = .03, r = .62).
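To put the reported test statistic in perspective, the sketch below enumerates the exact null distribution of the Wilcoxon signed-rank statistic V for a given number of paired observations. It assumes a two-sided exact test with no tied or zero differences, which the text does not state explicitly; the function name is hypothetical. No participant data are needed: with 6 pairs, V = 21 can only occur when all 6 differences have the same sign.

```python
from itertools import product


def exact_signed_rank_p(v_observed, n_pairs):
    """Two-sided exact p-value for the Wilcoxon signed-rank statistic V.

    Assumes no ties and no zero differences, so the ranks are 1..n_pairs and
    every assignment of signs to the ranks is equally likely under the null.
    """
    ranks = range(1, n_pairs + 1)
    # V for every possible sign pattern: sum of ranks of positive differences.
    v_null = [
        sum(rank for rank, positive in zip(ranks, signs) if positive)
        for signs in product((False, True), repeat=n_pairs)
    ]
    expected = n_pairs * (n_pairs + 1) / 4  # mean of V under the null
    extreme = sum(
        1 for v in v_null
        if abs(v - expected) >= abs(v_observed - expected)
    )
    return extreme / len(v_null)


if __name__ == "__main__":
    print(exact_signed_rank_p(21, 6))
```

Under these assumptions, V = 21 with 6 participants corresponds to p = 2/64 = 0.03125, the smallest two-sided p-value this test can produce for a sample of that size.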
4 Discussion

Head shadow enhancement yielded a steeper and monotonic ILD-versus-angle function, resulting in a large improvement in sound localization of 22.8° in RMS error. The error (consisting of both the bias and the uncertainty in the response) was reduced especially at large angles, as could be expected from the ILD curves of our stimuli.

In the condition without head shadow enhancement, we found a mean RMS error of 51.1°, which is within the range reported for real bimodal listeners (Potts et al., 2009; Ching et al., 2007). This supports the validity of our acoustic simulation of bimodal listening. Francart et al. (2011) have indeed shown that their results for acoustic simulations of bimodal listening (Francart et al., 2009) could be translated to real bimodal listeners.

We expect similar improvements in localization for different populations, as ILD enhancement approaches have already been shown to be effective for real bimodal listeners (Francart et al., 2011) and bilateral hearing aid users (Moore et al., 2016). Moore et al. (2016) have even shown improvements in localization when ILD enhancement was combined with compressive gain.

We also expect improvements in speech understanding in noisy environments for different populations. Firstly, when noise arrives from all directions, head shadow enhancement attenuates all noise from the contralateral side in each ear. This results in a bilateral improvement in signal-to-noise ratio. For example, with a speech source in front of the listener, we calculated a monaural speech-weighted signal-to-noise ratio (SNR) improvement of 14 dB with noise from the contralateral side, compared with a small decrease of 1 dB with noise from the ipsilateral side. The small frontal attenuation might be resolved by superposing frontal directionality on high frequencies in each ear. Secondly, each ear can focus more selectively on a target in a multi-talker environment. This is especially useful for cochlear implant users, for whom it is very hard to disentangle separate speech sources. ILD enhancement has indeed been shown to be effective in increasing speech intelligibility for bilateral CI users in a multi-talker environment (Brown, 2014).
Thirdly, van Hoesel (2015) has shown that improved localization allows listeners to orient towards talkers and gain access to visual cues, resulting in even larger improvements in speech intelligibility.

Overall, the method is a promising candidate for implementation in future devices, due to both its simplicity and its effectiveness. In future research, we will validate our method for different populations and investigate its effect on speech intelligibility in noise.

Acknowledgments

This research is funded by the Research Foundation Flanders (SB PhD fellow at FWO); this research is jointly funded by Cochlear Ltd. and Flanders Innovation & Entrepreneurship (formerly IWT), project 15432; this project has also received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No 637424, ERC Starting Grant to Tom Francart). We thank our participants for their patience and enthusiasm during our experiment.
References

Bronkhorst, A. and Plomp, R. (1988). The effect of head-induced interaural time and level differences on speech intelligibility in noise. The Journal of the Acoustical Society of America, 83(4):1508-1516.

Brown, C. A. (2014). Binaural enhancement for bilateral cochlear implant users. Ear and Hearing, 35(5):580.

Ching, T., Van Wanrooy, E., and Dillon, H. (2007). Binaural-bimodal fitting or bilateral implantation for managing severe to profound deafness: a review. Trends in Amplification, 11(3):161-192.

Dillon, H. (2001). Hearing aids, volume 362. Boomerang Press, Sydney.

Francart, T., Lenssen, A., and Wouters, J. (2011). Enhancement of interaural level differences improves sound localization in bimodal hearing. The Journal of the Acoustical Society of America, 130(5):2817-2826.

Francart, T. and McDermott, H. J. (2013). Psychophysics, fitting, and signal processing for combined hearing aid and cochlear implant stimulation. Ear and Hearing, 34(6):685-700.

Francart, T., Van den Bogaert, T., Moonen, M., and Wouters, J. (2009). Amplification of interaural level differences improves sound localization in acoustic simulations of bimodal hearing. The Journal of the Acoustical Society of America, 126(6):3209-3213.

Francart, T., Van Wieringen, A., and Wouters, J. (2008). APEX 3: a multi-purpose test platform for auditory psychophysical experiments. Journal of Neuroscience Methods, 172(2):283-293.

Francart, T., Wouters, J., and Van Dijk, B. (2013). Localisation in a bilateral hearing device system. US Patent 8,503,704.

Lopez-Poveda, E. A., Eustaquio-Martín, A., Stohl, J. S., Wolford, R. D., Schatzer, R., and Wilson, B. S. (2016). A binaural cochlear implant sound coding strategy inspired by the contralateral medial olivocochlear reflex. Ear and Hearing, 37(3):e138.

Moore, B. C., Kolarik, A., Stone, M. A., and Lee, Y.-W. (2016). Evaluation of a method for enhancing interaural level differences at low frequencies. The Journal of the Acoustical Society of America, 140(4):2817-2828.

Potts, L. G., Skinner, M. W., Litovsky, R. A., Strube, M. J., and Kuk, F. (2009). Recognition and localization of speech by adult cochlear implant recipients wearing a digital hearing aid in the nonimplanted ear (bimodal hearing). Journal of the American Academy of Audiology, 20(6):353-373.

Rayleigh, L. (1907). XII. On our perception of sound direction. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 13(74):214-232.

Ricketts, T. and Henry, P. (2002). Low-frequency gain compensation in directional hearing aids. American Journal of Audiology, 11(1):29-41.

Shaw, E. (1974). Transformation of sound pressure level from the free field to the eardrum in the horizontal plane. The Journal of the Acoustical Society of America, 56(6):1848-1861.

Simpson, A. (2009). Frequency-lowering devices for managing high-frequency hearing loss: A review. Trends in Amplification, 13(2):87-106.

van Hoesel, R. J. (2015). Audio-visual speech intelligibility benefits with bilateral cochlear implants when talker location varies. Journal of the Association for Research in Otolaryngology, 16(2):309-315.

Van Wieringen, A. (2013). The Lilliput, an open-set CVC test for assessing speech in noise in 4-6 yr olds. In Second annual B-audio conference, 15-16 November 2013.

Veugen, L. C., Chalupper, J., Mens, L. H., Snik, A. F., and van Opstal, A. J. (2017). Effect of extreme adaptive frequency compression in bimodal listeners on sound localization and speech perception. Cochlear Implants International, pages 1-12.