Release from informational masking in a monaural competing-speech task with vocoded copies of the maskers presented contralaterally

Joshua G. W. Bernstein a)
National Military Audiology and Speech Pathology Center, Walter Reed National Military Medical Center, Bethesda, Maryland 20889
Nandini Iyer
Air Force Research Laboratory, Wright-Patterson Air Force Base, Ohio 45433
Douglas S. Brungart
National Military Audiology and Speech Pathology Center, Walter Reed National Military Medical Center, Bethesda, Maryland 20889
(Received 18 March 2014; revised 17 December 2014; accepted 18 December 2014)

Single-sided deafness prevents access to the binaural cues that help normal-hearing listeners extract target speech from competing voices. Little is known about how listeners with one normal-hearing ear might benefit from access to severely degraded audio signals that preserve only envelope information in the second ear. This study investigated whether vocoded masker-envelope information presented to one ear could improve performance for normal-hearing listeners in a multi-talker speech-identification task presented to the other ear. Target speech and speech or non-speech maskers were presented unprocessed to the left ear. The right ear received no signal, an unprocessed copy of the maskers, or an eight-channel noise-vocoded copy of the maskers. Presenting the vocoded maskers contralaterally yielded significant masking release from same-gender speech maskers, albeit less than in the unprocessed case. It did not yield masking release from opposite-gender speech, stationary-noise, or modulated-noise maskers. Unmasking also occurred with as few as two vocoder channels and when an attenuated copy of the target signal was added to the maskers before vocoding. These data show that delivering masker-envelope information contralaterally generates masking release in situations where target-masker similarity impedes monaural speech-identification performance.
By delivering speech-envelope information to a deaf ear, cochlear implants for single-sided deafness have the potential to produce a similar effect.
[http://dx.doi.org/10.1121/1.4906167]
[VB] Pages: 702–713

I. INTRODUCTION

For normal-hearing (NH) listeners, binaural hearing provides substantial benefits in everyday environments when the source of interest is spatially separated from the locations of interfering sound sources. Having two ears allows the listener to take advantage of head-shadow effects by selectively attending to the ear that provides the better signal-to-noise ratio (SNR) (i.e., the better-ear advantage). Two ears also provide access to binaural-difference cues that facilitate the perceptual segregation of spatially separated sources. The better-ear advantage is accessible with monaural listening at any given point in time. The stream-segregation benefit, by contrast, requires the listener to combine concurrent information across the ears to differentiate the target and masker signals based on their differing spatial characteristics. Individuals who lack spatial hearing due to profound hearing loss in one ear (single-sided deafness; SSD) are therefore at a severe disadvantage for speech understanding in these types of environments.

Recently, several studies have shown that a cochlear implant (CI) can be an effective treatment for SSD. A CI provides the patient with bilateral auditory input and leads to improved speech perception in noise and improved sound-localization ability (e.g., Vermeire and van de Heyning, 2009; Buechner et al., 2010; Arndt et al., 2011; Firszt et al., 2012; Hansen et al., 2013; Erbele et al., 2015). The observed improvements in speech perception in noise are generally consistent with a better-ear effect: advantages were observed only for those spatial arrangements of target and masker that yielded a better SNR at the CI ear.

a) Author to whom correspondence should be addressed. Electronic mail: joshua.g.bernstein.civ@mail.mil
It is not known whether a CI can provide binaural-difference cues that promote the perceptual separation of concurrent speech sounds. Inspired by this trend toward providing CIs to individuals with SSD, the current study examined how the presentation of vocoded speech-envelope information to NH listeners with a simulated deaf ear could facilitate masked speech understanding. Vocoding a speech signal involves three steps (e.g., Shannon et al., 1995):

(1) dividing the signal into individual frequency bands;
(2) extracting the signal envelope in each band;
(3) using the extracted envelopes to modulate a set of corresponding narrowband noise or tonal carriers.

Vocoder processing is not considered an adequate simulation of CI listening. CIs and vocoder processing do, however, have one thing in common: in both cases, speech-envelope information is delivered to the listener.
702 J. Acoust. Soc. Am. 137 (2), February 2015 0001-4966/2015/137(2)/702/12/$30.00
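The NumPy sketch below is a rough illustration of the three vocoding steps described above. It follows the six-step, eight-channel procedure given in the experiment 1 Methods, but several details are assumptions made only for this example and are not taken from the study:

- brick-wall FFT filters stand in for the study's filterbank;
- channel edges are spaced equally on the Glasberg and Moore (1990) ERB-number scale from 100 to 10 000 Hz;
- all function names are hypothetical.

```python
import numpy as np


def erb_number(f_hz):
    """ERB-number (Cam) scale of Glasberg and Moore (1990)."""
    return 21.4 * np.log10(4.37 * np.asarray(f_hz, dtype=float) / 1000.0 + 1.0)


def erb_number_to_hz(e):
    """Inverse of erb_number."""
    return (10.0 ** (np.asarray(e, dtype=float) / 21.4) - 1.0) * 1000.0 / 4.37


def channel_edges(n_channels, f_lo=100.0, f_hi=10000.0):
    # Assumption: edges equally spaced in ERB number, so bandwidths grow with the ERB.
    return erb_number_to_hz(np.linspace(erb_number(f_lo), erb_number(f_hi), n_channels + 1))


def bandpass(x, fs, lo, hi):
    # Brick-wall band-pass via the FFT (stand-in for the study's filterbank).
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
    spec[(freqs < lo) | (freqs >= hi)] = 0.0
    return np.fft.irfft(spec, n=len(x))


def hilbert_envelope(x):
    # Magnitude of the analytic signal; no fixed-cutoff low-pass filter is applied.
    n = len(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * weights))


def noise_vocode(x, fs, n_channels=8, seed=0):
    rng = np.random.default_rng(seed)
    edges = channel_edges(n_channels)
    out = np.zeros(len(x))
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = bandpass(x, fs, lo, hi)               # (1) analysis band
        env = hilbert_envelope(band)                 # (2) Hilbert envelope
        noise = rng.standard_normal(len(x))          # (3) independent white-noise carrier
        voc = bandpass(env * noise, fs, lo, hi)      # (4) re-filter into the same band
        rms_band = np.sqrt(np.mean(band ** 2))
        rms_voc = np.sqrt(np.mean(voc ** 2))
        if rms_voc > 0.0:
            voc *= rms_band / rms_voc                # (5) restore the band's RMS level
        out += voc                                   # (6) sum across channels
    return out


# Usage with a synthetic amplitude-modulated tone (test input only).
fs = 44100
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 500 * t) * (1.0 + 0.5 * np.sin(2 * np.pi * 4 * t))
y = noise_vocode(x, fs)
```

In this sketch the FFT bands do not overlap. Restoring each band's RMS level therefore also preserves the overall RMS level of an input whose energy lies between 100 and 10 000 Hz. A filterbank with overlapping skirts would not guarantee this exactly.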

Thus, vocoder studies have been employed regularly in the literature as an important first step toward assessing the possibility that a CI might provide a benefit under certain conditions (e.g., Qin and Oxenham, 2006; Garadat et al., 2009; Goupell et al., 2013). The identification of a benefit for speech understanding in noise using vocoder processing would suggest that it would be worthwhile to examine the same paradigm in actual CI users.

Several previous studies have used vocoder processing presented bilaterally to NH listeners to examine how speech-envelope information might facilitate masking release in competing-speech paradigms. These studies have reached mixed conclusions about whether listeners experience squelch for the perception of vocoded speech in noise. Squelch is a two-ear benefit beyond that associated with a better-SNR head-shadow advantage.

Garadat et al. (2009) measured speech-reception performance in the presence of interfering speech in a virtual auditory space and found that listeners received as much as 3 dB of squelch. Schoof et al. (2013) investigated a similar scenario, but with speech presented in 20-talker babble. Relative to the monaural condition, bilaterally vocoded signals provided only a better-ear head-shadow benefit and no squelch.

Best et al. (2012) investigated a scenario involving speech masked by spatially symmetric interfering-talker maskers. They identified a two-ear advantage that could not be ascribed to better-ear listening, at least according to the traditional definition of the better ear in terms of the long-term average SNR, which was equal in the two ears. The interferers originating from either side of the head were uncorrelated, however. The additional advantage might therefore have derived from rapid changes in which ear provided the better SNR at different points in time and in different frequency bands (Brungart and Iyer, 2012).
Thus, it is not clear whether the additional benefit represented any two-ear advantage that could not be explained in terms of better-ear head-shadow effects.

The evidence for binaural squelch with bilaterally vocoded signals is mixed. There is nevertheless reason to believe that adding vocoded speech-envelope information to a second ear might be even more likely to facilitate masking release when the first ear receives an unprocessed mixture of target and masker signals rather than a vocoded one. In many situations, a listener will be able to hear each individual voice in an unprocessed monaural mixture. Performance will instead be limited by an inability to determine which portions of the mixture belong to the target of interest and which belong to the masker (e.g., Hornsby et al., 2006).

In fact, some of the largest benefits of binaural listening have been observed in situations where the source of interest is masked by multiple people talking at once from different directions. With two available ears, this is a relatively easy task. The difference in source locations allows the listener to perform stream segregation, that is, to perceptually segregate the individual talkers into separate objects and pick out the source of interest (e.g., Freyman et al., 2001; Arbogast et al., 2002). With only one available ear, this task becomes extremely difficult, especially when all of the simultaneous talkers are of the same gender. Without spatial cues to differentiate the sources, the listener must rely on other, weaker cues, such as voice pitch and voice quality, to perceptually pull apart the simultaneous sources.
The task is thought to be limited mainly by difficulty in segregating concurrent speech sources that are all audible. This phenomenon has therefore been termed informational masking, to differentiate it from the energetic masking associated with a noise masker that limits the audibility of the speech target (for a review, see Kidd et al., 2008). For NH listeners, spatial differences between target and masker in competing-talker informational-masking situations lead to very large binaural advantages that are not ascribable to a better-ear head-shadow advantage (e.g., Arbogast et al., 2002). This is because the informational masking is alleviated by spatial cues that allow the listener to segregate target and masker (Freyman et al., 2001).

The current study tested the hypothesis that the relatively crude acoustic cues in speech envelopes delivered to one ear might provide enough information to facilitate the perceptual segregation of concurrent speech streams presented unprocessed to the other ear. Noise-vocoder processing was used to extract the envelopes from the masker signals and to deliver these envelopes to the second ear by modulating bands of bandpass-filtered noise. The study focused particularly on conditions where it is difficult to segregate simultaneous speech streams based on monaural cues alone. Suppose that presenting speech-envelope information to the second ear via vocoder processing can enhance speech-reception performance in the presence of interfering talkers. It would then be worthwhile to examine whether a similar effect occurs for individuals with SSD who have received a CI in their deaf ear.

Experiment 1 presented a mixture of unprocessed target speech and maskers to the left ear, and presented only the vocoded maskers to the right ear.
In other words, the target was presented with an infinite interaural level difference (ILD) between the two ears, while the maskers were presented with an ILD of 0 dB. Performance was then compared between two conditions to determine whether listeners would experience masking release:

- the monaural condition (unprocessed ear only: target and maskers);
- the bilateral condition (unprocessed ear: target and maskers; vocoded ear: maskers only).

None of the acoustic content of the target speech was presented to the vocoded ear, so this arrangement did not produce any better-ear head-shadow benefit at the vocoded ear. Any benefit observed by presenting the maskers to the vocoded ear could therefore be ascribed to the listener combining information across the two ears.

The Coordinate Response Measure (CRM) corpus (Bolia et al., 2000) with two same-gender interferers was selected because it produces a great deal of informational masking under monaural conditions (Brungart, 2001). Using a similar paradigm with NH listeners, Gallun et al. (2005) showed that adding the masker signals to the contralateral ear greatly reduced informational masking relative to the monaural condition.
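The ILD terminology used here can be made concrete with a short sketch. An ILD in dB is taken as 20 log10 of the ratio of left-ear to right-ear RMS amplitude. A signal absent from the right ear therefore has an infinite ILD, and equal masker levels in the two ears give 0 dB. The helper names are hypothetical. The gain helper is included only to show how a finite target ILD, such as those later used in experiment 2, maps to a linear attenuation of the right-ear target.

```python
import math


def ild_db(left_rms, right_rms):
    """Interaural level difference (left re: right) in dB; infinite if the right ear gets nothing."""
    if right_rms == 0.0:
        return math.inf
    return 20.0 * math.log10(left_rms / right_rms)


def right_ear_gain(target_ild_db):
    """Linear gain for the right-ear copy of a signal that yields the requested ILD."""
    if math.isinf(target_ild_db):
        return 0.0  # signal omitted from the right ear entirely
    return 10.0 ** (-target_ild_db / 20.0)


# Experiment 1 arrangement: target only on the left, maskers equal in both ears.
target_ild = ild_db(1.0, 0.0)   # math.inf
masker_ild = ild_db(1.0, 1.0)   # 0.0
```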

Experiment 2 investigated the effect of mixing the target speech in with the masker signals in the vocoded ear. This was a step toward a free-field situation, where a target source on one side of the head would be only partially attenuated at the contralateral ear.

Experiment 3 employed a speech task involving word identification with unrelated maskers. It explored whether binaural masking release with a vocoded second ear is limited to the difficult stream-segregation paradigm of experiment 1, which used multiple same-gender interfering talkers and the CRM corpus. The alternative is that masking release occurs across a wider variety of conditions. These include same-gender interferers, which are known to produce substantial informational masking, and stationary noise, modulated noise, or opposite-gender interfering talkers, which produce less informational masking.

Finally, experiment 4 varied the number of vocoder channels to investigate how the fidelity of the vocoded signal affected the observed masking release.

II. EXPERIMENT 1: TWO SAME-GENDER INTERFERERS

A. Methods

This experiment employed the CRM corpus to examine whether a single-sided noise vocoder could facilitate release from masking. CRM sentences are of the form “Ready (call sign) go to (color) (number) now,” spoken by four different male and four different female talkers. Listeners were instructed to follow the speech of the talker who used the call sign “baron” and to identify the color and number spoken by this target talker. They were to ignore two competing talkers using other call signs (e.g., “eagle” or “arrow”). The response matrix consisted of a 4-by-8 array of virtual buttons, with each row representing a given color and each column representing a given number. On each trial, the target stimulus was randomly selected from among the eight possible talkers, eight possible numbers (1–8), and four possible colors (red, green, white, and blue).
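The sketch below shows one way an experiment 1 trial could be drawn. It combines the target-selection rule above with the masker constraints and level convention described in the Methods:

- all three phrases share a gender but differ in talker, color, and number;
- the target is fixed at 60 dB SPL;
- each masker is set to 60 dB SPL minus the TMR.

The talker numbering (0–3 male, 4–7 female) and all names are hypothetical. The summed-masker helper assumes that the two maskers add in power, which holds only approximately for independent speech signals.

```python
import math
import random
from dataclasses import dataclass

COLORS = ("red", "green", "white", "blue")
NUMBERS = tuple(range(1, 9))
TALKERS = tuple(range(8))          # hypothetical IDs: 0-3 male, 4-7 female
TARGET_LEVEL_DB_SPL = 60.0


def gender(talker):
    return "male" if talker < 4 else "female"


@dataclass(frozen=True)
class Phrase:
    talker: int
    color: str
    number: int


def sample_crm_trial(rng, n_maskers=2):
    """Target plus maskers: same gender, but every talker, color, and number differs."""
    target = Phrase(rng.choice(TALKERS), rng.choice(COLORS), rng.choice(NUMBERS))
    talkers = [t for t in TALKERS if gender(t) == gender(target.talker) and t != target.talker]
    colors = [c for c in COLORS if c != target.color]
    numbers = [n for n in NUMBERS if n != target.number]
    maskers = [Phrase(t, c, n) for t, c, n in zip(rng.sample(talkers, n_maskers),
                                                  rng.sample(colors, n_maskers),
                                                  rng.sample(numbers, n_maskers))]
    return target, maskers


def masker_level_db_spl(tmr_db):
    """Level of each masker; TMR = 60-dB SPL target level minus each masker's level."""
    return TARGET_LEVEL_DB_SPL - tmr_db


def summed_masker_level_db_spl(tmr_db, n_maskers=2):
    """Approximate level of n equal-level maskers added together (powers add)."""
    return masker_level_db_spl(tmr_db) + 10.0 * math.log10(n_maskers)


target, maskers = sample_crm_trial(random.Random(1))
```

For example, at a TMR of −8 dB each masker is set to 68 dB SPL, and the two maskers together come to roughly 71 dB SPL.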
The two masker stimuli were then selected at random under two constraints: all three signals (target and two maskers) were spoken by individuals of the same gender, and each signal was spoken by a different individual, with a different number and color. The target and two masker signals were combined at the desired target-to-masker ratio (TMR) and presented unprocessed to the left ear. The target was always presented at a level of 60 dB sound pressure level (SPL). The reported TMR is the level of the 60-dB SPL target relative to that of each individual masker signal. For example, a TMR of 0 dB indicates that the target signal and both masker signals were each adjusted to a level of 60 dB SPL before being mixed.

The signals presented to the right ear depended on the condition being tested:

- NH/Deaf (simulating SSD with no device intervention): no signal was presented to the right ear.
- NH/NH: the unprocessed masker signals were presented to the right ear.
- NH/Voc: the masker signals presented to the right ear were processed by an eight-channel noise vocoder (Hopkins et al., 2008).

Noise vocoding was applied in six steps:

(1) passing the acoustic signal through a filterbank that separated it into eight frequency channels (center frequencies 100–10 000 Hz, with bandwidths proportional to the equivalent rectangular bandwidth (ERB) of an NH auditory filter, as defined by Glasberg and Moore, 1990);
(2) extracting the Hilbert amplitude envelope of the signal in each channel;
(3) using these envelopes to modulate independent white-noise carriers;
(4) passing the resulting signals through the original filterbank;
(5) normalizing the level in each channel to equal the root-mean-square level of the original unprocessed signal in that band;
(6) summing the signals across channels to generate the broadband vocoded sound.

The Hilbert envelopes were not low-pass filtered at a fixed frequency.
Instead, the upper limit of the modulation frequencies present in the vocoded signals was determined by the bandwidth of each channel. The masker signals were always presented to the right ear (unprocessed or vocoded) at the same overall level as the unprocessed masker signals that were mixed with the target in the left ear.

Listeners were presented with blocks of 48 trials for a given test condition (NH/Deaf, NH/NH, or NH/Voc). Each block consisted of 6 trials for each of 8 TMRs (−20 to +8 dB in 4-dB steps). The test conditions were presented in pseudo-random order: one block was presented for each of the three conditions before an additional block was presented for any of the conditions. Listeners received 24 trials for each combination of TMR and presentation condition (i.e., four blocks per condition), for a total of 576 trials.

The data were collected with the listeners seated in a sound-treated booth equipped with a control computer running MATLAB (Mathworks, Natick, MA). All stimuli were presented through an RME (Haimhausen, Germany) Hammerfall sound card connected to Beyerdynamic (Heilbronn, Germany) DT990 headphones. Listeners responded by clicking on a graphical user interface with a computer mouse.

Nine NH listeners participated (4 female, age range 21–31 yr, mean age 24.6 yr). All participants in this and subsequent experiments passed an audiometric screening, with pure-tone thresholds better than or equal to 15 dB hearing level in both ears at octave frequencies between 125 and 8000 Hz, plus 6000 Hz. All listeners had experience performing the call-sign-based word-identification task using the CRM corpus, so no additional training was provided in the experiment. One of the listeners performed exceptionally poorly, achieving only 30%–40% correct in the NH/Deaf and NH/Voc conditions at the highest TMR tested (+8 dB).
(All of the other listeners were able to achieve at least 60% correct in all three conditions at this TMR, and most scored between 80% and 100% correct.) This listener's data were dropped, and the analysis was carried out for the remaining eight listeners. Including this listener's data would not have affected the statistical results.

B. Results

Figure 1(A) plots mean performance as a function of TMR for the three conditions tested in the experiment. Performance is the proportion of trials where the target color and number were both identified correctly. The curves in Fig. 1(A) are sigmoidal curves fit to the data for each condition. As expected (Gallun et al., 2005), there was a clear advantage for the binaural NH condition (NH/NH, gray diamonds) over the monaural NH condition (NH/Deaf, white circles). Presenting the right ear with the vocoded masking signals (NH/Voc, black squares) also produced substantial masking release relative to the NH/Deaf condition. This release was not nearly as large as that observed for the NH/NH condition.

FIG. 1. Results of experiment 1 showing (A) mean performance in correctly identifying the color and number as a function of TMR and (B) the amount of masking release for individual listeners and for the group mean. Masking release was derived by comparing the estimated TMR required for 50% correct performance in the NH/Voc or NH/NH conditions to that required in the NH/Deaf condition. The target and maskers (two same-gender interfering talkers) were presented to the left ear. The right ear was presented with silence (NH/Deaf), the unprocessed maskers (NH/NH), or the vocoded maskers (NH/Voc). Error bars indicate ±1 standard error of the mean across listeners.

The data were analyzed using a repeated-measures binary-logistic regression analysis with within-subjects factors condition (NH/Deaf, NH/Voc, or NH/NH) and TMR. There were significant main effects of condition [χ²(2) = 140.2, p < 0.0005] and TMR [χ²(7) = 20 940, p < 0.0005]. There was also a significant interaction between the two variables [χ²(8) = 330, p < 0.0005], reflecting the lack of difference in performance between conditions at the extremes of the psychometric functions. Three separate logistic-regression analyses were conducted to compare each pair of conditions, with p-values Bonferroni-corrected for the three comparisons. In each case, the main effects of condition (p < 0.05) and TMR (p < 0.005) and the interaction between the two factors (p < 0.005) were significant.
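The sigmoidal fits mentioned above, and the 50%-correct SRT definition used in the following paragraph, can be illustrated with a two-parameter logistic psychometric function. Here it is fit by maximum likelihood, using Newton iterations on the binomial log-likelihood. This is a simplification assumed only for illustration:

- it fixes the lower and upper asymptotes at 0 and 1;
- it is not the repeated-measures regression used for the statistics above;
- the authors' exact sigmoid is not specified here.

The counts in the usage lines are synthetic, not the study's data, and all names are hypothetical.

```python
import numpy as np


def fit_logistic(tmr_db, n_correct, n_trials, max_iter=100, tol=1e-10):
    """ML fit of p(correct) = 1 / (1 + exp(-(a + b * TMR))); returns array([a, b])."""
    x = np.asarray(tmr_db, dtype=float)
    k = np.asarray(n_correct, dtype=float)
    n = np.asarray(n_trials, dtype=float)
    design = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-design @ beta))
        grad = design.T @ (k - n * p)                                # score vector
        info = design.T @ (design * (n * p * (1.0 - p))[:, None])    # Fisher information
        step = np.linalg.solve(info, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def srt_db(tmr_db, n_correct, n_trials):
    """TMR at which the fitted function crosses 50% correct."""
    a, b = fit_logistic(tmr_db, n_correct, n_trials)
    return -a / b


def masking_release_db(srt_monaural, srt_bilateral):
    """Positive values mean the bilateral condition needed a lower TMR."""
    return srt_monaural - srt_bilateral


# Synthetic example: 24 trials at each of the eight experiment 1 TMRs.
tmrs = np.arange(-20, 9, 4)


def synthetic_counts(srt, slope, n=24):
    return n / (1.0 + np.exp(-slope * (tmrs - srt)))


srt_deaf = srt_db(tmrs, synthetic_counts(-1.0, 0.3), np.full(8, 24))
srt_voc = srt_db(tmrs, synthetic_counts(-4.0, 0.3), np.full(8, 24))
release = masking_release_db(srt_deaf, srt_voc)
```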
To quantify the magnitude of the binaural benefit in dB, speech-reception thresholds (SRTs) were estimated for each listener and condition. A sigmoid function was fit to the percentage-correct data, and the SRT was taken as the TMR required for 50% correct performance. Masking release was computed by subtracting the SRT for each bilateral condition (NH/NH or NH/Voc) from the SRT for the monaural condition (NH/Deaf).

Masking release for each of the eight listeners and the group mean is shown in Fig. 1(B). All eight listeners showed masking release for the NH/NH condition, while six of the eight showed masking release for the NH/Voc condition. On average, presenting the vocoded masker signals to the right ear (NH/Voc) provided about 2.5 dB of masking release relative to the NH/Deaf condition. Presenting the unprocessed signals to the right ear (NH/NH) provided about 10.5 dB.

III. EXPERIMENT 2: MIXING THE TARGET AND MASKERS IN THE RIGHT EAR

A. Rationale

In experiment 1, the target sound was completely absent from the right ear, a situation that would never occur naturally. In the free field, head shadow can attenuate an acoustic signal originating on one side of the head by 5–20 dB at the contralateral ear (Bronkhorst and Plomp, 1988), but the signal is never completely absent. Experiment 2 investigated whether the NH/Voc masking release observed in experiment 1 would persist if the target was mixed in with the maskers at the right ear. If so, it also asked whether the effect would occur over a physiologically plausible range of target ILDs.

B. Methods

This experiment used the same basic methodology as experiment 1, except that the target signal was mixed in with the maskers in the right ear. As in experiment 1, the target and two same-gender maskers were mixed and presented unprocessed to the left ear.
Only a TMR of 0 dB was tested, with the unprocessed target and each masker presented at the same level (60 dB SPL) to the left ear. The right ear received either no signal (NH/Deaf) or a mixture of target and masker signals. This mixture was either left unprocessed (NH/NH) or processed with an eight-channel noise vocoder (NH/Voc). In the NH/NH and NH/Voc conditions, the maskers in the right ear were presented at the same level (60 dB SPL) as the maskers in the left ear, yielding a masker ILD of 0 dB.

The target ILD was varied as an experimental parameter by reducing the level of the right-ear target relative to the left-ear target. Eleven target ILDs were tested (0, 1, 2, 3, 4, 6, 8, 12, 16, and 20 dB, and Infinite). In the Infinite ILD condition, no target signal was mixed in with the maskers presented to the right ear, as in experiment 1.

Listeners were presented with blocks of 60 trials for a given test condition (NH/NH or NH/Voc). Each block consisted of 5 trials for each of the 11 target ILDs plus 5 trials for the NH/Deaf condition, presented in random order. Test blocks were presented in pseudo-random order: one block was presented for both the NH/NH and NH/Voc conditions before an additional block was presented for either condition. Listeners completed five blocks each for the NH/NH and NH/Voc conditions. This yielded 25 trials for each target ILD in the NH/NH and NH/Voc conditions and 50 trials for the NH/Deaf condition, for a total of 600 trials per listener.

Nine NH listeners participated (four female, age range 21–32 yr, mean age 25.3 yr). Six of these listeners had participated in experiment 1.

C. Results

Figure 2(A) plots performance (the proportion of trials where the color and number were both identified correctly) as a function of the target ILD. Black diamonds represent the NH/NH condition, where the unprocessed target and masker signals were presented to the right ear.

FIG. 2. Results of experiment 2 showing mean performance in correctly identifying the color and number as a function of the target ILD. The target talker was presented with two simultaneous same-gender interferers. The masker ILD was always fixed at 0 dB, and the level of the target stimulus in the left (unprocessed) ear was equal to the level of each masker (i.e., TMR = 0 dB). The level of the target presented to the right (vocoded) ear was adjusted to yield the desired target ILD. (A) The top panel plots the mean raw data. (B) The bottom panel plots the same mean data normalized so that performance in the monaural (NH/Deaf) condition equals zero and performance in the Infinite ILD condition (i.e., no target presented to the vocoder ear) equals one. Error bars indicate ±1 standard error of the mean across listeners. Asterisks in (A) represent conditions where performance was significantly better than performance for a target ILD of 0 dB (p < 0.05).
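The normalization described for Fig. 2(B) is a linear rescaling: NH/Deaf performance maps to 0 and Infinite-ILD performance maps to 1. The authors applied it to the fitted sigmoids. The minimal sketch below applies the same rescaling to any proportion-correct value. The function name and the proportions in the usage lines are hypothetical.

```python
def normalized_release(p, p_deaf, p_infinite):
    """Fraction of the Infinite-ILD masking release achieved at performance level p."""
    if p_infinite == p_deaf:
        raise ValueError("no masking release at the Infinite ILD; normalization undefined")
    return (p - p_deaf) / (p_infinite - p_deaf)


# Hypothetical proportions correct, for illustration only.
p_deaf, p_inf = 0.30, 0.70
halfway = normalized_release(0.50, p_deaf, p_inf)  # midway between the two anchors
```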
Gray squares represent the NH/Voc condition, where the target and masker signals presented to the right ear were vocoded. Performance for the NH/Deaf condition, in which no signals were presented to the right ear, is plotted to the left of the other data (open circle). The curves in Fig. 2(A) are sigmoidal fits to the data for each condition. The minimum of each fitted sigmoid was fixed at a value equal to performance in the NH/Deaf condition. The mean data in Fig. 2(A) were normalized based on the fitted sigmoidal functions and replotted in Fig. 2(B). This view shows how masking release depended on the target ILD, expressed as a proportion of the masking release observed for an infinite target ILD in each condition.

The raw data [Fig. 2(A)] were analyzed using a repeated-measures binary-logistic regression analysis, with within-subjects factors condition (NH/Voc or NH/NH) and target ILD. As expected, there was a significant main effect of target ILD [χ²(8) = 1446, p < 0.0005]. Performance improved with increasing target ILD in both conditions as the target ILD became increasingly different from the 0-dB masker ILD. There was also a significant main effect of condition [χ²(1) = 199, p < 0.0005], with larger masking release for the NH/NH condition than for the NH/Voc condition. The interaction between the two factors was also significant [χ²(8) = 1310, p < 0.0005]. This interaction is apparent in the raw data plotted in Fig. 2(A): neither condition showed masking release for small target ILDs near 0 dB, but the two conditions showed different amounts of masking release at large target ILDs.

The key questions in this experiment concerned the range of target ILDs for which masking release would be observed, and whether this range would differ between the two conditions. Planned comparisons were conducted on the raw data [Fig. 2(A)] to determine the minimum target ILD required to produce masking release. They identified which target ILDs yielded significantly better performance than the 0-dB ILD condition [p < 0.05, asterisks in Fig. 2(A)].

For the NH/NH condition, significant masking release was observed for a target ILD as small as 2 dB [χ²(1) = 37.9, p < 0.0005]. Release increased until a maximum plateau was reached at a target ILD of 8 dB. For the NH/Voc condition, masking release was observed for target ILDs greater than or equal to 6 dB [χ²(1) = 10.6, p < 0.005]. Release increased until a maximum plateau was reached at a target ILD of 12 dB. Thus, the NH/Voc condition required a target ILD approximately three times as large (6 dB) as the NH/NH condition (2 dB) to produce significant masking release.

IV. EXPERIMENT 3: EFFECT OF MASKER TYPE

A. Rationale

Experiments 1 and 2 examined only the situation where the target talker was masked by two same-gender interfering talkers. Experiment 3 examined a wider variety of maskers. Its aims were to determine the conditions under which presenting the vocoded maskers to the right ear would produce masking release, and to shed light on the mechanisms underlying the effect. In particular, the experiment was designed to differentiate among three possible types of binaural advantage: release from energetic, modulation, or informational masking.

The first possible source of binaural advantage is the interaural correlation (ρ) between the masker signals in the two ears (ρ = 1 in the NH/NH case), which could generate a release from energetic masking. This is analogous to the classic binaural unmasking paradigm for tone detection in noise (Hirsh, 1948; Jeffress et al., 1956). In that paradigm, a monaural tone presented in a monaural noise (NmSm) is unmasked by the addition of an interaurally correlated noise signal presented to the contralateral ear (N0Sm). A similar binaural unmasking effect has been observed for speech in masking backgrounds including stationary noise (e.g., Carhart et al., 1969). (Although energetic masking is thought to arise mainly in the periphery, cues arising from binaural interactions in the midbrain can reduce it.)

According to Moore (2013), the release from masking for the detection of a 500-Hz tone in wideband noise is approximately 9 dB in the N0Sm condition relative to the N0S0 condition. Levitt and Rabiner (1967) reported that the binaural advantage for the intelligibility of speech in noise was generally about half the binaural advantage for the detection of speech in noise. One would therefore infer that binaural unmasking can account for about 4–5 dB of the masking release that occurs when an unprocessed copy of the masker is added to the contralateral ear in a monaural speech-perception task.
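The interaural correlation ρ referred to above can be computed as a normalized zero-lag cross-correlation between the two ear signals. The minimal sketch below uses a hypothetical function name. It shows that identical maskers in the two ears give ρ = 1, as in the NH/NH case, while statistically independent signals give ρ near 0. It also spells out the rough arithmetic behind the 4–5 dB estimate. This assumes that the halving reported by Levitt and Rabiner (1967) applies to the ~9-dB detection advantage cited from Moore (2013).

```python
import numpy as np


def interaural_correlation(left, right):
    """Normalized cross-correlation of the two ear signals at zero lag."""
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    return float(np.dot(left, right) / np.sqrt(np.dot(left, left) * np.dot(right, right)))


rng = np.random.default_rng(0)
masker = rng.standard_normal(48000)
rho_identical = interaural_correlation(masker, masker)                          # 1 up to rounding
rho_independent = interaural_correlation(masker, rng.standard_normal(48000))   # close to 0

# Rough estimate from the text: about half of a ~9-dB tone-detection advantage.
detection_advantage_db = 9.0
intelligibility_advantage_db = 0.5 * detection_advantage_db   # 4.5 dB, i.e., "about 4-5 dB"
```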
To measure the possible contribution of an interaural-correlation-based release from energetic masking, experiment 3 included a stationary-noise masking condition. This condition is generally thought to involve mainly energetic masking (but see Stone et al., 2012, for a discussion of whether stationary noise produces energetic or modulation masking).

A second possibility is that presenting the vocoded maskers to the contralateral ear might generate a release from modulation masking. Modulation masking can arise when modulations in the masker interfere with the listener's ability to extract the critical modulations that convey information in the target speech (e.g., Jørgensen and Dau, 2011; Stone et al., 2012). In this case, masking release might be facilitated by a difference in interaural correlation between two envelopes: those of the masker signals, presented to both ears, and that of the target signal, presented monaurally. van de Par and Kohlrausch (1998) showed that interaural correlation in the envelope of a modulated noise masker can reduce the detection threshold for a pure tone. We hypothesized that a similar effect might occur for speech signals: interaural correlation in the masker envelope could reduce modulation masking of the target speech.

Modulated-noise maskers, which produce a combination of energetic and modulation masking, were included in experiment 3 to test for a modulation-masking release due to interaural envelope co-modulation. If masking release were observed for a modulated masker but not for a stationary-noise masker, this would suggest that the contralateral presentation of the masker signals generated a release from modulation masking.

A third possibility is that presenting the vocoded maskers to the contralateral ear might generate a release from informational masking.
The idea was that in certain situations the target signal might be perfectly audible, but the listener might have difficulty discerning which portions of the audible speech sounds are attributable to the target and which are attributable to the masker. The relatively weak interaural correlation between the unprocessed masker signal in one ear and the vocoded masker signal in the other might be sufficient to generate a release from informational masking, but too weak to generate a release from energetic or modulation masking as described above. To investigate this possibility, experiment 3 examined a range of masker conditions. These included conditions involving same-gender interfering talkers that produced a combination of energetic, modulation, and informational masking, and conditions thought to produce less or no informational masking, including opposite-gender interfering talkers or modulated noise. The idea was that a release from informational masking would be revealed by comparing the results for conditions involving a great deal of informational masking (in addition to energetic and modulation masking) to those for conditions involving mainly energetic and modulation masking. For example, if masking release was observed only for same-gender interferers and not for opposite-gender interferers, this would imply that the contralateral presentation of the masker signals improved performance mainly by reducing informational masking. One additional question addressed by experiment 3 was whether the masking release observed in the previous two experiments was specific to the CRM-based speech-identification task involving the simultaneous presentation of multiple speech streams that follow exactly the same sentence structure. Experiment 3 used a different speech task involving the identification of words embedded in a carrier phrase, in the presence of interfering talkers reading a story that was unrelated to the task at hand.
This allowed us to ask whether the masking release observed for the NH/Voc condition is specific to the CRM corpus, or whether it generalizes to somewhat more natural conditions involving interfering talkers that do not follow the same sentence structure as the target speech.

B. Methods
The speech corpus was a set of 144 one-syllable words taken from the modified rhyme test (House et al., 1965) spoken by three different male talkers, as described by Bernstein and Brungart (2011). The response set consisted of 24 words for each of six vowel contexts. Response choices were arrayed in a 12 × 12 matrix of virtual buttons on the computer screen. Each pair of rows contained the 24 words for one vowel context, arranged in alphabetical order. The words were embedded in the carrier phrase "You will mark ___, please." The target talker was chosen at random for each stimulus trial. Listeners responded by pressing the virtual button associated with the word that they heard. Feedback was given following each trial by highlighting the button associated with the correct response.
J. Acoust. Soc. Am., Vol. 137, No. 2, February 2015 Bernstein et al.: Masking release in single-sided deafness 707

Eight masker conditions were tested. There were four interfering-talker conditions, involving one or two interfering talkers of the same (male) or opposite (female) gender as the target speech. In these conditions, one or two segments of a story ("The Unfruitful Tree" by Friedrich Adolph Krummacher, translated from German) were selected from a long-duration signal recorded by a single male or female talker. In conditions involving two interfering talkers, the two masker segments were chosen from different points within the story. Silent gaps were removed from the long-duration masker signals by calculating the amplitude of the signal with a 30-ms moving-average window and deleting time segments whose level remained at least 20 dB below the long-term root-mean-squared level of the speech for more than 150 ms. There was one stationary-noise masking condition, in which the spectrum of a Gaussian noise was shaped to match the long-term average spectrum of the target speech. There were three modulated-noise conditions: 4- and 16-Hz sinusoidally amplitude-modulated (SAM) noise and a one-talker speech-modulated noise. In the speech-modulated noise condition, the stationary noise was modulated in two bands (above and below 1 kHz; Festen and Plomp, 1990) by the envelope extracted from the female masker via half-wave rectification and lowpass filtering (fourth-order Butterworth, 40-Hz cutoff). In the SAM conditions, full-depth modulation at the appropriate rate was applied to the stationary noise. Word-identification performance was measured at five SNRs (−24 to 0 dB in 6-dB steps) for each masker condition and ear configuration (NH/NH, NH/Voc, NH/Deaf). Note that in this experiment, the SNR was defined relative to the total masker energy, in contrast to experiments 1 and 2, where the TMR was defined relative to the level of each of the two interfering talkers.
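The masker preparation and level convention described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' code. The 30-ms "amplitude" is assumed to be a moving RMS. The helper names are hypothetical. The SNR conversion assumes the maskers are mutually uncorrelated, so that their powers add.

```python
import numpy as np

def remove_silent_gaps(x, fs, win_s=0.030, thresh_db=-20.0, min_gap_s=0.150):
    """Delete stretches whose short-term level stays >= 20 dB below the
    long-term RMS for more than 150 ms (hypothetical helper).
    Assumption: the 30-ms amplitude measure is a moving RMS."""
    x = np.asarray(x, dtype=float)
    win = max(1, int(round(win_s * fs)))
    # Short-term RMS: moving average of x**2 over the window, then sqrt.
    short_rms = np.sqrt(np.convolve(x ** 2, np.ones(win) / win, mode="same"))
    long_rms = np.sqrt(np.mean(x ** 2))
    level_db = 20 * np.log10(np.maximum(short_rms, 1e-12) / long_rms)
    quiet = level_db <= thresh_db
    # Find contiguous runs of quiet samples from rising/falling edges.
    edges = np.diff(np.concatenate(([0], quiet.astype(int), [0])))
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    keep = np.ones(x.size, dtype=bool)
    min_len = int(round(min_gap_s * fs))
    for s, e in zip(starts, ends):
        if e - s > min_len:  # run lasted longer than 150 ms
            keep[s:e] = False
    return x[keep]

def sam_modulate(noise, fs, rate_hz):
    """Full-depth (modulation index 1) sinusoidal amplitude modulation."""
    t = np.arange(len(noise)) / fs
    return noise * (1.0 + np.sin(2 * np.pi * rate_hz * t))

def tmr_to_snr(tmr_db, n_maskers):
    """SNR re: total masker power when each of n equal-level, uncorrelated
    maskers is presented at tmr_db re: the target."""
    return float(tmr_db - 10 * np.log10(n_maskers))
```

For example, `tmr_to_snr(0.0, 2)` gives about −3 dB. This corresponds to the two-talker case that is labeled a TMR of 0 dB in experiments 1 and 2.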
Thus, a condition involving two interfering talkers, each presented at the same level as the target, that was defined as a TMR of 0 dB in experiments 1 and 2 was defined as an SNR of −3 dB in experiment 3. The target signal was presented only to the left ear, and the masker signals were presented either to the left ear alone or to both ears, with vocoder processing applied where applicable, exactly as described in experiment 1. Each test run consisted of 35 trials (seven trials for each of the five SNRs) for a given masker condition and ear configuration. Test runs were presented in pseudo-random order, with one run completed for each masker-configuration combination before an additional run was initiated for any given combination. Seven complete test runs, plus one partial run, were presented to each listener for each combination of masker condition and ear configuration, for a total of 54 trials for each combination of SNR, masker, and ear configuration, and 6480 total trials overall for each listener. Ten NH listeners participated (five female, age range 21–32 yrs, mean age 25.4 yrs), nine of whom had participated in at least one of the previous experiments. All of the listeners completed a training session before data collection began, in which one speech token for each of the 144 choices in the response set was presented in quiet.

C. Results
The goal of this experiment was to identify the masking conditions for which the presentation of unprocessed (NH/NH) or vocoded (NH/Voc) masker signals to the right ear provided binaural masking release. Figure 3 shows word-identification performance plotted as a function of SNR for each masker condition (panels) and ear configuration (symbols). The curves in each panel of Fig. 3 represent sigmoidal fits to the data.
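The SRT (the SNR at 50% correct) and the masking-release quantity plotted in Fig. 4 can be sketched as follows. This is a hypothetical stand-in: the paper reads SRTs from sigmoidal fits, whereas this sketch linearly interpolates between measured points and assumes performance rises monotonically with SNR. Masking release is taken as the monaural SRT minus the bilateral SRT, so that a positive value means the contralateral masker copy helped.

```python
import numpy as np

def srt_from_psychometric(snrs_db, p_correct, criterion=0.5):
    """SNR at which proportion correct crosses `criterion` (default 50%).
    Swapped-in technique: linear interpolation, not a sigmoidal fit."""
    snrs = np.asarray(snrs_db, dtype=float)
    p = np.asarray(p_correct, dtype=float)
    if np.any(np.diff(p) <= 0):
        raise ValueError("expects strictly increasing performance with SNR")
    if not (p[0] <= criterion <= p[-1]):
        raise ValueError("criterion lies outside the measured range")
    # np.interp needs increasing x values, here the proportion correct.
    return float(np.interp(criterion, p, snrs))

def masking_release(srt_monaural_db, srt_bilateral_db):
    """Positive when the bilateral (NH/NH or NH/Voc) SRT is lower than NH/Deaf."""
    return srt_monaural_db - srt_bilateral_db
```

In practice one would fit a psychometric function with a guessing floor and lapse rate. Interpolation is used here only to keep the sketch dependency-free.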
Figure 4 shows masking release for the NH/Voc and NH/NH conditions, calculated as the difference (in dB) between the SRTs (the SNR required for 50%-correct performance) for the bilateral (NH/NH or NH/Voc) and monaural (NH/Deaf) conditions. Figures 3 and 4 show that masking release varied across masker type and processing condition. An analysis of variance (ANOVA) carried out on the SRT data for all eight masker types and three stimulus conditions (NH/Deaf, NH/Voc, and NH/NH) showed significant main effects of masker [F(7,63) = 163, p < 0.0005] and condition [F(2,18) = 258, p < 0.0005], and a significant interaction between the two factors [F(14,126) = 8.74, p < 0.0005]. Three separate ANOVAs were then conducted to compare the three stimulus conditions pairwise. For the comparisons between the NH/NH and NH/Deaf conditions, and between the NH/Voc and NH/Deaf conditions, the main effects of masker and condition, and the interaction between the two factors, were all significant (p < 0.0005). This is consistent with the trend observed in Fig. 4, whereby the amount of masking release afforded by presenting the masker signals to the opposite ear varied as a function of masker type, for both the processed and unprocessed maskers. For the comparison between the NH/NH and NH/Voc conditions, the main effects of masker type and stimulus condition were significant (p < 0.0005), but the interaction between these two factors was not (p = 0.60). The lack of interaction between masker type and stimulus condition for this comparison can be seen in Fig. 4: the amount of masking release was consistently about 3 dB greater for the NH/NH condition than for the NH/Voc condition across all eight masker types (range: 2.3–3.8 dB). Post hoc t-tests were conducted for each of the eight masker types and two stimulus conditions (NH/NH and NH/Voc) shown in Fig. 4 to determine for which conditions significant masking release was observed.
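A minimal sketch of this post hoc step follows. It assumes each test is a one-sample t-test of per-listener masking release against zero, which is equivalent to a paired t-test on the two SRTs. The exact test form is an assumption. It also assumes a Bonferroni correction over the eight masker types. Converting t to a p-value requires the t distribution with n − 1 degrees of freedom (for example via `scipy.stats.ttest_1samp`). That step is left out here to keep the sketch dependency-free.

```python
import math
import statistics

def one_sample_t(values):
    """t statistic for H0: mean == 0 (e.g., per-listener masking release in dB)."""
    n = len(values)
    return statistics.mean(values) / (statistics.stdev(values) / math.sqrt(n))

def bonferroni(p_values, n_comparisons=None):
    """Bonferroni adjustment: scale each p by the number of comparisons, cap at 1."""
    m = len(p_values) if n_comparisons is None else n_comparisons
    return [min(1.0, p * m) for p in p_values]
```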
P-values were Bonferroni-adjusted for eight multiple comparisons. Conditions with significant masking release (p < 0.05) are indicated by asterisks in Fig. 4. Significant masking release was observed for all eight masker types in the NH/NH condition (p < 0.005). For the NH/Voc condition, significant masking release was observed for speech-modulated noise (p < 0.005) and for one or two same-gender interfering talkers (p < 0.0005), although the magnitude of masking release in the speech-modulated noise case was relatively small (<2 dB).

V. EXPERIMENT 4: EFFECT OF THE NUMBER OF VOCODER CHANNELS

A. Rationale
The first three experiments used an eight-channel noise vocoder to process the signals presented to the right ear. Experiment 4 varied the number of vocoder channels in the simulation to determine how frequency selective the envelope processing would need to be to generate masking release.

FIG. 3. Results of experiment 3 plotting mean performance in correctly identifying the target single-syllable word as a function of the SNR. The target and maskers were presented to the left ear. The right ear was presented with silence (NH/Deaf), the unprocessed maskers (NH/NH), or the vocoded maskers (NH/Voc). Each panel represents a different masker type. Error bars indicate ± one standard error of the mean across listeners.

B. Methods
This experiment used the same word-identification paradigm with male talkers as experiment 3. Only the masking condition with two same-gender (i.e., male) interfering talkers was tested, at an SNR of −12 dB. In addition to the NH/Deaf and NH/NH conditions repeated from experiment 3, 10 NH/Voc conditions were tested, with the 100–10 000 Hz frequency range divided into 1, 2, 3, 4, 6, 8, 12, 16, 24, or 32 channels. Vocoder-channel bandwidths were proportional to the ERB of a normal auditory filter (Glasberg and Moore, 1990). Eight NH listeners participated (four female, age range 21–31 yrs, mean age 24.6 yrs), all of whom had participated in one or more of the previous three experiments.

FIG. 4. The amount of masking release (in dB) calculated as the horizontal distance between the fitted performance functions shown in Fig. 3 at the 50% level of performance (i.e., the SRT). Masking release for the NH/NH condition reflects the SRT difference between the NH/NH and NH/Deaf conditions. Masking release for the NH/Voc condition reflects the SRT difference between the NH/Voc and NH/Deaf conditions. Error bars indicate ± one standard error of the mean masking release across listeners. Asterisks indicate conditions where masking release was significantly greater than zero (p < 0.05).

C. Results
Figure 5 plots percentage-correct word-identification performance as a function of the number of vocoder channels (gray squares). Performance for the NH/Deaf (white circle) and NH/NH (black diamonds) conditions is also shown.
Performance generally increased with increasing number of vocoder channels. To determine how many vocoder channels were required to yield significant masking release relative to the NH/Deaf condition, planned comparisons of performance for each vocoder condition (and the NH/NH condition) relative to the NH/Deaf condition were performed using a repeated-measures logistic-regression analysis. Conditions that showed significantly better performance than the NH/Deaf condition are identified by asterisks in Fig. 5. While a one-channel vocoder did not generate significant masking release (p = 0.73), a significant increase in performance was observed with as few as two vocoder channels (p < 0.05). Performance then increased with an increasing number of vocoder channels until a maximum performance plateau was reached with six vocoder channels. Additional vocoder channels (beyond six) did not produce any additional masking release.
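Experiment 4 split 100–10 000 Hz into 1 to 32 channels with ERB-proportional bandwidths. One way to generate such band edges is to space them uniformly on the Glasberg and Moore (1990) ERB-number scale, so that every channel covers the same number of ERBs. The paper does not give its exact edge formula, so treat this construction, and the helper names below, as assumptions.

```python
import math

def erb_number(f_hz):
    """Glasberg and Moore (1990) ERB-number scale: 21.4*log10(4.37*f/1000 + 1)."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)

def inverse_erb_number(e):
    """Frequency (Hz) corresponding to ERB number e."""
    return (10 ** (e / 21.4) - 1.0) * 1000.0 / 4.37

def channel_edges(n_channels, f_lo=100.0, f_hi=10000.0):
    """n_channels + 1 band edges equally spaced on the ERB-number scale
    (hypothetical reconstruction of the vocoder filter bank layout)."""
    e_lo, e_hi = erb_number(f_lo), erb_number(f_hi)
    step = (e_hi - e_lo) / n_channels
    return [inverse_erb_number(e_lo + k * step) for k in range(n_channels + 1)]
```

Under this assumption, the 100–10 000 Hz range spans roughly 32 ERBs. The 32-channel condition would then correspond to bands about one ERB wide, and the one-channel condition to a single broadband envelope.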

VI. DISCUSSION
The results of the four experiments described here show that NH listeners presented with an unprocessed mixture of target and masker speech signals in one ear experienced masking release when the masker envelopes were presented to the other ear via vocoder processing. This effect was initially identified in a competing-talker paradigm where the target and masking talkers produced sentences of the same form from the same stimulus response set, and an infinite ILD was applied to the target signal (experiment 1). Follow-up experiments then extended this finding to conditions involving non-infinite target ILDs (experiment 2), a speech test where the target is unrelated to the masker speech (experiment 3), and a range of vocoder channel-number conditions (experiment 4).

A. Release from energetic, modulation, or informational masking?
In experiments 1, 3, and 4, the experimental paradigm was specifically designed such that the added vocoder signals presented to the second ear did not contain any acoustic information regarding the target speech. This ruled out any possibility that the observed masking release could be ascribed to a better-ear head-shadow advantage at the vocoder ear. Instead, the advantage must be ascribed to the listener combining the acoustic information across the ears in some way to release masking. In the following, results are compared for the NH/NH and NH/Voc conditions and across the various masker types in experiments 1 and 3 to address the question of whether the vocoded masker signals provided a release from energetic, modulation, or informational masking.

FIG. 5. Results of experiment 4. The target and two same-gender interfering talkers were presented unprocessed to the left ear. The same masker signals were presented to the right ear after being processed by a noise vocoder. Performance in correctly identifying the target word is plotted as a function of the number of vocoder channels. Error bars indicate ± one standard error of the mean performance across listeners. Asterisks indicate conditions where performance was significantly better (p < 0.05) than for the NH/Deaf condition, where the right ear was presented with silence.

It is clear that the masking release for the NH/Voc condition was not as robust as that observed for bilaterally unprocessed signals (i.e., the NH/NH condition), and does not reflect the complete set of binaural interactions that are available in the bilaterally unprocessed case. In particular, it has been well established that NH listeners presented with bilaterally unprocessed signals can take advantage of binaural differences in stimulus phase to reduce energetic masking and improve speech understanding in the presence of noise. This effect was clearly evident in the NH/NH condition, where a binaural advantage was observed even in stationary noise [Fig. 2(A)]. In contrast, the presentation of the maskers to the vocoded ear in the NH/Voc condition did not produce any advantage in the stationary-noise condition. The lack of an advantage in the NH/Voc condition for this situation involving mainly energetic masking was not surprising, since the temporal fine structure of the noise masker was uncorrelated across the two ears as a result of the vocoder processing. There is some reason to believe that the NH/Voc condition might have been able to generate a release from modulation masking based on co-modulation in the masker envelopes delivered to the two ears. For example, van de Par and Kohlrausch (1998) showed that listeners were able to take advantage of binaural co-modulation in the masker envelope to detect a monaural target tone. However, there was little evidence of a release from modulation masking in the NH/Voc condition: masking release was observed mainly for masking conditions involving competing talkers, but not for the other modulated-masker conditions.
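A toy example can illustrate the distinction drawn here, in which vocoding removes interaural fine-structure correlation but preserves envelope co-modulation. This is not the paper's vocoder. As a hypothetical stand-in for a single vocoder channel, both ears receive the same 4-Hz full-depth envelope imposed on independent Gaussian noise carriers. The sketch then compares the interaural correlation of the waveforms with that of crude envelopes (full-wave rectification plus a 10-ms moving average, an assumed smoothing choice).

```python
import numpy as np

def corr(a, b):
    """Zero-lag Pearson correlation (means removed)."""
    a = a - a.mean()
    b = b - b.mean()
    return float(np.dot(a, b) / np.sqrt(np.dot(a, a) * np.dot(b, b)))

def smoothed_envelope(x, fs, win_s=0.010):
    """Crude envelope: full-wave rectification, then a moving average."""
    win = max(1, int(round(win_s * fs)))
    return np.convolve(np.abs(x), np.ones(win) / win, mode="same")

fs = 16000
rng = np.random.default_rng(0)
t = np.arange(fs) / fs                      # 1 s of signal
env = 1.0 + np.sin(2 * np.pi * 4 * t)       # shared 4-Hz envelope, full depth
left = env * rng.standard_normal(fs)        # carrier in one ear
right = env * rng.standard_normal(fs)       # same envelope, independent carrier

rho_fine = corr(left, right)                # fine-structure (waveform) correlation
rho_env = corr(smoothed_envelope(left, fs), smoothed_envelope(right, fs))
```

With this setup, the waveform correlation comes out near zero while the envelope correlation is close to one. In that configuration only envelope-based co-modulation cues remain available across the ears.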
There was, however, a small but significant amount of masking release for the speech-modulated noise masker in the NH/Voc condition, and a corresponding increase in masking release for the speech-modulated noise masker relative to the other masker types in the NH/NH condition, which could be interpreted as an indication of a small release from modulation masking for this condition. The results of experiment 3 are most consistent with the idea that the observed masking release in the NH/Voc condition was attributable to listeners combining the information across the two ears to generate a release from informational masking, with masking release observed mainly for the same-gender interfering-talker conditions that are known to generate substantial informational masking (e.g., Freyman et al., 2001). These are the same conditions that produced the most binaural benefit for the NH/NH conditions (Fig. 4, black bars). In these informational-masking situations, the target speech signal was likely to be audible to the listener, but the listener would have had a difficult time determining which of the audible components of the overall signal belong to the target, and which belong to the masker. This large amount of masking release for same-gender interferers is thought to occur because there are relatively few monaural cues available to allow the listener to perceptually segregate the simultaneous voices (e.g., Brungart, 2001; Brungart et al., 2001). In the current study, presenting the interferers to the right ear allowed the listener to more easily differentiate the target signal from the maskers, yielding a large masking release. Since there were few salient monaural cues