Proceedings of Meetings on Acoustics


Proceedings of Meetings on Acoustics, Volume 19, 2013 http://acousticalsociety.org/

ICA 2013 Montreal, Canada, 2-7 June 2013
Psychological and Physiological Acoustics
Session 2pPPb: Speech, Attention, and Impairment (Poster Session)

2pPPb10. The Glasgow Monitoring of Uninterrupted Speech Task (GMUST): A naturalistic measure of speech intelligibility in noise

Alexandra MacPherson* and Michael A. Akeroyd

*Corresponding author's address: MRC Institute of Hearing Research (Scottish Section), Queen Elizabeth Building, Glasgow, G31 2ER, Lanarkshire, United Kingdom, alex@ihr.gla.ac.uk

When listening to speech in noisy environments, parts of the speech signal are often missed due to masking, degradation, or inattention. Sometimes the message can be reconstructed, but reallocating resources to recover the missed information can affect the efficiency and speed at which the message is understood. A slowing of processing will be particularly detrimental when there are few opportunities for understanding to catch up, e.g. when listening to the radio. It is likely, then, that the monitoring of uninterrupted flows of continuous speech will be especially sensitive to hearing or listening impairments. We developed a new task, termed "The Glasgow Monitoring of Uninterrupted Speech Task" (GMUST), to test this. It requires participants to listen to a 10-minute segment of continuous speech whilst monitoring a scrolling transcript of the speech on the screen in front of them for any word changes. The proportion of changes participants are able to correctly identify is then used as a measure of speech intelligibility. Pilot data presented here suggest this new task is a suitable method for measuring the intelligibility of continuous speech. Future development of the task is also discussed.
Published by the Acoustical Society of America through the American Institute of Physics. 2013 Acoustical Society of America [DOI: 10.1121/1.4799865]
Received 22 Jan 2013; published 2 Jun 2013
Proceedings of Meetings on Acoustics, Vol. 19, 050068 (2013)

INTRODUCTION

When listening to speech in noisy environments much of the speech signal is often masked or degraded. In such situations at least part of the message can usually be reconstructed by perceptually filling in missed information and/or recalling information from memory. Such repair mechanisms, however, draw on cognitive resources, and as a consequence speech understanding in noise is likely to be both more effortful and slower than it is in quiet. For older listeners, who particularly rely on top-down information to aid speech identification in noisy conditions (Pichora-Fuller, Schneider, & Daneman, 1995) and for whom a general slowing in processing is believed to occur (Salthouse, 1996), there is likely to be a further decrease in understanding efficiency.

The speech-in-noise tests used in research and clinical practice usually use short utterances: monosyllabic words, digits or short sentences. These materials are usually presented one at a time with short pauses between each utterance for the listener to make a response. It has been suggested that such methods may not sufficiently tax listeners, as the pauses offer opportunities for processing to catch up (Shinn-Cunningham & Best, 2008). If this is the case, then at least some of the difficulties a listener may be experiencing could be compensated for in these trial-by-trial tasks. It could be reasoned that speech-in-noise difficulties would become most apparent if the listener were required to keep up with an ongoing flow of information. In continuous speech, for example, listeners must simultaneously retrieve and integrate any missed speech units whilst continuing to process incoming information (Pichora-Fuller et al., 1995). Listeners who have to devote greater cognitive resources to reconstructing a degraded signal will have fewer resources available to allocate to the incoming portions of the message, and speech intelligibility will begin to suffer.
Continuous speech tasks are not only likely to be more sensitive to speech-in-noise difficulties than trial-by-trial tasks; they are also likely to be a better representation of everyday speech. Much of the speech we encounter on a day-to-day basis is ongoing; when listening to the television or to the radio, for example, speech does not pause and wait for understanding to catch up. We argue that tasks designed to measure ongoing deficits in speech-in-noise understanding must challenge the listener in a similar way.

Several methods for measuring the intelligibility of continuous, uninterrupted speech have been suggested previously. They include asking listeners to answer questions on previously heard segments of speech (Giolas & Epstein, 1963) or adjusting the level of target speech to reach a criterion level of understanding (Hawkins & Stevens, 1950). The speech reception thresholds (SRTs) measured by such methods are, however, likely to be confounded by post-perceptual processes such as memory or comprehension skill, or by subjective judgements of intelligibility (Speaks, Parker, Kuhl, & Harris, 1972).

A new task is proposed here which aims to assess the intelligibility of continuous uninterrupted speech in an on-line fashion while avoiding some of the problems experienced by previous measures of continuous speech. This task, the Glasgow Monitoring of Uninterrupted Speech Task (GMUST), requires a participant to listen to 10 minutes of continuous speech. Whilst listening, the participant is also asked to follow a transcript of the speech. A number of words within the transcript have been changed so that they no longer match those of the audio. The participant's task is to identify these audio/visual mismatches. The number of mismatches correctly identified provides a measure of speech intelligibility: greater difficulty keeping up with the speech will result in a decrease in the number of changes correctly identified.
Similar audio/visual monitoring methods have been used previously to measure speech quality (Huckvale, Hilkhuysen, & Frasi, 2010) and reading skills (McMahon, 1983). The benefits of using such a method to measure the intelligibility of continuous speech are that the audio does not need to be paused in order to record listener responses and, as it is an on-line assessment of intelligibility, the influence of post-perceptual processes is diminished. Another benefit of this task is that it has clear similarity to real-world tasks with which many listeners will be familiar, such as watching television or a film with subtitles. We report here a pilot experiment testing the suitability of this new task as a method for measuring the intelligibility of ongoing speech.

METHOD

Two segments of uninterrupted target speech and accompanying transcripts were created. The audio stimuli were ten-minute extracts from a commercially available audiobook. The audiobook was an unabridged version of Silver Blaze by Arthur Conan Doyle, read by a British-English male. An audiobook was chosen as it gave convenient access to large amounts of ongoing speech with which the initial validity of the task could be quickly tested.

The audio was presented in a speech-shaped, unmodulated noise. The masking noise was made by concatenating separate noises of 20-second duration, with 1-second gaps. The audio continued during the pauses between masking noises.

Transcripts were created for each of the 10-minute segments of audio, with deliberate substitutions made to a subset of the words. To allow for a direct comparison of performance at different time intervals throughout the task, the audio was partitioned into 20-second windows and five word substitutions were made to the transcript in each window. The twenty-second windows were spaced one second apart and were thus aligned with the onset and offset of each masking noise. No substitutions were made within the one-second, noiseless intervals between maskers. The words selected for substitution were chosen in a pseudo-random fashion adhering to several rules: no function words were changed (e.g. "the", "his"), substituted words needed to be at least 4 words apart, and no word was substituted if it would alter the context of the text. Wherever possible, substituted words retained the same length and approximate word frequency as the original word, as it was assumed that large changes in word length and/or frequency would make the substitution easier to spot.

A Matlab program was created which presented the audio and displayed a scrolling copy of the transcript on a touch screen (see Figure 1). The words currently being spoken appeared in the centre of the display along with the two previous and the two subsequent lines of the transcript. After each line of audio the transcript scrolled upwards. Listeners identified word substitutions by pressing the word on the screen. The background of the word then changed colour to indicate selection. The program recorded the number of correct identifications, misses and false alarms for each 20-second window.

FIGURE 1. A screenshot from the GMUST.
The line currently being spoken appears in the centre of the screen within the fixation lines. Once the audio reaches the end of a line the text scrolls upwards. Listeners press or click on words which they believe to be substitutions; the background of the selected words then changes colour.

The level of the target speech was fixed throughout the 10-minute block at 70 dB. The level of the masking noise was adjusted adaptively during the block to create different target-to-masker ratios (TMRs). At the start of the block the masking noise was presented at a TMR of 0 dB. A best-of-five adaptive procedure was used, targeting identification of 50% of substitutions (Bode & Carhart, 1974). At the offset of each masking noise (i.e. after each 20-second substitution window) the number of substitutions correctly identified during that window was calculated. If three or more of the designated substitutions had been correctly identified, the TMR of the next presented noise masker was decreased; if two or fewer had been correctly identified, the TMR of the next presented noise masker was increased. The step size was initially 4 dB but was reduced to 2 dB after three reversals. Each individual step was randomly varied by +/- 1 dB to generate some variability in the levels visited by the adaptive track. At the end of the 10-minute block the SRT was calculated by averaging the levels at all reversals except the first.

To give a reference measure of listeners' ability to complete the monitoring aspect of the task, listeners also carried out the task with no masking noise. Different audio stimuli were used in the quiet and noise conditions.

Six listeners took part in this pilot study. They were aged between 23 and 42 (mean age = 29) and had normal hearing.
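The best-of-five rule and the SRT computation described above can be summarised in a few lines of code. The following is an illustrative Python sketch, not the original Matlab implementation; the `hits_per_window` callback, which returns how many of the five substitutions (0-5) a listener identifies at a given TMR, is an assumed stand-in for a real listener.

```python
import random

def run_adaptive_track(hits_per_window, n_windows=30, seed=None):
    """Sketch of the GMUST best-of-five adaptive rule: the TMR tracks
    50% correct (3-of-5 substitutions). The step is 4 dB until three
    reversals have occurred, then 2 dB, and each step is jittered by
    +/- 1 dB. The SRT is the mean TMR at all reversals except the first."""
    rng = random.Random(seed)
    tmr = 0.0                # block starts at a TMR of 0 dB
    step = 4.0
    direction = 0            # +1 = TMR last moved up, -1 = down
    reversals = []
    track = []
    for _ in range(n_windows):
        track.append(tmr)
        hits = hits_per_window(tmr)
        new_dir = -1 if hits >= 3 else +1   # 3+ correct -> make it harder
        if direction != 0 and new_dir != direction:
            reversals.append(tmr)           # direction changed: a reversal
            if len(reversals) == 3:
                step = 2.0
        direction = new_dir
        tmr += new_dir * (step + rng.uniform(-1.0, 1.0))
    srt = (sum(reversals[1:]) / len(reversals[1:])
           if len(reversals) > 1 else None)
    return srt, track, reversals
```

For example, a deterministic simulated listener with a step-function threshold at -8 dB (all five substitutions found above it, none below) produces a track that oscillates around that level, and the resulting SRT lands near -8 dB.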

RESULTS

Figure 2 displays the number of substitutions each participant was able to correctly identify in each 20-second substitution window during the GMUST when no background noise was presented. For listeners 1, 2 and 3 this baseline measurement was carried out using the first audio file (top panels), while for listeners 4, 5 and 6 the second audio file was used (bottom panels). Mean performance in this condition was high (90%), though performance fluctuated somewhat throughout the ten-minute block. That not all substitutions could be identified indicates that the task was challenging even in quiet.

FIGURE 2. The number of substitutions identified by each participant in each 20-second trial window of the GMUST carried out in quiet. Listeners 1, 2 and 3 listened to the first audio file; listeners 4, 5 and 6 listened to the second audio file.

Several listeners showed an oscillating pattern, with a window-to-window variation in performance. Participant 3 showed a particularly clear example of this pattern. This could be related to the number of substitutions that listeners must identify in a short period of time. One way in which task difficulty could be reduced would be to reduce the number of substitutions per 20-second window from 5 to 3. This may give a more consistent level of performance across the 10-minute block. The data also give an indication of the suitability of the continuous target stimuli and the selected word substitutions. Ideally the substitutions should be equally detectable across windows. A dip in performance in the 19th substitution window for all three listeners who listened to the first audio file (listeners 1, 2 and 3) suggests that this may not have been the case.

Figure 3 shows the data collected in noise: the adaptive tracks measured for each listener and their calculated SRTs (the dotted lines). The SRTs were relatively similar across listeners (between -9.9 and -7.1 dB).
The convergence of performance on the SRT, however, was better for some listeners than others. Participant 5's performance, for example, deviated substantially from the calculated SRT (standard deviation of reversals = 3.9 dB) while performance for participant 4 was more closely centred around the mean (standard deviation of reversals = 2.5 dB).

DISCUSSION

We report here a pilot experiment testing the suitability of the Glasgow Monitoring of Uninterrupted Speech Task (GMUST) for measuring the intelligibility of continuous speech. It was found that the monitoring aspect of the task, while challenging, can be completed to a high level by young, normal-hearing listeners. Some fluctuations in performance were seen across each 10-minute block in quiet, and it is suggested that reducing the number of word substitutions per window may reduce this variability. If the variations in performance due to general task difficulty can be reduced, changes in performance due to other factors, such as attention or listener effort, may become apparent.

FIGURE 3. Adaptive tracks measured for six listeners on the GMUST in a speech-shaped noise. The dotted lines represent the 50% speech reception threshold for each participant calculated from each adaptive track.

The results of the pilot experiment also suggested that the difficulty of identifying substitutions was not equivalent across all 20-second windows. Several factors may affect how easy it is to spot a substitution. Changes where the substituted word is very different from the original word (e.g. "strong" to "compelling") may, for example, be more likely to stand out than substitutions where only a few letters of the word have been changed (e.g. "murmured" to "muttered"). Conversely, if a substitution closely follows another it may be missed while the listener's attention is still engaged with responding to the previous substitution. These factors need to be considered when choosing which transcript words will be substituted in the full version of this task.

An adaptive track procedure was also used to target each listener's SRT. The tracks for some listeners, however, showed large amounts of variation from the calculated SRT. While an ideal adaptive track would quickly converge on the chosen threshold and show little variation from this level once it has been reached, one might expect to see lapses in performance in an ongoing task such as this, where listening effort is predicted to have a knock-on effect on performance. Greater consideration may therefore need to be given to which reversals the SRT is based on in the full version of the task.

The audio stimuli used in this pilot were taken directly from a commercially available audiobook and as such were not experimentally ideal.
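As a concrete starting point for automating that choice, the selection rules from the Method (no function words; substituted words at least four words apart) can be sketched as follows. This is an illustrative Python sketch; the function name and the small function-word list are assumptions, and the length- and word-frequency-matching constraints described in the Method are omitted for brevity.

```python
import random

# Illustrative subset only; a real implementation would use a fuller stop-word list.
FUNCTION_WORDS = {"the", "a", "an", "his", "her", "of", "and", "to", "in", "it"}

def choose_substitution_slots(window_words, n_subs=5, min_gap=4, seed=None):
    """Pick indices of words to substitute within one 20-second window:
    never a function word, and chosen words at least `min_gap` words
    apart, following the GMUST selection rules."""
    rng = random.Random(seed)
    candidates = [i for i, w in enumerate(window_words)
                  if w.lower() not in FUNCTION_WORDS]
    rng.shuffle(candidates)            # pseudo-random selection order
    chosen = []
    for i in candidates:
        if all(abs(i - j) >= min_gap for j in chosen):
            chosen.append(i)
            if len(chosen) == n_subs:
                break
    return sorted(chosen)
```

A greedy pass like this always finds five slots whenever the window contains enough eligible words; the remaining judgement calls (context preservation, picking a replacement of similar length and frequency) would still need a human in the loop or a lexical database.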
Future development of this task will require making suitable level-balanced recordings of the stimuli. This opportunity will also be taken to make adjustments to the content of the stimuli, including removing any outdated language, and to ensure that the complexity of the speech is relatively consistent across different 10-minute target stimuli.

In summary, a new method for measuring continuous speech intelligibility has been reported, which we have demonstrated can be used adaptively to measure speech reception thresholds. While further development of the task is needed, particularly in developing appropriate stimuli, we hope it will be a useful tool for measuring the complex challenges faced when understanding speech in everyday listening environments.

ACKNOWLEDGMENTS

The first author is funded by a Medical Research Council centenary grant (grant number U135097131). The Scottish Section of the Institute of Hearing Research is also funded by the Chief Scientist Office of the Scottish Government.

REFERENCES

Bode, D. L., & Carhart, R. (1974). Stability and accuracy of adaptive tests of speech discrimination scores. Journal of the Acoustical Society of America, 56, 963-970.

Giolas, T. G., & Epstein, A. (1963). Comparative intelligibility of word lists and continuous discourse. Journal of Speech and Hearing Research, 6(4), 349-358.

Hawkins, J. E., & Stevens, S. S. (1950). The masking of pure tones and of speech by white noise. Journal of the Acoustical Society of America, 22(1), 6-13.

Huckvale, M., Hilkhuysen, G., & Frasi, D. (2010). Performance-based measurements of speech quality with an audio proof-reading task. Paper presented at the 3rd ISCA Workshop on Perceptual Quality of Systems, Dresden, Germany.

McMahon, M. L. (1983). Development of reading-while-listening skills in the primary grades. Reading Research Quarterly, 19(1), 38-52.

Pichora-Fuller, M. K., Schneider, B. A., & Daneman, M. (1995). How young and old adults listen to and remember speech in noise. Journal of the Acoustical Society of America, 97(1), 593-608.

Salthouse, T. A. (1996). The processing-speed theory of adult age differences in cognition. Psychological Review, 103(3), 403-428.

Shinn-Cunningham, B. G., & Best, V. (2008). Selective attention in normal and impaired hearing. Trends in Amplification, 12(4), 283-299.

Speaks, C., Parker, B., Kuhl, P., & Harris, C. (1972). Intelligibility of connected discourse. Journal of Speech and Hearing Research, 15(3), 590-602.