
MECHANISMS OF PATTERN ANALYSIS IN HUMAN AUDITORY CORTEX

A Thesis presented for the Degree of Doctor of Philosophy in The School of Neurology, Neurobiology and Psychiatry, University of Newcastle-upon-Tyne

by Jason Donald Warren

January, 2005

Summary

1. Analysis of spectrotemporal patterns is fundamental to the perception of auditory scenes. Evidence in non-human species suggests that this analysis depends on mechanisms that are instantiated in structurally and functionally differentiated auditory cortical areas. However, the functional architecture of human auditory cortex has not been established.

2. The human functional magnetic resonance imaging experiments described in this thesis address four broad problems of auditory scene analysis: representation of spatial and non-spatial sound source information; disambiguation of sound movement from other dynamic complex spectrotemporal patterns; representation of source-dependent and source-independent pitch information; and analysis of sound identity.

3. The findings demonstrate that distinct cortical mechanisms analyse information that is associated with sound sources and information that segregates sound sources. Pitch patterns are analysed in a cortical network extending antero-laterally to primary auditory cortex, while sound source location and motion are analysed in a distinct posterior cortical network. Pitch chroma (a source-independent property) is specifically represented anterior to primary auditory cortex, while pitch height (a source-dependent property that can be used to segregate sound sources) is specifically represented posterior to primary auditory cortex.

4. A specific human temporal lobe mechanism computes spectral shapes independently of fine spectrotemporal structure. The substrate for this mechanism is a cortical network that links posterior superior temporal areas engaged in sound source segregation with anterior and inferior temporal areas that analyse sound identity.

5. These findings support a hierarchical organisation of human auditory cortex in which mechanisms for processing source-dependent and source-independent information map onto distinct cortical networks that extend posteriorly and antero-laterally from primary auditory cortex. Within this scheme, the planum temporale acts as a functionally differentiated computational hub that disambiguates different types of spectrotemporal information. The evidence supports a functional homology of auditory cortical regions of the temporal lobe in humans and non-human primates.

Contents

Summary ... ii
Abbreviations used in this thesis ... viii
Tables ... ix
Figures ... x
Acknowledgments ... xi
Declaration ... xii

Chapter 1. GENERAL INTRODUCTION ... 1
  Summary
  1.1 The nature of auditory cortical pattern analysis
  1.2 Auditory cortex in non-human species
    Studies in the macaque
      Anatomy
      Electrophysiology
      Lesion studies
    What and Where processing: a dichotomy?
    Homologies between species
  1.3 Human auditory cortex
    Anatomy
      Macroscopic anatomy
      Histology and histochemistry
    Neurophysiology
      Intracerebral recordings
      Non-invasive studies
      Lesion studies
    Auditory functional imaging
      General considerations
      Pitch and intensity
      Simple temporal patterns
      Sound sequences
      Speech and music
      Sound identity
      Auditory space
      Supramodal and cross-modal processing
  1.4 Key problems in auditory cortical pattern analysis
    Experiment 1
    Experiment 2
    Experiment 3
    Experiment 4

Chapter 2. TECHNIQUES AND METHODS
  Summary
  Imaging techniques
    General principles of MRI
    Constructing the MR image
    Echo-planar imaging
  Physiological basis of BOLD contrast
    Origin of the BOLD response
    The haemodynamic response function
  Image quality and sources of artefact
    Non-biological noise sources
    Biological noise sources
  Specific problems in auditory functional imaging
    Acoustic scanner noise
    Silent and sparse imaging protocols
    Brainstem motion
    Stimulus delivery
  Image pre-processing
    Realignment
    Spatial normalisation
    Spatial smoothing
  Statistical analysis
    The general linear model
    Gaussian random field theory
    Assumptions
  Issues in the design of fMRI experiments
    Type of design
    Fixed- and random-effects analyses

Chapter 3. ANALYSIS OF SPATIAL AND NON-SPATIAL SPECTROTEMPORAL PATTERNS ... 69
  Summary
  Background
  Experimental hypotheses
  Methods
    Stimuli
    Subjects
    fMRI protocol
    Behavioural data
    Image analysis
  Results
    Group data
    Individual data
  Discussion
    Auditory What and Where mechanisms in the human brain
    The role of the planum temporale
    A synthesis? ... 80

Chapter 4. ANALYSIS OF SOUND SOURCE MOTION
  Summary
  Background
  Experimental hypotheses
  Methods
    Stimuli
    Subjects
    fMRI protocol
    Image analysis
  Results
  Discussion
    A human brain mechanism for processing sound source motion
    A common spatial analysis pathway?
    The planum temporale as a computational hub ... 92

Chapter 5. ANALYSIS OF SOURCE-DEPENDENT AND SOURCE-INDEPENDENT PITCH PROPERTIES ... 97
  Summary
  Background
  Experimental hypotheses ... 100
  5.3 Methods
    Bases for the manipulation of pitch dimensions
    Relationship between pitch height, tone height and timbre
    Psychophysical effect of pitch height manipulation
    Stimulus details
    Subjects
    fMRI protocol
    Behavioural data on imaged subjects
    Image analysis
  Results
    Group data
    Individual data
  Discussion
    Human brain representations of pitch dimensions
    Pitch height and pitch chroma in auditory scene analysis
    Pitch height and pitch chroma as source-dependent and source-independent auditory properties

Chapter 6. ANALYSIS OF SPECTRAL SHAPE
  Summary
  Background
  Experimental hypotheses
  Methods
    Rationale for the experimental design
    Stimulus details
    Subjects
    fMRI protocol
    Behavioural data
    Image analysis
  Results
    Group data
    Individual data
  Discussion
    Human brain mechanisms for computing spectral envelope
    Hierarchical analysis of natural sound objects
    Spectral shape, timbre and sound object identity
    The formation of spectral templates
    A pathway for auditory object analysis?

Chapter 7. CONCLUSIONS
  Summary
  Human auditory cortex contains generic mechanisms for the analysis of patterned sound
  Distinct cortical networks analyse spatial and non-spatial spectrotemporal patterns
  Distinct cortical networks analyse source-dependent and source-independent pitch properties
  Hierarchical cortical mechanisms analyse auditory object properties
  The planum temporale is a computational hub for the analysis of spectrotemporal patterns
  Functional homologies can be established in the auditory cortex of humans and other primates
  Directions for further work

References

Appendices ... 187
  I   Examples of Matlab scripts ... 188
  II  Division of labour for experimental work ... 189
  III Publications arising from this thesis ... 191
  IV  Reprints of published work

Abbreviations used in this thesis

AM    amplitude modulation
BOLD  blood-oxygen-level-dependent
CL    caudo-lateral belt field of macaque
CM    caudo-medial belt field of macaque
dB    decibels
EPI   echo-planar imaging
ERP   event-related potential
FM    frequency modulation
fMRI  functional magnetic resonance imaging
FWHM  full-width-at-half-maximum
HG    Heschl's gyrus
HRF   haemodynamic response function
HRTF  head-related transfer function
Hz    Hertz
IPL   inferior parietal lobe
IRN   iterated rippled noise
MEG   magnetoencephalography
MMN   mismatch negativity
MNI   Montreal Neurological Institute
MRI   magnetic resonance imaging
NMR   nuclear magnetic resonance
PAC   primary auditory cortex
PET   positron emission tomography
PP    planum polare
PT    planum temporale
PTO   parieto-temporal operculum
RF    radio frequency
SPM   statistical parametric map
STG   superior temporal gyrus
STP   superior temporal plane
STS   superior temporal sulcus
T     Tesla
T1    longitudinal relaxation time
T2    transverse relaxation time
TE    time-to-echo
Tpt   temporo-parietal area of macaque
TR    time-to-repeat

Tables

Table 2.1  Characteristics of stimulus delivery, fMRI acquisition and image analysis for experiments in this thesis
Table 3.1  Representative functional imaging studies of What and Where processing in human auditory areas
Table 3.2  Experiment 1: Local maxima of activation for group ... 76
Table 4.1  Experiment 2: Conditions and percepts
Table 4.2  Experiment 2: Local maxima of activation for group
Table 4.3  Human planum temporale in spectrotemporal processing ... 93
Table 5.1  Experiment 3: Local maxima of activation for group
Table 5.2  Experiment 3: Local maxima of activation for individuals
Table 6.1  Experiment 4: Local maxima of activation for group
Table 6.2  Experiment 4: Local maxima of activation for individuals

Figures

Figure 1.1  Proposed cortical auditory areas in macaque and human ... 5
Figure 1.2  Proposed auditory What and Where pathways in macaque
Figure 2.1  Principles of sparse fMRI data acquisition
Figure 2.2  Diagram of stimulus delivery system for experiments
Figure 2.3  Steps in pre-processing and analysis of fMRI data
Figure 3.1  Experiment 1: Schematic representation of paradigm
Figure 3.2  Experiment 1: Statistical parametric maps for group
Figure 4.1  Experiment 2: Statistical parametric maps for group
Figure 4.2  Human planum temporale as a computational hub
Figure 5.1  Experiment 3: Basis for stimulus manipulations
Figure 5.2  Experiment 3: Psychometric functions for pitch height
Figure 5.3  Experiment 3: Statistical parametric maps for group
Figure 5.4  Experiment 3: Individual statistical parametric maps
Figure 6.1  Experiment 4: Spectral shape of sounds
Figure 6.2  Experiment 4: Schematic representations of stimuli
Figure 6.3  Experiment 4: Statistical parametric maps for group

Acknowledgments

Functional brain imaging is always a cooperative enterprise and I am indebted to a number of people for making this work possible. Foremost among these, of course, are my experimental subjects; I am grateful to all of them. I thank Richard Frackowiak and Terry Morris for granting me access to the resources of the Functional Imaging Laboratory in the Wellcome Department of Imaging Neuroscience, Institute of Neurology, Queen Square. My special thanks go to the FIL radiographers (Amanda Brennan, Julie Somers, Janice Glensman, and Paul Bland), MRI physicists and engineers (in particular Oliver Josephs and Eric Featherstone), and technical and computing support staff (especially Peter Aston, Rachael Maddock and Ric Davis) for their professionalism and their resourcefulness. Much of the imaging work was inspired by psychophysical studies conducted by my colleagues at the Clinical Auditory Laboratory, Medical School, University of Newcastle-upon-Tyne; I thank in particular Amanda Jennings and Jessica Foxton for sharing their data and their ideas with me. I owe a similar debt to my collaborators at the Centre for the Neural Basis of Hearing, Department of Physiology, University of Cambridge, in particular Roy Patterson and Stefan Uppenkamp. Gary Green and Adrian Rees of the Auditory Group, University of Newcastle gave generously of their time and provided helpful advice and support at each stage of this work. I am singularly indebted to my supervisor, Tim Griffiths, for guiding me into this field, for setting an example of lucid and creative scientific thought, and for his close and enthusiastic involvement throughout. To my wife Jane, for her patience and understanding, and to my family, who lent moral support despite the tyranny of distance: thanks, above all, for being there.

Declaration

This thesis describes the original work of the candidate except where specifically acknowledged in the text. All participating subjects gave informed consent and all experimental work was carried out with the approval of the Ethics Committee of the Institute of Neurology, University College London, according to guidelines established by the Declaration of Helsinki.

Jason Donald Warren

Date

Chapter 1. GENERAL INTRODUCTION

Summary

The experiments described in this thesis address the mechanisms of pattern analysis in human auditory cortex. This chapter sets the work in its theoretical and experimental context. Our present limited understanding of auditory cortical function has emerged from complementary lines of investigation in humans and other species. Models of auditory cortical organisation derive largely from cytoarchitectonic and neurophysiological studies in non-human primates. While such studies have delineated the broad anatomical outlines of the cortical auditory system in different species, in most cases the functional roles of the putative cortical areas remain unresolved. An important element in this controversy is the extent of anatomical and functional homology between species. Anatomical and neurophysiological issues relevant to the design and interpretation of human auditory functional imaging experiments are discussed in this chapter. The functional architecture of the human auditory brain has proved relatively inaccessible to the traditional methods of clinical lesion and electrophysiological studies. With the advent of functional imaging techniques, it has recently become possible to correlate function with structure at the neuronal population level in the intact human brain. This is an appropriate level at which to test hypotheses concerning generic mechanisms of pattern analysis in human auditory cortex. The early development of this field and points of departure for the present work are outlined here.

1.1 The nature of auditory cortical pattern analysis

The acoustic environment we inhabit is often chaotic, yet our brains routinely identify sound sources, track the information streams embodied in speech and music, and navigate auditory space. All these tasks depend upon the perception of patterns in sound. However, the brain mechanisms that sustain auditory pattern analysis have been comparatively little studied. These mechanisms represent an intermediate stage of auditory processing, interposed between detection of the raw acoustic signal at the cochlea and the high-level semantic and symbolic manipulations of language and music (Griffiths et al., 1999b). Although acoustic patterns can be identified in spectral, temporal or spatial domains, spectrotemporal and spatial properties of sound sources are generally conjoined in the world at large. This raises a number of questions. To what extent are these features processed separately by the auditory system? Do generic mechanisms of auditory pattern analysis exist? If so, what are their neuroanatomical substrates?

While it is known that substantial processing of auditory information occurs in the ascending auditory pathways (Rauschecker, 1998), it is likely that more complex aspects of auditory scene analysis (for example, encoding of sound source motion: Ahissar et al., 1992) depend on additional processing in auditory cortex, at least in higher mammals. Mechanisms to support such additional processing are suggested by the extensive transformations of receptive field properties that occur between the auditory thalamus and auditory cortex (Miller et al., 2001). However, the precise role of mammalian auditory cortex in normal hearing is disputed (Talwar et al., 2001). One important role of the auditory cortex is likely to be modulation of processing according to the behavioural context (Malone et al., 2002; Middlebrooks, 2002). Another may be the detection of similarities between auditory stimuli (Whitfield, 1985). Moreover, the pattern analysis mechanisms that create auditory percepts are likely to reside in auditory cortex. This has been demonstrated both in clinical studies of human patients (Griffiths et al., 1999b) and in animal lesion studies (Heffner & Heffner, 1984, 1986a,b, 1989a,b,c, 1990a,b; Talwar et al., 2000; Recanzone, 2000a,b), in which damage to primary or higher-order auditory cortex produces deficits in the perceptual analysis, identification or localisation of sounds.

This thesis is concerned with generic mechanisms of pattern analysis in human auditory cortex. Such mechanisms are involved in the analysis of auditory scenes

(Bregman, 1990) and the formation of auditory percepts, before those percepts are processed for meaning.

Contemporary models of auditory cortex function and the interpretation of anatomical and functional data have been heavily influenced by visual neuroscience. It is now well accepted that the primate brain contains more than thirty visual cortical areas engaged in different aspects of visual scene analysis and organised into functional streams that analyse different kinds of visual information (Felleman & Van Essen, 1991). However, the computational problems facing the auditory system are quite different from those in the visual domain (Bregman, 1990). This follows from the essential nature of acoustic information, which arrives serially at the peripheral receptors and is ultimately unidimensional, since it is conveyed by the motion of the tympanic membranes (Seifritz et al., 2002a; Zatorre, 2003). Sound is the only source of information about the large region of space behind the head, and may be crucial in the detection of salient environmental events (such as the approach of a predator or potential mate) under conditions of reduced visibility or over long distances. Furthermore, auditory spatial information (unlike visual space) cannot be represented as a simple one-to-one mapping at the level of the peripheral receptors, but must be computed from time-frequency cues (Wightman & Kistler, 1998). The evolutionary pressures on visual and auditory information processing will therefore have been quite distinct: this is reflected in a number of anatomical and physiological features of mammalian primary auditory cortex (PAC) that are not shared by visual and other primary sensory cortices (Linden & Schreiner, 2003). PAC occupies a later stage in the information-processing hierarchy than does visual cortex, owing to the greater anatomical complexity of the subcortical auditory pathways. One consequence of this complexity is that PAC receives binaural input from subcortical nuclei. In addition, PAC has extensive interhemispheric projections. Certain synaptic properties and connectivity patterns of PAC may constitute specific adaptations for the extraction of spectrotemporal features (Atzori et al., 2001; Miller et al., 2001; Zatorre et al., 2002a; Linden & Schreiner, 2003). It would be remarkable, therefore, if the cortical mechanisms that process auditory and visual information were not fundamentally different. Certain aspects of these mechanisms (such as their electrophysiology) can be studied in detail only in animal models. Furthermore, such models substantially influence the design and interpretation of

functional imaging studies of auditory pattern analysis by the human brain. Accordingly, the principles of auditory cortical organisation that have emerged from studies in non-human species (in particular, the macaque monkey) will first be considered in some detail.

1.2 Auditory cortex in non-human species

Studies in the macaque

Most neuroanatomical and neurophysiological information concerning the organisation of primate auditory cortex has been derived from studies in the macaque monkey (various species of the genus Macaca). Among the commonly available primates, the macaque is generally regarded as the best animal model for human auditory processing (Morel et al., 1993; Heffner & Heffner, 1990a). The anatomical organisation of the auditory brain is broadly similar in macaques and humans (Papez, 1929; Galaburda & Pandya, 1983; Galaburda & Sanides, 1980): in both species, auditory cortex occupies an extensive area of the superior temporal plane (STP) in the depth of the lateral fissure (a macroanatomical feature that is less well developed in lower primates: Papez, 1929). Moreover, auditory information (as manifested in a repertoire of species-specific call sounds) plays an important role in the behaviour of wild macaques (Heffner & Heffner, 1986b; Jones et al., 1995; Ghazanfar & Hauser, 2001). Macaques have superior high-frequency and comparable low-frequency auditory acuity relative to humans (Jackson et al., 1999), show similar psychoacoustic and behavioural responses to phenomena such as auditory looming (Ghazanfar et al., 2002), and display comparable deficits after lesions of auditory cortex (Heffner & Heffner, 1986a,b).

Anatomy

It was first recognised in the 19th century (Ferrier, 1876) that dedicated auditory cortical areas in the macaque are located mainly in the STP and the caudal two-thirds of the superior temporal gyrus (STG). Within this region lies PAC, the projection zone of the ventral nucleus of the medial geniculate complex (the auditory thalamus). Early anatomical and electrophysiological surveys of macaque auditory cortex were undertaken by Brodmann (1909), Von Economo and Horn (1930), Walker (1937), Ades and Felder (1942), Walzl and Woolsey (1943) and Von Bonin and Bailey (1947).

However, detailed auditory cortical structure-function mapping was initiated by the landmark studies of Pandya et al. (1969) and Merzenich and Brugge (1973). Cytoarchitectonic and histochemical analyses (Mesulam & Mufson, 1985; Jones et al., 1995; Kaas & Hackett, 2000), axonal degeneration (Pandya et al., 1969; Jones & Powell, 1970) and immunotracer (Burton & Jones, 1976; Galaburda & Pandya, 1983; Morel et al., 1993; Jones et al., 1995; Romanski et al., 1999a,b) studies, microelectrode recordings (Rauschecker & Tian, 2000) and 2-deoxyglucose autoradiography (Poremba et al., 2003) have demonstrated that the principal auditory areas in the macaque can be subdivided into a central core region surrounded by belt and parabelt regions oriented along the long axis of the superior temporal lobe. In addition, macaque auditory cortex has widely distributed reciprocal extrinsic connections, including rostral STG, insula, inferior parietal lobe (IPL) and lateral prefrontal cortices, the lateral amygdaloid nucleus, and subcortical structures including the dorsal and medial divisions of the medial geniculate complex, putamen, and inferior and superior colliculi. Core, belt, parabelt and extrinsic connections can be considered as levels in an auditory processing hierarchy (Kaas & Hackett, 2000), including both dedicated auditory and multimodal regions (Jones & Powell, 1970; Poremba et al., 2003). A number of schemes for the detailed classification of macaque auditory cortices have been proposed; however, none is completely satisfactory in reconciling the available anatomical and electrophysiological data. Representative schemes of auditory cortical organisation in both macaque and human are summarised in Figure 1.1 (overleaf).

Auditory cortex shares with other sensory cortices certain basic cytoarchitectural characteristics, including a columnar organisation (Linden & Schreiner, 2003). Distinct cytoarchitectonic fields can be identified within the auditory core and belt regions, based on rostro-caudal increases in laminar differentiation (Galaburda & Pandya, 1983) and immunoreactivity (Hackett et al., 1998a), and on the specific laminar origin and termination patterns of their inter-areal connections (Galaburda & Pandya, 1983; Pandya, 1995; Kaas & Hackett, 2000). Within the core, three fields can be identified: primary auditory cortex (PAC or AI), rostral (R), and rostro-temporal (RT). Area AI (PAC) is characterised as koniocortex on account of intense granule cell proliferation in layer IV. This region also exhibits other features of primary sensory cortex, including the presence of a dense myelin band, intense staining for cytochrome oxidase, acetylcholinesterase and

[Figure 1.1 appears here: axial maps of the superior temporal plane in macaque (left) and human (right), with auditory areas labelled according to the schemes of Ferrier (1876), Brodmann (1909), Flechsig (1920), Von Economo & Horn (1930), Walzl & Woolsey (1943), Merzenich & Brugge (1973), Burton & Jones (1976), Galaburda & Sanides (1980), Galaburda & Pandya (1983), Mesulam & Mufson (1985), Morel et al. (1993), Rivier & Clarke (1997), Scheich et al. (1998), Howard et al. (2000), Kaas & Hackett (2000), Talavage et al. (2000), Morosan et al. (2001) and Wallace et al. (2002b). The caption and abbreviation key follow.]

Figure 1.1. Proposed cortical auditory areas in macaque and human

Axial views of the superior temporal plane in the macaque brain (left) and the human brain (right), showing various subdivisions of auditory cortex (red outline) proposed on the basis of lesion (Ferrier, 1876), cytoarchitectonic (e.g., Brodmann, 1909; Morosan et al., 2001), myeloarchitectonic (Flechsig, 1920), immunohistochemical (e.g., Rivier & Clarke, 1997; Wallace et al., 2002b), connectivity (e.g., Burton & Jones, 1976; Morel et al., 1993), electrophysiological (e.g., Kaas & Hackett, 2000; Howard et al., 2000; Talavage et al., 2000) or functional imaging (Scheich et al., 1998) criteria. A common feature of organisational schemes in both species is a central primary-like or core auditory region, comprising the first primary auditory area (AI or A1; red solid) and up to two additional areas (red hatched). The position and extent of the core region vary considerably among schemes. In the human brain, A1 occupies (but is not coextensive with) the medial part of the first transverse gyrus of Heschl (HG), and the lateral part of HG is likely to contain at least one additional area. Surrounding the core is an uncertain number of non-primary auditory areas with distinct anatomical boundaries and functional characteristics. In the human brain, these non-primary areas extend into the planum temporale (PT), planum polare (PP), superior temporal gyrus and sulcus (not shown), and insula (ins). In the macaque, non-primary auditory cortex has generally been subdivided into belt and parabelt regions on the basis of distinct anatomical and electrophysiological properties. However, the higher-order auditory areas have not been characterised in detail and the extent of anatomical and functional homology between macaque and human brains has not been resolved.

MACAQUE. Brodmann, 1909: B22, area 22. Ferrier, 1876: F14, area 14. Walzl & Woolsey, 1943: A II, secondary auditory area. Merzenich & Brugge, 1973: RL, rostro-lateral field; CM, caudo-medial field; L, lateral field; a, b, fields a and b. Burton & Jones, 1976: T1, T2, T3, first, second and third temporal fields; Pa, post-auditory field; Pi, para-insular field; RI, retro-insular field; Ig, Id, Ia, granular, dysgranular and agranular insular fields. Galaburda & Pandya, 1983: Ka, koniocortical area; ProA, pro-isocortical area; PaAc, caudal parakoniocortical area; PaAlt, lateral parakoniocortical area; PaAr, rostral parakoniocortical area; Tpt, temporo-parietal area; ReIt, retro-insular area. Mesulam & Mufson, 1985: AII, secondary auditory field; Ig, Idg, granular and dysgranular insular fields; RI, retro-insular field; STPg, superior temporal granular field; TPdg, temporo-polar dysgranular field; PA, post-auditory field. Morel et al., 1993: R, rostral field; C, caudal field; CM, caudo-medial field; PL, postero-lateral field; AL, antero-lateral field; RM, rostro-medial field; RT, rostro-temporal field. Kaas & Hackett, 2000: R, rostral area; RT, rostro-temporal area; RTM, medial rostro-temporal area; RM, rostro-medial area; MM, middle medial area; CM, caudo-medial area; CL, caudo-lateral area; ML, middle lateral area; AL, antero-lateral area; RTL, lateral rostro-temporal area.

HUMAN. Brodmann, 1909: areas 22, 41, 42, 52; Jpost, posterior insular area. Flechsig, 1920: primäre Hörsphäre (red). Von Economo & Horn, 1930: areas TC, TA, TB, TD, TG; IB, posterior insular area. Galaburda & Sanides, 1980: KAm, medial koniocortex; KAlt, lateral koniocortex; PaAr, rostral parakoniocortex; PaAi, internal parakoniocortex; PaAe, external parakoniocortex; PaAc/d, caudal-dorsal parakoniocortex; ProA, prokoniocortex; Tpt, temporo-parietal cortex. Rivier & Clarke, 1997 / Talavage et al., 2000 (frequency-dependent response regions): AA, anterior area; LA, lateral area; MA, medial area; PA, posterior area; PIA, posterior insular area; STA, superior temporal area. Scheich et al., 1998: areas T1a/b, T2, T3. Howard et al., 2000: PLST, posterior lateral superior temporal auditory area. Morosan et al., 2001: areas Te1.0/1.1/1.2, Te2, Te3; TI1, para-insular area. Wallace et al., 2002b: ALA, antero-lateral area; LP, latero-posterior area; others as in Rivier & Clarke (1997).

parvalbumin, and dense reciprocal connections with the ventral medial geniculate complex of the thalamus (Morel et al., 1993; Jones et al., 1995; Pandya, 1995; Kaas & Hackett, 2000). Whether R and AI should be regarded as distinct fields, and whether RT should be classified within the core, are disputed by some authors (Rauschecker et al., 1997; Kaas & Hackett, 2000). Within the belt, at least seven fields can be distinguished, named according to their rostro-caudal position and their location (lateral or medial) relative to the core. The medial belt has primitive cytoarchitectonic features of prokoniocortex, including relative hypocellularity and prominence of deeper cell layers, whereas the lateral belt has features of parakoniocortex, including increased differentiation of layer III and de-emphasis of deeper layers (Morel et al., 1993). Individual fields within the belt are defined largely on the basis of their electrophysiology rather than their cytoarchitectonic or histochemical properties (see below); however, the medial belt fields are technically difficult to study and information concerning this region remains limited. The parabelt region can be differentiated from the adjacent lateral belt on the basis of lighter staining for parvalbumin in cell layer IV, lower cell density and a stronger tendency for cells to be arranged in vertical columns (Hackett et al., 1998a).

Streams of auditory information flow have been inferred from the connectivity patterns of the auditory fields in the macaque (Galaburda & Pandya, 1983; Pandya & Yeterian, 1985; Kosaki et al., 1997; Hackett et al., 1998a; Kaas & Hackett, 2000). Each core field makes reciprocal connections with several surrounding belt fields in the ipsilateral hemisphere and (via the corpus callosum) with the homotopic core region of the opposite hemisphere (Hackett et al., 1998a; Kaas & Hackett, 2000). The density of connections between core and belt shows regional specificity and is higher between anatomically adjacent than between non-adjacent areas (Galaburda & Pandya, 1983; Morel et al., 1993; Kosaki et al., 1997; Hackett et al., 1998a; Kaas & Hackett, 2000). Each lateral belt field makes reciprocal connections with the core, with the medial geniculate complex (chiefly the dorsal and medial divisions), with adjacent and more distant belt fields, and with the parabelt. Connections of the medial belt remain poorly defined (Kaas & Hackett, 2000); however, its projections appear to be more diffuse than those of the lateral belt and include the auditory parabelt (Hackett et al., 1998a) and IPL (Disbrow et al., 2003). The boundaries and divisions of the parabelt are at present even less secure than those of other auditory cortical regions; however, rostral and caudal subdivisions lying between the rostral and caudal poles of the belt have been proposed on the basis of distinct patterns of

21 connections with the belt (Hackett et al., 1998a; Kaas & Hackett, 2000). The parabelt also connects with a number of thalamic nuclei, chiefly the dorsal and magnocellular divisions of the medial geniculate complex (Hackett et al., 1998b) and medial pulvinar (Galaburda & Pandya, 1983; Kaas & Hackett, 2000): the latter provides a potential substrate for multimodal integrative and attentional processing (Yeterian & Pandya, 1989). However, parabelt connections with the core and with the ventral medial geniculate are minimal (Hackett et al., 1998a,b), suggesting that auditory information flow to the parabelt and higher-order cortices is regulated by the belt (Hackett et al., 1998a). In addition to their subcortical and intrinsic inter-areal connections, belt and parabelt fields have distinct extrinsic cortico-cortical connections (Jones & Powell, 1970; Pandya & Yeterian, 1985; Pandya, 1995; Kaas & Hackett, 2000). The medial belt is likely to have interconnections with multimodal cortex in the insula (Pandya, 1995) as well as the IPL (Disbrow et al., 2003). The parabelt makes reciprocal rostro-caudal connections with adjoining cortical regions (Jones & Powell, 1970; Hikosaka et al., 1988; Pandya, 1995; Hackett et al., 1998a; Kaas & Hackett, 2000; Yukie, 2002; Poremba et al., 2003): caudally, the adjacent posterior STP (area Tpt), caudal STG, caudal cingulate and retrosplenium; and rostrally, upper and lower banks of superior temporal sulcus (STS), rostral STG, perirhinal and entorhinal cortices and amygdala. Overlapping projections arising along the rostro-caudal extent of the parabelt pass to medial belt and STS (Hackett et al., 1998a). Interhemispheric callosal connections are made between homotopic parabelt areas (Hackett et al., 1999a). 
In addition, both lateral belt and parabelt have regionally specific reciprocal connections with four major frontal lobe regions (Jones & Powell, 1970; Petrides & Pandya, 1988; Pandya, 1995; Hackett et al., 1999b; Romanski et al., 1999a,b; Kaas & Hackett, 2000): caudal belt and parabelt principally connect with the frontal eye fields (area 8a) and dorso-lateral prefrontal cortex (dorsal area 46), while rostral belt and parabelt principally connect with ventral prefrontal cortex (rostral area 46 and area 12), frontal pole (area 10) and orbitofrontal cortex (area 13). The middle lateral belt connects with each of these frontal areas, and both rostral and caudal belt and parabelt connect with areas 46, 45 and 12 (Romanski et al., 1999a,b). The cytoarchitectonic features of an auditory field and its respective frontal projection areas tend to be similar (Pandya, 1995). There is some evidence for an auditory information relay in IPL, with regionally specific superior temporal and frontal connections (Cavada
& Goldman-Rakic, 1989; Lewis & Van Essen, 2000; Disbrow et al., 2003; Poremba et al., 2003). Extensive overlap exists between auditory, visual and somatosensory areas in STS, frontal and parietal lobes (Jones & Powell, 1970; Cavada & Goldman-Rakic, 1989; Seltzer & Pandya, 1989; Poremba et al., 2003). The projection zones of the auditory parabelt can be considered an additional tier in the macaque auditory processing hierarchy (Pandya, 1995; Kaas et al., 1999; Kaas & Hackett, 2000), establishing anatomical substrates for multimodal integration of auditory information with polysensory (STS, parietal and frontal lobes), limbic and evaluative (insula, rostral STG and orbitofrontal cortex), motor, attentional and working memory (lateral frontal) processes. Hemispheric specialisation may emerge at higher levels of the hierarchy (Poremba et al., 2004): metabolic activity evoked by macaque call sounds is significantly greater in the left than the right temporal pole on 2-fluoro-2-deoxyglucose positron emission tomography (FDG-PET). Such higher-order selectivity for particular classes of complex sounds might constitute a precursor for the hemispheric differentiation observed in humans for the processing of speech and music.

Electrophysiology

The elaborate anatomical organisation of macaque auditory cortex suggests a capacity for hierarchical encoding and integration of auditory information both serially (from core to belt to parabelt, and beyond) and in parallel (from thalamic nuclei to multiple auditory fields, and from belt and parabelt to multiple extrinsic cortical targets). Electrophysiological studies based on single- or multi-unit microelectrode recordings provide direct evidence for such information processing: such studies are critical for determining the chronology of auditory activation, and for elucidating the functional connections between areas that constitute processing streams.
One potential limitation of most conventional microelectrode studies is their bias for recording from middle cell layers within a cortical column, thereby emphasising thalamocortical inputs rather than capturing response properties that may vary as a function of laminar depth (Linden & Schreiner, 2003). A similar limitation may also apply to functional magnetic resonance imaging (fMRI) studies that measure the blood-oxygen-level-dependent (BOLD) signal (see Chapter 2, Section 2.1.4, p. 48). In addition, the use of general anaesthesia imposes clear caveats in primate electrophysiological studies (see, for example, Wang, 2000); this limitation has been addressed in an increasing number of recent studies using awake, behaving animals.

In anaesthetised monkeys, multiple tonotopic representations of the cochlea have been defined in each of the core and lateral belt fields (Merzenich & Brugge, 1973; Morel et al., 1993; Rauschecker et al., 1995, 1997; Kosaki et al., 1997) and less clearly in caudomedial belt (CM) (Kosaki et al., 1997; Kaas & Hackett, 2000). In AI, low-to-high frequencies are represented by a rostro-caudal progression of maximally sensitive neurons. The proportion of neurons responding maximally to low and high frequencies varies between cortical fields (Rauschecker et al., 1997). Tonotopic maps appear to reverse orientations across inter-areal boundaries, such that boundaries of adjoining areas are tonotopically congruent (Morel et al., 1993). Neurons in each of the core fields show short-latency responses to pure tones with specific best or characteristic frequencies and narrow frequency-response curves, while neurons in lateral belt show stronger responses to narrow-band noise centred at a best frequency than to pure tones (Merzenich & Brugge, 1973; Morel et al., 1993; Rauschecker et al., 1995; Kosaki et al., 1997; Kaas & Hackett, 2000). Neurons in rostral parabelt generally respond to white noise but not pure tones (Kosaki et al., 1997), while neurons in caudal parabelt respond to noise and pure tones over a wide frequency range (Hikosaka et al., 1988; Kaas et al., 1999). These response properties are consistent with an integration of convergent inputs from core to belt to parabelt neurons (Kaas et al., 1999); however, broader tuning properties could also reflect a changing composition of thalamic inputs to these cortical fields (Rauschecker et al., 1997). If (as seems likely) the tuning properties of different thalamic nuclei in macaque are similar to those in other species, these properties would be preserved in the tuning characteristics of the core, belt and parabelt regions with which the thalamic nuclei principally connect (Kosaki et al., 1997).
The ventral nucleus, projecting mainly to the core, conveys narrow-band frequency-specific information, while dorsal and other nuclei, projecting mainly to the surrounding cortical fields, convey broadband information. Both thalamocortical and interhemispheric callosal connections appear to be largely directed to tonotopically matched neurons (Morel et al., 1993). Evidence that cortical areas are activated by projections from adjacent cortical fields (Rauschecker et al., 1997; Recanzone et al., 2000a) supports the concept of serial information transfer between auditory areas. This is not exclusively the case, however: for example, while lesions of AI largely abolish responses to pure tones in CM, many CM neurons continue to respond to broadband sounds (Rauschecker et al., 1997), suggesting that thalamocortical inputs conveying some types of acoustic information are processed in parallel.

In behaving monkeys, single-unit recordings from auditory core and belt (Leinonen et al., 1980; Recanzone, 2000a,b; Recanzone et al., 2000a,b) show response properties that are generally consistent with those observed in anaesthetised animals: neurons in AI are generally found to have the shortest response latencies, while neurons in area CM have the broadest frequency tuning and highest pure tone thresholds. Spatial maps of characteristic frequency are evident in core fields, though not in belt. A number of other response parameters, such as latency, threshold and stimulus intensity tuning, vary widely between auditory areas and do not appear to be spatially organised within areas (Recanzone et al., 2000a). This may in part reflect the considerable variability of neuronal responses in alert animals, which itself varies between auditory fields for different response properties but is lowest for all response parameters in AI. These observations are consistent with an integrative function of belt areas, in the preferential processing of particular types of information from different subpopulations of AI neurons (Recanzone et al., 2000a). Electrophysiological characteristics of particular auditory fields may support different functional roles: for example, the broad frequency tuning of CM neurons might support the processing of spectral cues in sound localisation (Rauschecker et al., 1997; Recanzone et al., 2000b; Tian et al., 2001), while the relatively intensity-insensitive bandwidth selectivity of lateral belt neurons might enable complex acoustic features to be identified across a range of intensities (Rauschecker et al., 1995; Rauschecker, 1998; Recanzone, 2000a; Recanzone et al., 2000a). While the use of pure tone stimuli can establish certain elementary properties of auditory cortical organisation (such as tonotopy), such stimuli are non-ecological.
A much richer picture of auditory cortex function emerges in studies using complex stimuli that simulate natural broadband sounds such as species-specific vocalisations and which can be localised in external space (Rauschecker & Tian, 2000). At the level of PAC, these include excitatory and inhibitory responses to stimulus boundaries in spectral and temporal domains, sensitivity to spectral motion and feature conjunctions (deCharms et al., 1998; Kaas et al., 1999), and differential sensitivities to spectral and periodicity information in click trains (Steinschneider et al., 1998). However, it is clear that many aspects of auditory behaviour cannot be predicted from the response properties of single neurons or from single parameters of neuronal population responses at the level of AI (for example, simple phase-locking of neuronal responses cannot account for the range of perceived pitch: Kaas et al., 1999). In awake animals, an additional dimension of
response complexity has been revealed at the level of temporal discharge patterns, including phasic, tonic and complex responses (Recanzone, 2000a; Recanzone et al., 2000a; Malone et al., 2002; Semple & Scott, 2003): such patterns can encode dynamic and contextual sound properties that would not be captured using a simple rate code. Increasingly selective responses to complex stimuli such as species-specific vocalisations are observed on passing from core to belt (Rauschecker et al., 1995; Rauschecker & Tian, 2000; Tian et al., 2001). Such responses may depend on specific frequency modulation (FM) parameters that match the FM characteristics in natural call sounds (Tian & Rauschecker, 1995), and include nonlinear summation of call frequency components (Rauschecker et al., 1995). Call-detectors that integrate specific stimulus parameters such as frequency bands or time delays might in principle be instantiated at the level of single neurons (Rauschecker & Tian, 2000) or neuronal populations (Wang, 2000). However, the response properties of core and belt neurons form a continuum, rather than discrete classes as are found in the subcortical auditory system (Recanzone, 2000a). At the level of the belt, neurons rarely show responses specific to a single call sound. This may indicate simply an interim stage of information processing, or alternatively, population-level coding of such complex auditory patterns (Tian et al., 2001). At the level of the parabelt, responses may be selective for particular complex sounds (such as human consonantal sounds), for multimodal stimuli that conjoin spatial and temporal information, or for information derived from different sensory modalities (Leinonen et al., 1980; Hackett et al., 1998a). Polymodal responses are frequently recorded in STS, suggesting that this area integrates auditory object information with object information obtained in other sensory modalities (Baylis et al., 1987; Barnes & Pandya, 1992).
The activity of auditory neurons in the behaving macaque is modulated by level of arousal, attentional set and task requirements (Leinonen et al., 1980; Benson et al., 1981; Hikosaka et al., 1988; Kaas et al., 1999). Macaque ventro-lateral prefrontal cortex contains a domain responsive to complex sounds, particularly vocalisations (Romanski & Goldman-Rakic, 2002). This finding is consistent with the anatomical role of this region as a terminus for auditory projections from the superior temporal lobe, and suggests a possible physiological substrate for mnemonic and semantic processing of complex sounds, and for reciprocal modulation of auditory cortex activity by behavioural influences.

The encoding of auditory space is problematic. Auditory core, belt and parabelt contain neurons that show spatial tuning, generally for contralateral or bilateral sound sources (Leinonen et al., 1980; Benson et al., 1981; Ahissar et al., 1992; Rauschecker et al., 1997; Recanzone, 2000b; Recanzone et al., 2000b; Tian et al., 2001). Spatial selectivity is greater in caudal belt than in PAC (Recanzone, 2000b; Recanzone et al., 2000b) and rostral belt fields (Tian et al., 2001); however, spatial tuning is generally broad rather than selective for particular locations, and spatial responses are not topographically organised within cortical fields (Leinonen et al., 1980; Ahissar et al., 1992; Recanzone et al., 2000b). Spatial encoding generally involves the integration of binaural cues (intensity and phase differences) which can be represented by single neurons in AI (Brugge & Merzenich, 1973). However, high spatial selectivity is likely to depend on ensemble responses (Ahissar et al., 1992; Middlebrooks et al., 1994; Kaas et al., 1999): these appear to be serially integrated in higher-order caudal belt and parabelt areas including CM (Recanzone et al., 2000a,b) and Tpt (Leinonen et al., 1980). Neurons in Tpt (temporo-parietal cortex) and intraparietal sulcus frequently show responses to stimuli in different sensory modalities (Leinonen et al., 1980; Mazzoni et al., 1996), suggesting that this region might play a role in the multimodal integration of spatial information. Neurons that encode auditory spatial direction and distance from the head have been identified in macaque dorsal and ventral premotor cortex (Azuma & Suzuki, 1984; Graziano et al., 1999). In the multimodal ventral premotor region, the majority of such neurons also respond to visual or tactile stimuli, suggesting that they subserve an integrated multisensory representation of spatial location as might be used in planning and guiding movements.
The processing of sound movement appears to depend (at least at the level of AI) on neuronal mechanisms similar to those that encode static location (Ahissar et al., 1992).

Lesion studies

Studies of the effects of cortical ablations complement anatomical and electrophysiological techniques by demonstrating the role of the cortex in macaque auditory behaviour. The majority of lesion studies have used large ablations of STG in one or both hemispheres, including both PAC and variable amounts of rostral non-primary cortex. Bilateral lesions that include PAC produce severe deficits in sound detection with the features of a sensory rather than a non-specific attentional impairment
(Heffner & Heffner, 1986a, 1990a). Unilateral lesions produce a much smaller contralateral hearing loss, consistent with bilateral cortical involvement in sound detection (Heffner & Heffner, 1989a). Partial recovery of detection thresholds occurs over a number of weeks; however, it is likely that this is not complete, especially in the frequency range 4 to 32 kHz (Heffner & Heffner, 1990a). Detection deficits are observed for both pure tones and broadband noise. Extensive lesions of STG disrupt detection of intensity decrements in the contralateral ear (Harrington & Heffner, 2004). Auditory cortex lesions also disrupt sound localisation: unilateral lesions impair sound localisation in the contralateral hemifield, whereas bilateral lesions impair sound lateralisation and abolish localisation within each hemifield (Heffner & Heffner, 1990a,b). Lesioned monkeys have difficulty in learning to approach a sound source, consistent with a defect in the perception of source location. Performance is not clearly related to the extent of the lesion within rostral STG. It appears that information about sound source location is processed chiefly by the contralateral hemisphere, while information about sound source identity is processed by either or both hemispheres (Heffner & Heffner, 1990a). Lesions that produce auditory behavioural deficits are not confined to auditory cortex: impairments of noise/tone discrimination, sound localisation (Vaadia et al., 1986) and the formation of auditory-visual associations (Gaffan & Harrison, 1991) may also occur after lateral frontal lesions. Lesion studies have demonstrated deficits affecting different types of auditory pattern processing. Extensive bilateral lesions of STG diminish (but do not abolish) detection of amplitude modulation of pure tones (Harrington & Heffner, 2004).
The ability of macaques to discriminate coo vocalisations is transiently impaired by ablation of left but not right auditory cortex, consistent with emerging anatomical evidence for asymmetric hemispheric processing of species-specific call sounds (Poremba et al., 2004). Bilateral ablations permanently abolish the ability to discriminate call sounds (Heffner & Heffner, 1984, 1986b). This aphasia-like deficit cannot be attributed simply to hearing loss, nor is it restricted to ethologically relevant complex sounds (Heffner & Heffner, 1990a); rather, it appears to be due to an inability to detect frequency sweeps (Heffner & Heffner, 1989b; Harrington et al., 2001). The deficit appears to be produced by lesions of PAC and/or rostral non-primary auditory cortex; however, it is not observed following lesions caudal to PAC (Heffner & Heffner, 1989c; Harrington & Heffner, 2002). Impairment of auditory temporal sequence discrimination may follow resection of rostral non-primary
auditory cortex in the left hemisphere (Cowey & Dewson, 1972), and has the features of an auditory working memory deficit. However, precise structure-function relationships cannot be established from the available lesion evidence: current models of the organisation of auditory processing in the macaque are therefore predicated largely on electrophysiological and anatomical data.

What and Where processing: a dichotomy?

Sound identification and localisation are the essential tasks of auditory scene analysis (Bregman, 1990). The concept of distinct processing pathways for these different types of auditory information was originally motivated by analogies with the visual and somatosensory systems (Rauschecker, 1998), and this formulation has been both highly influential and controversial (Cohen & Wessinger, 1999; Kaas & Hackett, 1999; Belin & Zatorre, 2000; Romanski et al., 2000; Stecker et al., 2003; Hall, 2003). In electrophysiological studies, macaque lateral and rostral belt and parabelt fields show relative selectivity for auditory patterns that constitute sound objects (such as monkey call sounds), while caudal belt and parabelt are relatively selective for auditory spatial information (Rauschecker et al., 1997; Rauschecker & Tian, 2000; Recanzone et al., 2000a,b; Tian et al., 2001). Metabolic selectivity for ethologically relevant complex sounds (macaque vocalisations) is preserved at the level of the macaque dorsal temporal pole (Poremba et al., 2004), which might act as a relay for a frontally directed pathway processing auditory object information.
These relatively selective stimulus processing mechanisms have been conceptualised as dual ventral 'What' (object-related) and dorsal 'Where' (space-related) auditory streams, with anatomical substrates in the hierarchically organised parallel frontal lobe projections arising in the macaque STP (Rauschecker et al., 1997; Rauschecker, 1998; Romanski et al., 1999a,b; Kaas & Hackett, 1999; Rauschecker & Tian, 2000; Lewis & Van Essen, 2000; Poremba et al., 2003). According to this scheme, the ventral auditory pathway originates in rostral belt and parabelt and projects to rostral STG and frontal areas 10, 12, 45 and 46; while the dorsal auditory pathway originates in caudal belt and parabelt and projects to IPL and frontal areas 8a, 12, 45 and 46. The scheme is represented diagrammatically in Figure 1.2. Duality of processing appears to be preserved in the distinct projection targets of these pathways in the frontal pole and ventral prefrontal cortex (object information: Romanski & Goldman-Rakic, 2002) and more dorsally located lateral frontal areas (spatial processing: Vaadia et al., 1986; Romanski et al., 1999a; Azuma & Suzuki, 1984).

Figure 1.2. Proposed auditory What and Where pathways in macaque [after Rauschecker, 1998; Kaas et al., 1999; Romanski et al., 1999a,b; Rauschecker & Tian, 2000; Kaas & Hackett, 2000]. In the macaque, it has been proposed that ventral and dorsal cortical streams process auditory object ('What') and spatial ('Where') information, respectively. The putative processing streams originate in the superior temporal plane and project to distinct ventral and dorsal frontal areas. Potential for extensive interaction between the two streams exists both in the superior temporal plane and in overlapping prefrontal projection targets. The number of auditory areas, inter-areal boundaries and origins of the putative processing streams have not been defined precisely. Nomenclature of core, belt and parabelt areas is based on Kaas and Hackett (2000); see Figure 1.1. The status of temporo-parietal (TP), posterior parietal and insular (INS) cortical areas within the dual pathway model has not been investigated systematically. Arrows indicate the proposed flow of incoming auditory information from primary auditory cortex; however, linkages between auditory areas and fronto-temporal connections are reciprocal, allowing for the bi-directional exchange of auditory information within each of the putative processing pathways.

Dichotomies of this kind must be qualified, both with respect to the number of processing streams and the functional basis for any separation of processing. The perception of natural sounds generally involves the binding of information about object features and spatial location (Recanzone et al., 2000b; Semple & Scott, 2003). Indeed, considerable cross-talk is evident in the processing of spatial and non-spatial properties of broadband sounds, at the levels of anatomy, electrophysiology and behaviour. Anatomically, there are multiple potential sites of interaction between auditory areas in the superior temporal lobe, and in addition, the frontal projections of the middle belt and parabelt areas overlap with those of more caudal and rostral areas (Petrides & Pandya, 1988; Romanski et al., 1999b). These projections may play an integrative role between the ventral and dorsal streams or might themselves constitute an additional functional stream (Kaas & Hackett, 1999). Electrophysiologically, a proportion of neurons in the spatially sensitive caudolateral belt field CL show covarying selectivity for monkey call sounds (Rauschecker & Tian, 2000; Tian et al., 2001), while spatially sensitive frontal lobe neurons are activated by broadband sounds but not by pure tones (Graziano et al., 1999). Behaviourally, the accuracy of sound localisation by macaques improves with increasing stimulus bandwidth, analogous to the pattern shown by human listeners (Brown et al., 1980). Similar caveats apply to the processing of non-spatial auditory information.
In behaving monkeys, neurons in ventro-lateral prefrontal cortex (a projection target of the putative 'What' stream) are sensitive to the spatial attributes of broadband stimuli, and their selectivity for non-spatial attributes does not show a simple increase compared with antero-lateral auditory belt (Cohen et al., 2004). Such observations call into question any simple anatomical or functional dichotomy based on separate processing streams for spatial and non-spatial auditory information. A more fundamental objection to the What-Where dichotomy concerns the true nature of the auditory information processed in each stream. It has been suggested that the most appropriate auditory analogue of visual spatial motion is spectral motion (evolution in the frequency domain), processed in a putative auditory caudal 'How' pathway (Belin & Zatorre, 2000; Linden & Schreiner, 2003). Alternatively, separation of processing in the dual auditory pathways might be based on within-object and between-object
properties (Cusack et al., 2001), or on the type of behavioural response mediated by each pathway (Middlebrooks, 2002; Hall, 2003). Sites of convergence between auditory areas and visual motion areas in the superior temporal sulcus have been proposed as the anatomical substrate for a third (or an alternative) pathway for the processing of auditory motion (Poremba et al., 2003). The multiple parallel pathways to the frontal lobe arising from different levels of the auditory processing hierarchy might also support different levels of representation of stimulus properties (Romanski et al., 1999a,b).

Homologies between species

Structure-function relationships in auditory cortex have been explored using anatomical, electrophysiological and behavioural techniques in a number of mammalian species. The cat has been the most intensively studied non-primate species (Read et al., 2002; Stecker et al., 2003); however, data are now available for numerous others, including marsupials (Heffner & Heffner, 1990a), bats (Suga et al., 1978; Rauschecker, 1998), guinea pigs and other rodents (Read et al., 2002; Wallace et al., 2002a), ferrets, and dogs (Heffner & Heffner, 1990a). Primate species studied include prosimian galagos, New World monkeys including marmosets, tamarins, squirrel monkeys and owl monkeys, and Old World primates including baboons and chimpanzees (Rauschecker et al., 1997; Rauschecker, 1998; Wang, 2000; Kaas & Hackett, 2000; Hackett et al., 2001). These species vary widely in the complexity of their auditory behaviour, and in their validity as animal models of human auditory cortex organisation. For example, the echo-locating capabilities of bats and the mobile pinnae of carnivores such as the cat are likely to be associated with specialised neural computational mechanisms.
The behavioural consequences of auditory cortex ablations correlate broadly with auditory discrimination abilities in different species: the effects of lesions are transient in rats (Talwar et al., 2001) but significant in primates (Heffner & Heffner, 1986a, 1990a). While certain aspects of auditory behaviour (such as a left hemisphere advantage for call sounds and speech-like stimuli) appear to be common to different mammalian species (Zatorre et al., 2002a), common processing mechanisms cannot be inferred from such observations. Primates offer the advantages of a shared evolutionary lineage and (in many cases) sensitivity to a large repertoire of conspecific vocalisations, and indeed, anatomical homologies are most secure between primate species, such as macaque, chimpanzee and
human (Hackett et al., 2001). However, the extent of anatomical and functional homologies across species is currently the subject of considerable controversy. Certain anatomical and functional principles of auditory cortical organisation may be generally valid among mammals (Rauschecker, 1998; Read et al., 2002; Semple & Scott, 2003). The broad anatomical framework in placental mammals appears to consist of a central tonotopic core region surrounded by less clearly tonotopic non-primary cortex (Talwar et al., 2001; Semple & Scott, 2003); however, the number and arrangement of non-primary cortical fields vary widely between species (Hackett et al., 2001). The transformation of neuronal response properties between primary and non-primary regions with increasing selectivity for complex stimulus properties has been observed in a range of mammalian species (Semple & Scott, 2003). Evidence emerging from cytoarchitectonic and histochemical studies suggests that subclassification of non-primary areas into belt and parabelt regions may be feasible in many species (Read et al., 2002; Wallace et al., 2002a). However, tonotopic maps in non-primate species such as the cat have reversed orientation relative to those in primates, suggesting that rostro-caudal rotation has occurred during primate evolution, probably as a result of disproportionate enlargement of the anterior temporal lobes (Rauschecker et al., 1997). This creates difficulties for schemes that attempt to map functional homologies between primates and other species. At the level of behaviourally relevant electrophysiology, few general principles have been identified even for PAC.
Such general organisational principles may include the enlarged representation of ethologically salient frequencies in tonotopic maps (Read et al., 2002), orthogonal representations of best-frequency and bandwidth tuning (Rauschecker, 1998), the representation of elementary spectrotemporal features (Shamma, 1999) and temporal edges (Nelken, 2002), primitive mechanisms for auditory scene analysis including comodulation masking release (Nelken et al., 1999) and a capacity for plasticity driven by auditory experience (Talwar et al., 2001; Read et al., 2002). However, there are numerous inter-species differences even in the representation of elementary acoustic phenomena such as FM sweeps (Nelken, 2002) and in the spatial mapping of simple neuronal response properties other than characteristic frequency (Recanzone et al., 2000a). Basic differences are also apparent at higher levels of information processing and transfer between cortical areas (Kitzes & Hollrigel, 1996).

Nevertheless, there is emerging evidence to suggest that some very general attributes (such as regional selectivity for spatial and non-spatial sound properties) may hold across species (Stecker et al., 2003). The demonstration of functional correspondences using ethologically relevant complex sounds will be critical in establishing homologies between humans, non-human primates and other species. One possible strategy might be combined electrophysiological and functional brain imaging studies in animals (Logothetis et al., 2001). Such studies would constitute an important validation of the results emerging from auditory functional imaging experiments in humans.

1.3 Human auditory cortex

Anatomy

Macroscopic anatomy

The human superior temporal lobe has been known to play a role in hearing for over a century (Galaburda & Sanides, 1980), and areas corresponding loosely to primary and secondary auditory cortices were indicated on early cytoarchitectonic and myeloarchitectonic maps such as those of Campbell (1905), Brodmann (1909), Flechsig (1920), Beck (1928) and Von Economo and Horn (1930). A general consensus has emerged that human auditory cortex shares a common organisational scheme with the macaque and other primates, consisting of a central primary area surrounded by multiple non-primary fields (Hackett et al., 2001). However (as in the macaque), the details of this organisational scheme have not been resolved, and a number of alternatives have been proposed (Rivier & Clarke, 1997). Differences in methodology and nomenclature (see Figure 1.1) have been compounded by the obvious technical limitations of studying the human auditory brain in vivo and in processing post mortem tissue, and the relative rarity of clinical lesions that selectively involve the superior temporal lobe.
Structural MRI complements pathological macro- and microanatomical techniques by enabling the complex macroscopic relationships of the human superior temporal lobe to be studied in detail in vivo. While it has long been recognised that the medial portion of the transverse gyrus of Heschl (HG) in the STP contains human PAC, recent detailed mapping studies have demonstrated that the correspondence between cytoarchitecture and macroscopic landmarks in individual brains (as assessed using MRI) is imprecise (Hackett et al.,
2001), and considerable variability exists between individuals (Morosan et al., 2001; Rademacher et al., 2001). Indeed, the proportion of HG occupied by PAC in different individuals ranges from 16 to 90% (Rademacher et al., 2001). The gyral pattern of the human superior temporal lobe is highly variable, and a number of gross anatomical configurations of human HG have been described (Leonard et al., 1998; Hackett et al., 2001; Rademacher et al., 2001). The localisation of human PAC with respect to macroscopic landmarks is most reliable when a single HG is present, in which case PAC is largely confined by sulcal boundaries (Hackett et al., 2001; Rademacher et al., 2001). In cases where HG is bifid, cytoarchitectonically defined PAC may be confined to the more anterior gyrus (Rademacher et al., 1993) or may be shifted postero-laterally to an intermediate position occupying variable portions of both gyri (Hackett et al., 2001). This problem is likely to be amplified in moving beyond PAC into putative human belt and parabelt regions, notably the planum temporale (PT). This large region of the posterior temporal plane lies posterior to HG and is contiguous posteriorly with the parietal operculum of the IPL (see Figures 1.1 and 4.2). A considerable body of pathological and structural and functional imaging data on PT has been amassed, largely reflecting its role in language processing (Shapleske et al., 1999; Marshall, 2000); however, this region continues to pose considerable problems of anatomical definition (Binder et al., 1996; Westbury et al., 1999).
Both HG and PT are potentially subject to several important sources of macro-anatomical variation: intrinsic variation in gyral morphology (Westbury et al., 1999; Morosan et al., 2001; Rademacher et al., 2001), interhemispheric asymmetries (Penhune et al., 1996; Binder et al., 1996; Westbury et al., 1999; Rademacher et al., 2001), and the plastic effects of auditory experience (Schlaug et al., 1995; Pantev et al., 1998; Schneider et al., 2002). Inter-individual variations in the anatomy of HG and PT show correlations with learning disabilities (duplications of HG: Morosan et al., 2001), musical training and absolute pitch (grey matter volume of HG: Schneider et al., 2002; volume and left-right asymmetry of PT: Schlaug et al., 1995; Pantev et al., 1998; Zatorre et al., 1998). The macro-anatomical variability of the human superior temporal lobe imposes an important caveat on functional imaging techniques that seek to establish structure-function relationships in auditory cortex, although the degree of uncertainty in such attributions can be quantified using probability maps (Westbury et al., 1999; Rademacher et al., 2001; Morosan et al., 2001). 

Histology and histochemistry. Cytoarchitectonic and immunohistochemical criteria similar to those used in macaque can reliably define human PAC (Wallace et al., 2002b), and various subdivisions (medial-lateral, rostral-caudal) of the human core region have been proposed (Hackett et al., 2001; Morosan et al., 2001) based on detailed analysis of patterns of myelination, histochemical staining and acetylcholinesterase expression (Wallace et al., 2002b). Morosan et al. (2001) used an observer-independent method based on cell body volume density (quantitative Nissl staining) to define three adjacent (medial-to-lateral) core zones, Te1.2, Te1.0 and Te1.1, which may correspond to functional gradients analogous to macaque core areas A1 and R. Surrounding zones of less granular (prokonio- and parakonio-) cortex that are cytoarchitecturally similar to the macaque non-primary auditory fields have been described in the human superior temporal lobe (Galaburda & Sanides, 1980; see Figure 1.1 of this thesis). In the human brain, these non-primary fields extend further laterally and caudally than in the macaque. These regions share common characteristics that distinguish them from the core region (Galaburda & Sanides, 1980; Morosan et al., 2001); however, they are anatomically heterogeneous. Cytoarchitectonic homologies have been proposed between macaque and human auditory non-primary fields (Galaburda & Sanides, 1980; Hackett et al., 2001). However, the number, extent and functional significance of such cytoarchitectonic subdivisions remain contentious (Wallace et al., 2002b). 
While no single cytoarchitectonic or immunohistochemical feature can fully delineate boundaries between cortical areas, the use of complementary histological markers (including parvalbumin and other calcium-binding proteins, acetylcholinesterase, cytochrome oxidase, Nissl and myelin stains: Rivier & Clarke, 1997; Wallace et al., 2002b; Chiry et al., 2003) suggests that human auditory cortex comprises two core areas surrounded by up to six belt areas, represented schematically in Figure 1.1. The distinction between core and belt is more reliably based on the density of acetylcholinesterase staining (higher densities in layer IV of core) than on Nissl or myelin staining. Core and belt also show consistent differences in overall cellularity: PAC contains a dense fibre meshwork, with relatively few cells (Rivier & Clarke, 1997). Distinct receptor fingerprints for core and belt have recently been demonstrated using neurotransmitter autoradiography (Zilles et al., 2002). Though the antero-lateral third of HG has been classified as a core area based on Nissl staining (Morosan et al., 2001), this area has the histochemical characteristics of auditory belt (Wallace et al., 2002b). The PT is likely to contain multiple auditory fields (Rivier & Clarke, 1997; Wallace et al., 2002b). This region receives extensive parallel and serial inputs from PAC and the ascending auditory pathways (Galaburda & Sanides, 1980; Pandya, 1995; Rivier & Clarke, 1997; Tardif & Clarke, 2001), and it is intimately related to multimodal areas in the vicinity of the temporo-parieto-occipital junction, which in turn participate in more widely distributed semantic and motor networks that are likely to be crucial for speech (Binder et al., 1997, 2000; Giraud & Price, 2001) and music (Schlaug et al., 1995; Binder et al., 1996) as well as auditory spatial analysis (Baumgart et al., 1999). The functional anatomy of the higher-order auditory areas is likely to be even more variable than that of the subdivisions of HG. Human auditory cortex extends laterally onto the convexity of STG and probably medially onto the insula (Rivier & Clarke, 1997). No consensus exists regarding the histological definition of human auditory parabelt; however, auditory cortex on STG (area STA of Rivier and Clarke) has features distinct from other belt areas, compatible with a high-order association area (Rivier & Clarke, 1997). Staining for calcium-binding proteins (parvalbumin, calretinin and calbindin) reveals distinct patterns of immunoreactivity between auditory cortical areas and between cortical layers within a given area (Chiry et al., 2003). The laminar pattern of calcium-binding protein expression reflects inputs from distinct populations of thalamic neurons, while differential gradients of expression beyond PAC have been interpreted as correlates of distinct antero-laterally and postero-medially directed processing streams. At present, the status of particular regions of the human posterior temporal plane beyond PAC within the proposed hierarchy of core, belt and parabelt auditory fields remains uncertain. 
In particular, detailed anatomical and functional homologies between the human and macaque auditory brain (see Figure 1.1) have yet to be established. In this thesis, a basic distinction is made between primary and non-primary human auditory cortex, but precise homologies between human and macaque auditory fields are not assumed. Interhemispheric differences in the intrinsic architecture of neuronal clusters have been demonstrated in the human posterior temporal plane (Galuske et al., 2000). A number of morphometric and histological hemispheric asymmetries have been described: these include a greater volume of white matter, heavier axonal myelination, larger and more widely spaced cell columns, and greater columnar connectivity in left than in right HG and PT (Seldon, 1981; Morosan et al., 2001; Zatorre et al., 2002a). While such structural asymmetries might confer specific functional properties (for example, the fine temporal analysis of speech signals by the left hemisphere), this hypothesis awaits substantiation by detailed electrophysiological study of both single cells and neural networks. Connectivity patterns between and within human auditory fields have been studied using carbocyanine dye tracers such as DiI. Although these studies are not conclusive, they suggest that short-range connections exist within PAC, while more wide-ranging connections exist within surrounding non-primary areas (Tardif & Clarke, 2001). Reciprocal connections link core with belt, and belt with surrounding higher-order cortex (Galuske et al., 1999). Direct (monosynaptic) connections can be demonstrated between HG and anterior areas, but not between HG and posterior areas, suggesting that the latter may occupy a later processing stage (Galuske et al., 1999). Such connectivity patterns could support hierarchical information processing pathways analogous to those proposed in the macaque, in which more complex features are integrated at successive stages of the pathway.

Neurophysiology

Intracerebral recordings. Opportunities for the direct electrophysiological study of human auditory cortex are essentially restricted to the pre-surgical evaluation of patients with refractory epilepsy, and limited information is therefore available. Nevertheless, a hierarchy of auditory cortical fields in human STP can be identified on electrophysiological grounds. Polyphasic auditory evoked responses to clicks and tones with latency to the earliest peak of less than 20 ms can be recorded from postero-medial HG, consistent with cytoarchitectonic evidence that this region contains PAC (Liégeois-Chauvel et al., 1991). 
Evoked potential amplitudes are higher for stimuli delivered to the contralateral than to the ipsilateral ear, and are maximal with binaural stimulation, consistent with a binaural cortical representation and a contralateral ear advantage (Liégeois-Chauvel et al., 1991). Electrode mapping has demonstrated a tonotopic organisation of human medial HG analogous to that of macaque PAC (Howard et al., 1996a): higher frequencies are represented more postero-medially, and lower frequencies more antero-laterally. Responses with latencies of approximately 50 ms and ms can be elicited from mid and from lateral HG, respectively (Liégeois-Chauvel et al., 1994). A large-amplitude long-latency peak at 100 ms can be recorded from PT (Liégeois-Chauvel et al., 1994). More variable, longer-latency auditory responses can be recorded from the lateral posterior STG and parietal operculum (Celesia, 1976; Liégeois-Chauvel et al., 1991), and these responses may persist after resection of the anterior temporal lobe (including HG). Intracortical stimulation of postero-medial HG may evoke responses in lateral HG, PT, postero-lateral STG and anterior STG (Liégeois-Chauvel et al., 1991, 1994; Howard et al., 2000). Evoked responses recorded from postero-medial HG and postero-lateral STG show differential sensitivity to changes in stimulation rate and to general anaesthesia, suggesting that these responses arise in distinct auditory cortical fields. There is some electrophysiological evidence that these fields are connected via a polysynaptic pathway (Liégeois-Chauvel et al., 1991; Howard et al., 2000). Long-latency responses to speech sounds have been recorded from the human STG and lateral temporal lobe (Creutzfeldt et al., 1989; Steinschneider et al., 1999). These electrophysiological observations support a hierarchical organisation of human auditory areas, in which more complex stimulus properties are extracted at successive processing stages in putative belt and parabelt regions. The findings also suggest substrates for both parallel (thalamocortical) and serial (cortico-cortical) processing. Single unit recordings within human HG (Howard et al., 1996a) demonstrate a variety of temporal response patterns (both phasic and tonic) resembling those described in awake non-human primates (Recanzone, 2000a).

Non-invasive studies. In addition to the evidence of human cortical microelectrode studies, a substantial body of electrophysiological data has been obtained from scalp recordings of electrical event-related potentials (ERPs) and magnetoencephalography (MEG). Although they possess exquisite temporal resolution, these techniques share the inverse problem of anatomical source localisation. 
This difficulty is only partly offset by the tangential orientation of the current generators in the STP (Lütkenhöner et al., 2003). Data obtained with such non-invasive electrophysiological techniques nevertheless support human intracerebral recordings and anatomical studies. Simultaneously recorded ERP and MEG data (Yvert et al., 2001) show multiple supratemporal sources corresponding to evoked responses in the middle-latency range (20–80 ms), consistent with multiple auditory cortical fields. The earliest component (the Na-Pa complex: ms) is generally localised to PAC in medial HG, and a cortical origin is supported by lesion studies (Pantev et al., 1995). However, this component is not detectable in all individuals, and alternative correlates of PAC activation (for example, the peak first temporal derivative of the MEG response: Lütkenhöner et al., 2003) may be more reliable. The origins of the later middle-latency auditory evoked potentials remain controversial. In the study of Yvert et al. (2001), intermediate components (Nb-Pb1: ms) were localised to lateral STG and the latest component (Pb2: ms) to antero-lateral HG. The intermediate-latency components would be consistent with a parallel thalamo-cortical projection or with the cortico-cortical projection from HG to STG suggested by intracerebral recordings (Howard et al., 2000). The morphology of the 80 ms component shows interhemispheric differences (multiple peaks and broad frequency tuning on the right; a single peak and sharper frequency tuning on the left) (Liégeois-Chauvel et al., 2001). Such differences have been interpreted as a neurophysiological correlate of functional specialisation in the processing of spectral and temporal properties (Zatorre et al., 2002a). MEG data reaffirm the principle of tonotopy in auditory cortex, demonstrating a relation between computed source depth and frequency (Romani et al., 1982; Cansino et al., 1994; Pantev et al., 1995). Limited MEG evidence (Langner et al., 1997) further suggests an orthogonal mapping of frequency and periodicity. Long-latency (slow) auditory evoked potentials peak later than 80 ms: the major such component occurs at about 100 ms (designated N100 in ERP studies, or N100m in MEG), and generally arises posterior and lateral to the earliest middle-latency component, consistent with a source in PT (Liégeois-Chauvel et al., 1994; Pantev et al., 1995; Krumbholz et al., 2003). Such a localisation would accord well with intracerebral recordings; however, the scalp-recorded N100 may represent a superposition of several generators (Liégeois-Chauvel et al., 1994). 
The latency of N100 varies with the intensity, pitch and spectral composition of the sound stimulus (Krumbholz et al., 2003), while the depth of the N100 source shows a logarithmic relationship with stimulus frequency (Pantev et al., 1995). A mirror frequency mapping for N100 and Pa responses has been demonstrated in simultaneous ERP and MEG recordings (Pantev et al., 1995), further supporting the existence of multiple tonotopic stimulus representations corresponding to distinct cortical areas. The latency range ms includes the peak of the mismatch negativity (MMN) response, elicited by any change in a repetitive auditory stimulus. The MMN occurs preattentively, and has been proposed as a correlate of auditory sensory memory, by which the auditory cortex maintains a representation of the immediate auditory past (Näätänen et al., 2001). It may also reflect stored auditory templates, including language-specific phoneme traces (Näätänen et al., 1997): such templates might be used in generic pattern analysis mechanisms that require matching of incoming sensory data with learned representations. Combined ERP and fMRI evidence supports an origin for the MMN in non-primary auditory cortex (Liebenthal et al., 2003b); however, the locus is sensitive to stimulus attributes, and a right frontal generator may also contribute to the response (Näätänen et al., 2001). Bilateral inferior frontal responses to violation of harmonic expectancy have also been detected using MEG (Maess et al., 2001), consistent with the picture of distributed auditory information processing emerging from anatomical and functional imaging studies. Distinct transient and steady-state MEG responses to auditory stimuli have been described (Ross et al., 2002): such findings suggest possible correlations with the dynamic neuronal responses described in animal electrophysiology (Semple & Scott, 2003). The temporal information provided by electrophysiological techniques is critical in the delineation of processing streams that depend on the sequential activation of auditory areas. Combined electrophysiological and functional imaging studies (Alain et al., 2001; Opitz et al., 2002; Liebenthal et al., 2003b) can potentially exploit the complementary strengths (temporal and spatial) of both modalities.

Lesion studies

Although lesion studies are theoretically critical in establishing which brain areas are necessary to support particular aspects of auditory processing, such studies have played a relatively limited role in the elucidation of human auditory cortical organisation. 
This reflects the rarity of naturally occurring lesions (generally cerebrovascular accidents) that selectively involve auditory cortex, the binaural nature of auditory cortical representations, a lack of detailed pathological and radiological correlation, and the practical difficulties of assessing auditory cortical functions in the clinical setting. An important distinction in the human lesion literature concerns the method of case ascertainment. Traditionally, this has been based on clinical symptomatology (the symptom-led approach); however, the systematic study of patients with particular anatomical lesions (the lesion-led approach) provides important complementary information. The major contribution to the latter group has come from the study of patients with anterior temporal lobe resections for epilepsy. This work has allowed a detailed analysis of the effects of relatively selective anterior temporal lobe lesions (with or without involvement of HG) on specific aspects of human auditory processing. In general these studies indicate that auditory cortical areas in right HG and both anterior temporal lobes are critical for the analysis of patterned sound: examples include the computation of basic elements of auditory patterns such as periodicity pitch (Zatorre, 1988), pitch direction (Johnsrude et al., 2000) and rhythm (Penhune et al., 1999); detection of similarities between stimuli and the perception of auditory object properties such as timbre (Samson & Zatorre, 1994; Samson et al., 2002); perception of components of music (Liégeois-Chauvel et al., 1998); and discrimination and recognition of melodies (Zatorre, 1985; Zatorre & Halpern, 1993). There is some evidence that the left medial temporal lobe is relatively specialised for processing anisochrony (Ehrlé et al., 2001). However, clinical considerations dictate that such studies are necessarily anatomically biased, and the contribution from preoperative functional reorganisation is unclear. Clinical disorders of central auditory function can be broadly classified as loss of perception of sound (cortical deafness), disordered perception of sound despite preserved hearing (auditory agnosia), and disordered auditory spatial perception (Polster & Rose, 1998; Simons & Lambon Ralph, 1999; Griffiths et al., 1999b). Cortical (or cerebral) deafness results from bilateral superior temporal lobe lesions, generally with involvement of HG or of the auditory radiations projecting to HG. In such cases, normal brainstem auditory evoked responses demonstrate that the ascending auditory pathways are intact to the level of the inferior colliculi. 
Parallel thalamocortical projections may mediate residual auditory perceptual functions in the setting of bilateral damage to PAC (Tramo et al., 1990). Auditory agnosia frequently manifests during the recovery phase of cortical deafness (Mendez & Geehan, 1988): as pure tone perception recovers, deficits in the perception of complex sounds (speech, music or environmental sounds) become apparent. The deficit is rarely if ever confined strictly to one class of sounds (Simons & Lambon Ralph, 1999; Griffiths et al., 1999b), suggesting that the core defect is an impairment of spectrotemporal pattern analysis. There is evidence for impaired fine frequency discrimination in the recovery phase of cortical deafness in patients with lesions involving PAC bilaterally (Tramo et al., 2002), which might contribute to a generalised auditory agnosia. However, relatively isolated deficits such as pure word deafness (Takahashi et al., 1992) and non-verbal agnosia (Eustache et al., 1990; Clarke et al., 2000) may follow unilateral superior temporal damage, and specific psychoacoustic deficits have been identified in a proportion of such cases (Mendez & Geehan, 1988; Griffiths et al., 1997; Lorenzi et al., 2000; Kohlmetz et al., 2003). A defect in auditory temporal acuity has been described in a number of patients with pure word deafness (Griffiths et al., 1999b; Lorenzi et al., 2000): the majority of these patients have had bilateral or left-sided lesions involving the posterior superior temporal lobe. For most forms of auditory agnosia, the nature of the core defect and its anatomical localisation are not well established. In a lesion-led study of patients with left hemisphere damage, the temporo-parietal junction and the superior and middle temporal gyri were implicated in the processing of both non-verbal sounds and speech (Saygin et al., 2003). Defects in voice discrimination may follow damage to either temporal lobe, whereas a selective defect of voice recognition (phonagnosia) has been described in association with damage involving the right temporo-parietal junction (Van Lancker et al., 1988). The STG appears to be critical for music processing (Peretz et al., 1994; Ayotte et al., 2000). Bilateral STG lesions may abolish perception of dissonance (Peretz et al., 2001), and selective impairments of timbre (Kohlmetz et al., 2003) and pitch sequence perception (Griffiths et al., 1997) have been described following right anterior and posterior STG lesions, respectively. Observations in individuals with unilateral brain lesions (Peretz, 1990; Liégeois-Chauvel et al., 1998; Ayotte et al., 2000) suggest a musical pitch processing hierarchy that is distributed between the cerebral hemispheres. According to this model, the right hemisphere encodes a global representation of pitch contour that enables the encoding of local pitch information (specific pitch values or pitch intervals) by the left hemisphere. 
It has been proposed that musical melody and speech prosody are processed by common temporal lobe mechanisms (Patel et al., 1998): whereas deficits in recognition of affective prosody may follow lesions of either hemisphere (more frequently the right), deficits in perception of linguistic prosody appear to be more common after left posterior temporal lesions (Pell, 1998; Mitchell et al., 2003). Although such observations suggest that mechanisms for processing specific intonational features may exist in each hemisphere, the dysprosodias remain both psychoacoustically and anatomically ill-defined. Difficulties with sound localisation may accompany auditory agnosia, generally with bilateral lesions. Few patients present with auditory spatial symptoms (Hausler et al., 1983; Griffiths et al., 1999b); however, lesion-led studies of patients with unilateral right and left temporo-parietal damage have revealed various auditory spatial defects, including auditory neglect or extinction and impaired sound localisation in the horizontal and vertical planes (De Renzi et al., 1984; Haeske-Dewick et al., 1996; Griffiths et al., 1999b; Pavani et al., 2002a; Zimmer et al., 2003). An asymptomatic deficit in sound motion perception based on dynamic interaural cues has been described in a patient with unilateral right temporo-parietal infarction (Griffiths et al., 1996, 1997). There is some clinical evidence to support the existence of distinct processing pathways for sound localisation and sound recognition (Clarke et al., 2000): left lateral temporal lobe damage that spares HG and the auditory radiations may impair environmental sound recognition (but not auditory spatial processing), while left-sided parieto-temporal damage that spares the temporal convexity may impair sound localisation (but not sound recognition). However, clinico-anatomical correlation in these cases was imprecise owing to the extensive lesions. A further qualification is suggested by the observation that right (but not left) temporal lobe resections anterior to HG produce bilateral sound localisation deficits (Zatorre & Penhune, 2001). The existence of discrete clinical auditory deficits is consistent with anatomical and electrophysiological evidence that the human auditory brain is hierarchically organised. 
Analyses of deficits in environmental sound (Schnider et al., 1994) and music (Eustache et al., 1990; Ayotte et al., 2000) processing provide support for a model in which lesions of either hemisphere may disrupt perceptual analysis (auditory apperceptive agnosia), while left-sided lesions preferentially disrupt the attribution of meaning to sounds (auditory associative agnosia); the latter is more properly regarded as a semantic rather than a primary auditory disorder (Polster & Rose, 1998; Griffiths et al., 1999b). The majority of clinical cases have extensive lesions, precluding more precise anatomical correlation. A role for the temporo-parietal junction in human auditory spatial analysis is generally supported by lesion evidence. Again, however, there is little consensus regarding the critical anatomical locus or, indeed, the role of hemispheric asymmetries in the perception of acoustic space. In addition, the extent to which deficits of human complex sound processing are influenced by supramodal attentional and behavioural factors remains contentious (Griffiths et al., 1999b). 

1.3.4 Auditory functional imaging

General considerations. Functional imaging techniques can be used to investigate structure-function relationships in the human auditory brain in vivo, at the level of neuronal populations. The most widely used modalities are positron emission tomography (PET) and fMRI. The anatomical resolution and non-invasive nature of these modalities (particularly fMRI) offer substantial advantages over older methods of investigation. However, the principled application of functional imaging techniques rests ultimately on a detailed knowledge of microscopic anatomy and neuronal electrophysiology. Accordingly, the early development of auditory functional imaging has been heavily influenced by models of auditory cortical organisation derived from other modalities and other species, in particular macaque histology and electrophysiology. Signal changes in fMRI reflect the haemodynamic (oxygenation-level-dependent) response to brain activation. The interpretation of fMRI signal changes is therefore predicated fundamentally on an understanding of the coupling between neuronal physiology and the haemodynamic response (Logothetis et al., 2001) (see Chapter 2, Section 2.1.4, p. 48). In addition, auditory fMRI presents a number of specific technical difficulties (reviewed in Chapter 2, Section 2.1.6, p. 53). Imaging of the ascending auditory pathways is particularly challenging due to the high anatomical resolution required and (in fMRI) the problem of brainstem motion caused by basilar artery pulsation. However, activity in the human subcortical auditory system (cochlear nucleus, lateral lemniscus, inferior colliculus, medial geniculate body) has been imaged using PET (Lockwood et al., 1999) and fMRI (Giraud et al., 2000). The use of cardiac triggering (image acquisitions synchronised with the cardiac cycle) can reduce image degradation due to brainstem pulsation (Guimaraes et al., 1998; Griffiths et al., 2001). 
Functional imaging has been used both to validate basic principles of auditory cortical physiology such as tonotopy (Bilecen et al., 1998) and binaurality (Strainer et al., 1997), and to investigate the cortical processing of different classes of complex auditory phenomena, such as speech (Zatorre et al., 1992; Binder et al., 1994, 1996, 1997, 2000; Scott et al., 2000; Giraud & Price, 2001), music (Zatorre et al., 1994, 1996; Platel et al., 1997; Janata et al., 2002), human voices (Belin et al., 2000) and environmental sounds (Engelien et al., 1995; Thierry et al., 2003; Zatorre et al., 2004; Lewis et al., 2004). In contrast to single cell physiology, functional imaging methods do not sample the activity of particular cortical fields but rather demonstrate cortical networks engaged in processing particular types of sound pattern. Accordingly, these techniques are well suited to the elucidation of generic mechanisms of sound pattern analysis common to different classes of complex sounds, before those sounds are processed semantically (Griffiths et al., 1998a, 1999a, 2001; Giraud et al., 2000; Thivard et al., 2000; Zatorre & Belin, 2001; Hall et al., 2002; Hart et al., 2003b). Beyond demonstrating coactivation of different brain regions, techniques such as dynamic causal modelling might in principle establish functional connectivities between auditory areas; however, the potential of such techniques has yet to be realised, largely reflecting a lack of sufficiently detailed anatomical data (Goncalves et al., 2001). Nevertheless, the picture emerging from human auditory functional imaging studies has reaffirmed the broadly hierarchical organisation of auditory cortical areas suggested by animal studies and by human anatomy and electrophysiology. Emerging fMRI evidence suggests distinct (transient and sustained) haemodynamic response patterns localising to different auditory cortical areas (Di Salle et al., 2001; Seifritz et al., 2002a). Such findings demonstrate the potential of fMRI to establish physiological properties of different (core and belt) levels within the putative processing hierarchy, although the relationship between these haemodynamic properties and the underlying neuronal response properties remains unclear (Zatorre, 2003). This section surveys the functional imaging evidence relating to the processing of different sound attributes and different forms of sound pattern in the normal human auditory brain.

Pitch and intensity. Pitch is an elemental acoustic attribute (Helmholtz, 1875); however, the neural correlates of human pitch perception are both complex and controversial (Krumhansl, 1990; Griffiths et al., 1998a, 2001). 
The spatial mapping of frequency (tonotopy) is a general organisational principle of the ascending auditory pathways and auditory cortex in animals, and has been demonstrated in functional imaging studies of human PAC (Lauter et al., 1985; Wessinger et al., 1997; Bilecen et al., 1998). Several tonotopic maps have been demonstrated in human STP (Talavage et al., 2000), consistent with animal and human electrophysiological data: two mirror-symmetric frequency gradients in human HG have recently been delineated using high-field (7 T) fMRI (Formisano et al., 2003). However, pure tones are non-physiological stimuli: the perception of pitch in natural sounds depends not simply on frequency, but on spectral and temporal structure. These dimensions form the basis for alternative (but complementary) spectral and temporal models of pitch perception (Griffiths et al., 1998a, 2001). Increasing temporal regularity at the scale of milliseconds in delay-and-add noise (which produces an increasingly strong perceived pitch despite uniform frequency composition) activates both subcortical structures (cochlear nuclei and inferior colliculi) and HG (including PAC) (Griffiths et al., 1998a, 2001). For modulated stimuli, tuning properties in the human ascending auditory pathways and HG accord with single unit recordings in animals, and support the concept of a hierarchical filter bank. The progressive reduction in best modulation frequency observed in the ascending pathways would be consistent with integration of temporal information (Giraud et al., 2000). No clear evidence for a periodotopic cortical mapping has emerged from human functional imaging studies, although periodicity may be coded in the haemodynamic response pattern at a given cortical locus (Giraud et al., 2000). Sounds of increasing bandwidth (spectral complexity) produce more extensive activation in human STP. While pure tones activate a restricted region within HG corresponding to PAC or auditory core, intensity-matched band-pass noise (Wessinger et al., 2001) and harmonic tones (Hall et al., 2002) produce additional activation beyond HG which may correspond to human auditory belt. There is some evidence for a topographical representation of intensity ('amplitopicity') in human HG (Bilecen et al., 2002). The sound-level-dependent growth in the extent of fMRI activation in HG is approximately linear for high-frequency but non-linear for low-frequency tones (Hart et al., 2003a). These observations may reflect an interaction between the cortical coding of frequency and intensity. 
The relationship between intensity and activation (both magnitude and extent) is less consistently observed beyond HG (Jäncke et al., 1998; Brechmann et al., 2002), perhaps reflecting an interaction with other stimulus properties (for example, modulation).

Simple temporal patterns. Simple temporal sound patterns such as amplitude modulation (AM), FM and spectral motion produce activation beyond HG in areas that may include human auditory belt and parabelt. Relative to unmodulated stimuli, both AM and FM stimuli activate overlapping areas in lateral HG, PT and STG (Hall et al., 2002; Hart et al., 2003b). Continuous shifts in sound intensity level produce activation in PT and posterior STG (Seifritz et al., 2002b). Sounds with changing (compared with stationary) spectral profiles activate areas posterior and lateral to HG in PT and STG (Thivard et al., 2000). For broadband sounds, increasing temporal complexity (presence of AM or FM) engages extensive bilateral superior temporal areas including PT, posterior and anterior STG (Giraud et al., 2000; Hall et al., 2002; Hart et al., 2003b).

Sound sequences. Many sounds in the acoustic environment, such as speech and music, are segmented rather than continuously modulated, and this segmentation often occurs over long temporal windows (hundreds of milliseconds). One simple class of examples is the oddball sequence, which has been used extensively in electrophysiological paradigms to investigate correlates of the MMN. Functional imaging studies have shown that preattentive processing of pure tone (Liebenthal et al., 2003b) and chord (Tervaniemi et al., 2000) deviants activates PT and posterior STG: hemispheric lateralisation of the deviant response may be contingent on the type of acoustic information (left-sided for phonemes, right-sided for tones and chords). The detection of intensity deviants is associated with activation of right posterior STG (Belin et al., 1998). The detection of duration deviants (Belin et al., 2002a) and the active comparison of auditory time intervals (Rao et al., 2001) are associated with activation of a subcortical network of areas including thalamus, basal ganglia and cerebellum.
Right inferior frontal activation in response to spectral deviants may reflect working memory or attentional modulation (Opitz et al., 2002). The analysis of long-term structure in pitch sequences (pitch variation contrasted with fixed pitch) engages bilateral areas beyond PAC in posterior and anterior STG and planum polare (PP) (Griffiths et al., 1998a; Patterson et al., 2002). Similar areas are activated in processing tonal melodies (Zatorre et al., 1994). Such studies have delineated a network of cortical areas involved in pitch processing, which might be interpreted as a three-stage hierarchy (Patterson et al., 2002): extraction of time interval information in the ascending auditory pathways, coding of pitch value and salience in lateral HG, and tracking of pitch information streams (melodies) in STG and PP. Right hemispheric lateralisation emerges at higher levels of melody information processing (Zatorre et al., 1994; Patterson et al., 2002). The earlier stages of this hierarchy, including non-primary or belt areas in lateral HG and PT, are likely to instantiate mechanisms relevant to the generic analysis of long-term temporal structure in sound (Griffiths et al., 1998a). There is evidence to suggest that the analysis of both duration and pitch sequences activates a common superior temporal network (Griffiths et al., 1999a). Working memory for pitch in studies requiring subjects to make decisions about pitch relationships in melodies has been shown to activate a distributed brain network including STG, prefrontal/dorso-lateral frontal cortices, IPL and cerebellum (Zatorre et al., 1994; Binder et al., 1997; Griffiths et al., 1999a; Gaab et al., 2003): while a number of studies have found right lateralised activation, this is not uniformly the case (Rao et al., 2001; Gaab et al., 2003). Working memory for rhythmic patterns engages a fronto-parietal network similar to that described for pitch; however, hemispheric lateralisation within this network may be modulated by metrical relationships (interval ratios) (Sakai et al., 1999). Activation of PT in tasks requiring the reproduction of rhythmic patterns (Penhune et al., 1998) and in the passive detection of silent periods in sound sequences (Mustovic et al., 2003) implicates this region in the short-term retention of temporal structure. It has been proposed that right IPL encodes temporal information by analysing the accumulation of time intervals (Rao et al., 2001).

Speech and music. A large body of data has been collected on the neuroanatomical correlates of speech and music processing (Giraud & Price, 2001; Zatorre et al., 2002a; Scott & Johnsrude, 2003).
From the perspective of auditory information processing, both speech and music can be regarded as segmented spectrotemporal patterns with a complex structure at timescales ranging from milliseconds (in the case of some speech sounds) to hundreds of milliseconds (Rosen, 1992; Zatorre et al., 2002a). The patterns of activation described in studies of speech and music processing vary widely in extent. This variability reflects both the acoustic complexity of the speech or musical stimulus (ranging from isolated vowels and chords to sentences and extended diatonic melodies) and the extent to which baseline conditions control for this complexity. Speech-like acoustic signals can be created by distorting the speech signal so as to relatively preserve spectral composition (reversed speech: Binder et al., 2000; Crinion et al., 2003), temporal structure (sine wave speech, signal correlated noise, noise-vocoded speech: Mummery et al., 1999; Scott et al., 2000; Liebenthal et al., 2003a) or both spectral and temporal properties (spectrally inverted speech, complex sine wave analogues, speech in noise: Scott et al., 2000; Vouloumanos et al., 2001; Davis & Johnsrude, 2003). Contrasted against simple acoustic baselines such as noise bursts (Zatorre et al., 1992; Binder et al., 1994; Binder et al., 2000) or pure tones (Binder et al., 1996; Binder et al., 2000), speech and speech-like signals (like other forms of modulated broadband sound) produce extensive bilateral superior temporal activation including antero-lateral areas adjacent to PAC in lateral HG, PT and STG (Mummery et al., 1999; Wise et al., 2001; Davis & Johnsrude, 2003). Both speech and speech-like stimuli activate posterior and mid STS, suggesting that this region is involved in acoustic rather than linguistic processing of the speech signal (Binder et al., 2000; Giraud & Price, 2001). However, direct comparisons between different forms of speech distortion have suggested that processing of particular acoustic features of speech occurs in STG bilaterally, whereas surrounding left hemisphere areas including STS are sensitive to speech intelligibility but not to specific acoustic features (Scott et al., 2000; Davis & Johnsrude, 2003).
Studies contrasting speech against acoustically complex speech-like baselines have demonstrated left-lateralised or bilateral activation restricted to PT and posterior STG/STS for phonemes (Vouloumanos et al., 2001; Jäncke et al., 2002; Jacquemot et al., 2003; Liebenthal et al., 2003a), while words and sentences activate anterior STS/STG with variable engagement of posterior superior temporal areas (Scott et al., 2000; Giraud & Price, 2001; Crinion et al., 2003). Together these observations suggest that posterior STG and STS abstract phonetic features (Binder et al., 2000) and more anterior areas in STS extract higher order linguistic information (Scott et al., 2000), with relative selectivity of the left hemisphere for verbal processing. This pattern has been broadly mirrored in studies of music processing: right-lateralised activation in anterior and posterior superior temporal lobe areas beyond PAC emerges with increasing spectral abstraction (detection of deviant chords or key changes: Tervaniemi et al., 2000; Janata et al., 2002) and with temporal structure extending over increasing time windows (melodies contrasted with fixed pitch sequences: Zatorre et al., 1994; Patterson et al., 2002). It has been proposed that right anterior STG preferentially processes dynamic pitch variation as embodied in music, speech prosody and certain speech-like signals such as spectrally inverted speech (Zatorre et al., 1992; Scott et al., 2000). An asymmetric bilateral network of superior temporal and frontal lobe areas similar to that engaged during perception of music, but excluding PAC, is also active during musical imagery (Zatorre et al., 1996). The tracking of Western tonal relationships in diatonic melodies engages a similar bilateral (rightward-asymmetric) network including PT, posterior and anterior STG, middle temporal gyrus and rostro-medial prefrontal cortex (Janata et al., 2002). This network is also engaged by pitch working memory tasks (Zatorre et al., 1994; Gaab et al., 2003). Limited information suggests that prosodic contours are also processed by a rightward-asymmetric network including superior and middle temporal gyri and frontal areas (Mitchell et al., 2003). Together these findings offer further support for a hierarchical organisation of human superior temporal areas, in which increasingly complex perceptual and semantic features of the acoustic signal are encoded at progressively higher levels of the processing hierarchy. Hemispheric lateralisation may reflect relative specialisation of the cerebral hemispheres for processing particular spectrotemporal characteristics of speech and music; however, there is no consensus as to which characteristics are crucial. Hemispheric specialisation might, for example, be based on relative selectivity for spectral versus temporal properties (Zatorre et al., 2002a), or on preferential integration of information over different timescales (Patterson et al., 2002).
Alternatively, lateralisation might reflect supramodal mechanisms that construct or access semantic representations based on information derived from different sensory modalities (Binder et al., 2000; Zahn et al., 2000).

Sound identity. By analogy with vision and other sensory systems, auditory object processing in humans is likely to be hierarchically organised with extraction of increasingly abstract object properties at successive processing stages (Mesulam, 1998; Husain et al., 2004; Zatorre et al., 2004). A number of functional imaging studies have addressed sound recognition using natural sounds that are likely to engage several levels of processing (for example, the analysis of spectrotemporal properties, the abstraction of object features, or semantic associations). Such studies have also used a number of different experimental tasks and acoustic baselines (for example, silence, amplitude modulated noise, spectrally scrambled or filtered natural sounds) that vary widely in the extent to which they match particular spectrotemporal properties of real sound objects. Relatively few studies have addressed generic mechanisms of auditory object computation that might support the identification of many types of natural sounds (Zatorre et al., 2004; Husain et al., 2004). Contrasted against various acoustic baselines, both environmental sounds (Engelien et al., 1995; Humphries et al., 2001; Maeder et al., 2001; Adams & Janata, 2002; Thierry et al., 2003; Beauchamp et al., 2004; Zatorre et al., 2004; Lewis et al., 2004) and human voices (Belin et al., 2000; Belin et al., 2002b; Von Kriegstein et al., 2003) engage a bilateral brain network including PT and temporal lobe areas along the length of STS. A similar network is involved in processing timbre, an object attribute that is crucial for sound identification (Menon et al., 2002). Responses to voices are generally maximal in right mid to anterior STS (Belin et al., 2000; Belin et al., 2002b; Belin & Zatorre, 2003; Von Kriegstein et al., 2003). Speech sounds also activate STS bilaterally; however, activation is more symmetrically distributed between hemispheres. Reported asymmetries of hemispheric activation in environmental sound recognition have been inconsistent and may reflect different task requirements (for example, different levels of semantic categorisation; Thierry et al., 2003).
The limited evidence concerning auditory object analysis in the human brain (Beauchamp et al., 2004; Binder et al., 2004; Husain et al., 2004; Zatorre et al., 2004) corroborates electrophysiological evidence for an anteriorly and ventrally directed pathway (including homologous regions of STS/STG) for processing sound source identity and conspecific call sounds in non-human primates (Rauschecker & Tian, 2000; Tian et al., 2001; Wang, 2000). The processing stages that constitute the putative human auditory object pathway have not been characterised, although it has been proposed that right anterior STS extracts sound object features before recognition occurs (Zatorre et al., 2004).

Auditory space. Auditory spatial cues for the localisation of natural (broadband) sounds may be binaural (based on timing or phase differences between the ears) or monaural (based on the filtering properties of the external ears: Wightman & Kistler, 1989, 1998). A number of functional imaging studies (Baumgart et al., 1999; Griffiths et al., 1994; Griffiths et al., 1998b; Griffiths et al., 2000; Lewis et al., 2000; Bremmer et al., 2001; Maeder et al., 2001; Hart et al., 2004) have manipulated binaural cues for sound localisation or movement (such as interaural phase and intensity variation): these cues produce a percept of sound location or movement between the ears (Griffiths et al., 1994), rather than a sound source in external acoustic space. Relatively few studies of auditory spatial analysis have examined brain mechanisms that compute the complex spectrotemporal changes imposed on sound sources by the spatially variable and dynamic filtering mechanism of the external ears (Wightman & Kistler, 1989; Hofman et al., 1998; Weeks et al., 1999; Bushara et al., 1999; Alain et al., 2001; Hunter et al., 2003) (see Chapter 2, Section 2.2, p. 57). Accumulating functional imaging evidence implicates a common distributed human brain network in the spatial localisation of sounds (Weeks et al., 1999; Bushara et al., 1999; Alain et al., 2001; Maeder et al., 2001; Zatorre et al., 2002b) and in the processing of sound movement (Griffiths et al., 1994; Griffiths et al., 1998b; Griffiths & Green, 1999; Griffiths et al., 2000; Lewis et al., 2000; Bremmer et al., 2001; Hart et al., 2004): this network includes bilateral inferior and superior parietal, and dorsal and ventral premotor areas. Direct comparisons of visual and auditory spatial analysis have shown common activation in fronto-parietal areas and modality-specific areas in the superior parietal lobe (Lewis et al., 2000; Bremmer et al., 2001).
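The interaural intensity manipulations mentioned above, which produce a percept of movement between the ears rather than in external space, are simple to construct. The sketch below (assuming NumPy; the function name and parameters are illustrative, not taken from any cited study) generates a stereo tone whose interaural intensity difference varies slowly over time.

```python
import numpy as np

def ild_motion_stimulus(f_carrier=500.0, f_pan=0.5, duration_s=2.0, fs=44100):
    """Stereo tone whose interaural intensity difference varies
    sinusoidally, yielding a percept of lateral movement between the
    ears (intracranial, not externalised). Illustrative parameters."""
    t = np.arange(int(duration_s * fs)) / fs
    carrier = np.sin(2 * np.pi * f_carrier * t)
    pan = 0.5 * (1 + np.sin(2 * np.pi * f_pan * t))  # 0 = left .. 1 = right
    left = np.sqrt(1 - pan) * carrier   # equal-power panning keeps
    right = np.sqrt(pan) * carrier      # summed power constant
    return np.stack([left, right], axis=1)

stim = ild_motion_stimulus()
```

Because only the interaural level ratio changes, the stimulus isolates a binaural cue while holding monaural (pinna-filtered) spectral cues fixed, which is why such sounds are heard inside the head.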
The variability in premotor and superior parietal activation observed between studies suggests that this fronto-parietal network does not mediate obligatory perceptual processes, but rather attention and motor planning. These areas may encode auditory space in a coordinate system suitable for movement preparation: an analogous function has been attributed to the macaque dorsal processing stream projecting to area PMv (Graziano et al., 1999). No study in which a comparison was made between a spatial sound stimulus and the appropriate control stimulus (non-externalised or stationary sound) has shown activation in medial HG, arguing against the specific involvement of human PAC in sound localisation or the perception of moving sounds. However, a role for human PT in auditory spatial analysis has been demonstrated in several studies (Baumgart et al., 1999; Pavani et al., 2002b; Zatorre et al., 2002b; Hunter et al., 2003; Hart et al., 2004). These studies used spectrotemporally complex stimuli with additional cues for externalisation in space (presentation of multiple sources in free field: Zatorre et al., 2002b; convolution with the pinna transfer function: Hunter et al., 2003) or for motion in space (AM: Baumgart et al., 1999, Hart et al., 2004; free-field recordings of a moving sound source: Pavani et al., 2002b). The perception of auditory looming (as generated by shifts in sound intensity) is also associated with PT activation (Seifritz et al., 2002b). This evidence suggests that human posterior superior temporal areas are engaged in auditory spatial analysis and that this engagement may be modulated by computational demands, such as disambiguation of spatial from non-spatial sound properties, single from multiple sound sources, or changing from fixed spatial locations. Such modulation would be consistent both with macaque electrophysiological data indicating a relative (rather than absolute) specialisation of auditory caudal belt fields for spatial processing (Tian et al., 2001), and with human functional imaging evidence that the processing of sound location interacts with spectrotemporal structure (Zatorre et al., 2002b). Limited functional imaging evidence suggests that the insula may also play a role in human sound motion perception (Griffiths et al., 1994). This role may involve cross-modal integration of (auditory and vestibular) information, and would be consistent with human histological data (Rivier & Clarke, 1997). Apart from their intrinsic interest, the neuroanatomical correlates of sound object and spatial processing are central to the auditory 'what' and 'where' controversy, and therefore also speak to the wider question of whether certain principles of brain organisation are common to different sensory systems.
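The "convolution with the pinna transfer function" used to externalise sounds in studies such as Hunter et al. (2003) amounts to filtering a mono source with measured left- and right-ear head-related impulse responses (HRIRs). A hedged sketch follows: the helper name `externalise` is hypothetical, and real HRIRs come from acoustic measurements, whereas the toy filters in the usage example are simple delta functions.

```python
import numpy as np

def externalise(source, hrir_left, hrir_right):
    """Impose a head/pinna filtering pattern on a mono source by
    convolving it with left- and right-ear head-related impulse
    responses (the time-domain form of the pinna transfer function)."""
    left = np.convolve(source, hrir_left)
    right = np.convolve(source, hrir_right)
    return np.stack([left, right], axis=1)

# Toy HRIRs: the "right ear" filter is just a delayed, attenuated impulse,
# so the right channel becomes a delayed, quieter copy of the source.
src = np.arange(10.0)
hl = np.zeros(8); hl[0] = 1.0
hr = np.zeros(8); hr[5] = 0.5
binaural = externalise(src, hl, hr)
```

With measured HRIRs that vary as a function of source direction, time-varying convolution of this kind generates the dynamic, spatially determined spectrotemporal patterns discussed in the text.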
Functional imaging evidence suggests that separate mechanisms for sound identification and localisation exist in the human brain (Alain et al., 2001; Maeder et al., 2001; Hart et al., 2004), although this separation has been achieved using attentional and stimulus manipulations that differ widely between studies. The functional basis of any 'What'/'Where' processing dichotomy in human auditory cortex therefore remains unresolved. While the potential for interaction between spatial and object information is clear (Alain et al., 2001; Zatorre et al., 2002b), the mechanisms that might support such an interaction are similarly contentious.

Supramodal and cross-modal processing. Attention both to auditory spatial attributes (Lewis et al., 2000; Alain et al., 2001; Maeder et al., 2001; Zatorre et al., 2002b; Hart et al., 2004) and to non-spatial sound properties such as frequency (Tzourio et al., 1997; Zatorre et al., 1999; Rao et al., 2001), intensity (Belin et al., 1998; Seifritz et al., 2002b) and duration (Rao et al., 2001; Belin et al., 2002a) engages a fronto-parietal brain network comprising dorsal and ventral premotor and superior parietal areas. Auditory vigilance tasks modulate activity in a similar network (Paus et al., 1997). The extensive overlap between these areas and the fronto-parietal areas involved in visuospatial attention (Nobre et al., 1997; Lewis et al., 2000) argues that this network is supramodal: it is likely to mediate directed and selective attention, working memory, motor preparation and response selection according to context (Rao et al., 2001). Engagement of the network modulates auditory cortical (HG and PT) activity under dichotic versus diotic (Hashimoto et al., 2000; Lipschutz et al., 2002) and divided (Lipschutz et al., 2002) listening conditions. In addition to any specific effect of spatially directed attention, such modulation might reflect the increased attentional demands of the task itself. Enhancement of auditory cortex activity during active versus passive listening has been reported (Grady et al., 1997; Hall et al., 2000). However, such modulation has not shown a clear regional specificity for task or attended stimulus features, nor has it been observed consistently across studies and for different acoustic properties (Paus et al., 1997; Tzourio et al., 1997; Platel et al., 1997; Belin et al., 1998). Cross-modal activation of human auditory areas has been described in a small number of functional imaging studies.
Both PT and posterior STS are activated by coherent visual motion (Howard et al., 1996b), by transitions between auditory, visual and tactile stimuli (Downar et al., 2000), during silent reading (Price et al., 2003) and during lip-reading of speech and pseudo-speech (Calvert et al., 1997). Audio-visual integration is clearly required for processing linguistic signals, and cross-modal activity in auditory association areas may contribute to pre-lexical phonetic classification (Calvert et al., 1997). Evidence for cross-modal integration of auditory and visual object information in posterior STS has been obtained using fMRI in humans (Beauchamp et al., 2004). Human PT and posterior STS may therefore represent homologues of macaque polysensory areas Tpt and STS. Activation of posterior temporal cortex with decreasing temporal predictability of visual sequences (Bischoff-Grethe et al., 2000) suggests that this region may play a generic (amodal or cross-modal) role in encoding stimulus probability. Such a role would be relevant to the processing of speech and other complex auditory sequences.

1.4 Key problems in auditory cortical pattern analysis

The anatomical and electrophysiological data derived from the macaque and other species suggest a model of auditory cortical organisation in which a hierarchy of structurally differentiated cortical fields route auditory information serially and in parallel between functionally segregated processing stages in the superior temporal lobe and beyond. The physiological characteristics of these processing stages are becoming increasingly defined at the level both of single neurons and neuronal ensembles in behaving animals. This work supports the existence of processing pathways that are at least partly functionally differentiated: integration of information leads to the extraction of increasingly abstract properties at successive processing stages, and cross-modal and multimodal integration occurs at later processing stages. Studies in the macaque and other mammalian species have established a framework on which to test models of human auditory cortex function: indeed, available evidence suggests that the human auditory brain may be organised along generally similar lines. However, both the extent of cross-species homologies and the functional architecture of human auditory cortex remain to be established. The identification and localisation of sound sources and the tracking of acoustic information streams are basic functions of auditory cortex in many species, including humans. Such functions can be considered as computational problems in spectrotemporal pattern analysis. Anatomical and electrophysiological evidence in animals suggests that generic mechanisms of cortical pattern analysis are instantiated in neuronal ensembles and distributed neural networks with properties that are both regionally specific and shared by different species.
Accordingly, spectrotemporal pattern analysis is an appropriate level at which to pose fundamental questions concerning the functional linkages between anatomically and electrophysiologically defined auditory cortical areas, such as the existence of distinct 'What' and 'Where' processing pathways. The electrophysiology of cortical neurons is difficult to study in humans, while the mechanisms that underpin complex cognitive operations are often difficult to isolate.

Functional brain imaging provides a non-invasive and flexible tool with which to probe generic mechanisms of auditory pattern analysis at the neuronal population level in the working human brain. To date, however, this level of processing has been relatively little studied. Many cortical areas in the human superior temporal lobe are engaged both by spectrotemporal manipulations on elementary auditory stimuli and by complex acoustic phenomena such as speech, music and environmental sounds. These observations suggest that generic pattern analysis mechanisms do indeed exist in the human auditory brain, that these mechanisms are anatomically distributed and involve cortical areas beyond PAC, and that regional specificity is relative rather than absolute. Cortical regions such as PT and STS have the characteristics of anatomical and functional hubs. Anatomically, these regions are extensive and have widespread connections to both surrounding and more distant cortical regions, including multimodal cortex (Mesulam, 1998). Such connections could gate auditory information to higher-order areas for further processing (including semantic and symbolic processing), and also integrate top-down modulatory influences on early auditory analysis. Functionally, these areas are implicated both in human lesion and functional imaging studies in the analysis of many types of spectrotemporal pattern and in the cross-modal integration of auditory information with information derived from other sensory modalities. Such regional hubs are therefore excellent candidate sites for generic mechanisms of auditory pattern analysis. A substantial challenge now confronting human functional imaging is the design of hypothesis-led experiments to isolate the mechanisms specific to different stages of multi-component processes such as sound identification and the perception of auditory space.
This enterprise will entail a detailed understanding of the functional parcellation of cortical hub regions such as PT. The fMRI experiments described in this thesis address four broad unresolved problems of pattern processing in human auditory cortex. These problems relate to generic aspects of auditory scene analysis, namely: i) analysis of the spatial location of sound sources and analysis of non-spatial information associated with those sources; ii) disambiguation of dynamic spectrotemporal patterns associated with sound movement from other complex spectrotemporal patterns; iii) analysis of source-dependent and source-independent pitch information; and iv) identification of auditory objects. The key experimental questions that motivated the fMRI experiments can be framed as follows.

1.4.1 Experiment 1

Do distinct human brain mechanisms analyse the spatial locations of sound sources and the non-spatial information associated with those sources? Do these mechanisms have distinct cortical substrates?

It has been demonstrated (Griffiths et al., 1998a; Patterson et al., 2002) that the tracking of pitch information streams is mediated by cortical areas anterior and lateral to HG: such patterns convey spectrotemporal information that is independent of the location of the sound source. In contrast, the neuroanatomical correlates of the spectrotemporal patterns that carry information about source location are disputed (Belin & Zatorre, 2000). Posterior auditory areas including PT may be engaged in sound source segregation using both spatial and non-spatial information (Baumgart et al., 1999; Zatorre et al., 2002b); however, the anatomical and functional bases for any separation of processing have not been defined. This issue is central to the current 'What'/'Where' controversy concerning the fundamental architecture of the cortical auditory system both in humans and other animals (see Chapter 3).

1.4.2 Experiment 2

Are the human brain mechanisms that analyse sound source motion distinct from those that analyse other kinds of spectrotemporal information?

The analysis of sound source motion presents a computationally demanding problem that requires the extraction of specific spatially determined spectrotemporal patterns associated with moving sound sources. Little information is available concerning the cortical mechanisms that process the dynamic spectrotemporal patterns produced by the motion of broadband sounds in space and in particular, the spectrotemporal features imposed by the pinna transfer function.
However, human PT is a plausible candidate site (Baumgart et al., 1999; Zatorre et al., 2002b) for mechanisms that disambiguate the spectrotemporal correlates of spatial and non-spatial sound properties, and the spectrotemporal correlates of changing and fixed spatial position (see Chapter 4).

1.4.3 Experiment 3

Are source-dependent and source-independent pitch properties analysed by distinct mechanisms in the human auditory brain?

In terms of the musical pitch helix model (Krumhansl, 1990), pitch information streams such as those of music and prosody represent patterns of pitch chroma values, where a particular pattern can arise from any of a number of different sound sources (the same melody might be produced by different musical instruments or voices). These source-independent patterns are analysed in an anteriorly directed cortical network (Patterson et al., 2002). In contrast, a change in pitch height (a change in the identity of the voice or musical instrument) conveys source-dependent pitch information, and might be processed by posterior cortical areas that encode other cues for source segregation (such as spatial location) in early auditory scene analysis. Accordingly, source-dependent (pitch height) and source-independent (pitch chroma) information may be processed by distinct cortical mechanisms located posterior and anterior to HG (see Chapter 5).

1.4.4 Experiment 4

Does the human brain possess generic mechanisms for the analysis of sound identity? Does the computation of auditory object features have a specific cortical substrate?

Emerging evidence (for example, Belin et al., 2000; Thierry et al., 2003) indicates that STS plays a central role in processing both environmental sounds and human voices; however, the level of processing and the nature of the processing mechanisms involved have not been elucidated. Observations that different classes of natural sounds recruit similar neuroanatomical resources, and that particular sound objects can be identified despite wide variation in fine spectrotemporal structure (for example, voiced versus whispered phonemes) suggest that generic mechanisms for extracting auditory object features may exist in STS (see Chapter 6).
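The helix decomposition underlying Experiment 3 can be stated compactly: a frequency maps to a height (how far up the helix) and a chroma (position around one turn), so that octave-related tones share chroma but differ in height. The sketch below is purely illustrative; the function name is hypothetical and the conventions (reference of 440 Hz, chroma in [0, 1)) are one arbitrary choice among several.

```python
import math

def pitch_helix(freq_hz, ref_hz=440.0):
    """Decompose a frequency into pitch height (octaves above `ref_hz`)
    and pitch chroma (position within the octave, in [0, 1)).
    Tones an octave apart share chroma but differ in height."""
    octaves = math.log2(freq_hz / ref_hz)  # height along the helix axis
    chroma = octaves % 1.0                 # angle around the helix
    return chroma, octaves

# A 440 Hz and A 880 Hz: same chroma, height differs by one octave
c1, h1 = pitch_helix(440.0)
c2, h2 = pitch_helix(880.0)
```

On this description, a melody transposed by whole octaves preserves its chroma pattern (source-independent information) while shifting only in height (source-dependent information), which is the dissociation Experiment 3 exploits.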

Chapter 2. TECHNIQUES AND METHODS

Summary

Functional magnetic resonance imaging (fMRI) has proved to be a useful and versatile tool for studying regional haemodynamic changes in the working human brain; it is the core technique in each of the experiments described in this thesis. Accordingly, this chapter first outlines the basic principles of MRI, its assumptions and limitations. The physiological basis for the blood-oxygenation-level-dependent contrast and the coupling of the haemodynamic response function with neuronal metabolism are discussed. Sources of noise that affect signal measurement and image quality are discussed. The strategies adopted in order to meet the specific demands of auditory brain imaging, in particular the preparation and delivery of sounds in the fMRI environment, are described. The interpretation of functional imaging data rests upon extensive preprocessing and analysis. Conventionally, this includes the application of statistical criteria for assessing the significance and robustness of the haemodynamic changes detected in fMRI. The principles of statistical parametric mapping and the software programme (SPM99) used to implement the analyses are outlined. The chapter concludes with a discussion of design issues in fMRI experiments.

2.1 Imaging techniques

Compared with other functional brain imaging modalities such as PET, fMRI offers the advantages of non-invasiveness, large data sets with many repetitions of experimental conditions, flexible image acquisition characteristics that can be adapted to specific physiological questions, comparatively high spatial and temporal resolution, and the capacity to examine both phasic and state-related neural processes. However, the nature of the technique imposes limitations on the types of information that can be acquired and creates a number of potential sources of artefact that need to be understood in order to interpret fMRI data. This chapter therefore first reviews the principles of fMRI and technical considerations involved in image acquisition, with special reference to auditory fMRI. Technical parameters of the image acquisition sequences used in the experiments described in this thesis are presented in Table 2.1.

General principles of MRI

MRI exploits the phenomenon of nuclear magnetic resonance (NMR) (Cohen, 1996, 1999; Bandettini & Wong, 1998). Atomic nuclei containing an odd number of nucleons behave like magnetic dipoles with both a magnetic moment and spin: such nuclei will align in the presence of an external magnetic field and precess at a resonant frequency proportional to the external field strength. The frequency-specific transition between energy states (parallel and antiparallel to the external field) induced in a small fraction of nuclei by magnetisation can be measured as emitted energy when the nuclei return to equilibrium. In MRI, the atomic nucleus of interest is most commonly hydrogen in tissue water. There are several fundamental requirements in the production of NMR images. A powerful fixed external magnetic field is required in order to orient tissue nuclei parallel to the field; external field strength is typically 1.5 Tesla (T), where 1 Tesla = 10,000 Gauss (cf. the Earth's magnetic field = 0.5 Gauss).
An oscillating electromagnetic (radio frequency, RF) pulse of specified frequency range and duration is used to rotate some tissue nuclei out of the plane of the fixed magnetic field, and the amount of rotation is a function of the duration of the RF pulse. When the pulse ceases, the RF energy emitted as

Table 2.1  Characteristics of stimulus delivery, fmri acquisition and image analysis for experiments in this thesis

Stimulus delivery characteristics
                            Experiments 1, 2 and 3*           Experiment 4
Headphones                  Sennheiser HE60/HEV70             Modified Koss Stereo ESP950 Medical
Ear defenders               Modified Bilsom 2452              RS
Transducer principle        Electrostatic                     Electrostatic
RF isolation                RF penetration panel filter       Fibre optic USB via waveguide;
                            on headphone leads                RF filters on amplifier housings
External soundcard          Edirol UA3                        Edirol UA3
Intensity response          Linear Hz                         Linear Hz
Frequency response          Flat Hz                           Flat Hz
Software                    Pulsecount; Cogent2000            Cogent2000

fmri acquisition characteristics
MRI scanner                 Siemens Vision                    Siemens Sonata
Magnetic field strength     2 Tesla                           1.5 Tesla
Slice thickness             1.8 mm                            2 mm
Distance factor
Slice number
Slice orientation           Axial -9º                         Axial -9º
Time to echo (TE)           40 ms                             50 ms
Inter-scan interval         12 s                              12.5 s
Acquisition interval        4 s                               4.32 s
Structural MRI sequence     Optimised 3-D MP-RAGE             Optimised 3-D MP-RAGE

Image analysis parameters
Image analysis software              SPM99 under Matlab6 [The Mathworks, Inc.]
Realignment interpolation method     Sinc
Normalisation template image         In-house echoplanar template based on MNI stereotactic space
Normalisation interpolation method   Bilinear
Spatial smoothing kernel             Gaussian isotropic
Full-width-at-half-maximum (FWHM)    8 mm
Convolution                          Canonical haemodynamic response function
Regressors                           Movement parameters determined during realignment
Global intensity normalisation       Scaled
Filtering                            None

* Further details available at:
  db attenuation of maximal output using pure sine tones; maximum output without distortion db for frequency range 125 Hz - 2 kHz
  In-house software; details available at:
  Described by Deichmann et al., 2000

the precessing nuclei return to equilibrium is detected as an induced voltage in a receiver RF coil. The generation of NMR images depends on the creation of tissue-specific contrasts in the NMR signal. This is achieved by exploiting several tissue NMR properties. NMR signal is proportional to the density of nuclei (proton density) in each tissue (provided their magnetic moments are aligned). In addition, tissue magnetic characteristics determine how rapidly the NMR signal decays. Signal decay depends on both the time taken for the rotated nuclei to realign with the fixed external field (the longitudinal relaxation time, or T1), and the time to develop spin phase differences due to interactions among nuclei with different precessional frequencies (the transverse relaxation time, or T2). If the NMR signal is recorded a short time after the RF pulse, contrast based on tissue-specific T1 and T2 characteristics will be present. However, the effective transverse relaxation time, T2*, is shorter than T2 due to local inhomogeneities in the applied magnetic fields. The NMR signal is conventionally obtained using a spin echo technique (Hahn, 1950). A second echo-forming RF pulse is used to cancel the spin phase differences of the nuclei rotated by the initial excitation pulse, thereby reforming the transverse component of magnetisation with an amplitude that depends on T2. This second RF pulse effectively neutralises the effects of T2* dephasing due to extrinsic inhomogeneities in the fixed external magnetic field (thereby enhancing detection of the small inhomogeneities that reflect tissue magnetisation differences). The time at which the NMR signal echo occurs is the time-to-echo (TE). Although proton density, T1 and T2 characteristics all contribute to image contrast, contrast can be weighted to reflect a particular characteristic by manipulating pulse sequence parameters (in particular, RF pulse amplitude and duration, and TE).
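The interplay of proton density, T1 recovery and T2 decay can be sketched with the idealised spin-echo signal equation S ∝ ρ(1 − e^(−TR/T1)) e^(−TE/T2). In the Python sketch below, the tissue values are approximate illustrative figures for 1.5 T, not measurements from this thesis:

```python
import math

def spin_echo_signal(rho, t1, t2, tr, te):
    """Idealised spin-echo signal: proton density weighted by T1 recovery and T2 decay."""
    return rho * (1.0 - math.exp(-tr / t1)) * math.exp(-te / t2)

# Approximate tissue parameters at 1.5 T (illustrative values only; times in ms).
grey  = dict(rho=0.85, t1=950.0, t2=100.0)
white = dict(rho=0.70, t1=600.0, t2=80.0)

# Short TR / short TE -> T1-weighted contrast: white matter recovers faster,
# so it appears brighter than grey matter.
t1w_grey  = spin_echo_signal(tr=500, te=15, **grey)
t1w_white = spin_echo_signal(tr=500, te=15, **white)
assert t1w_white > t1w_grey

# Long TR / long TE -> T2-weighted contrast: grey matter's longer T2 makes it brighter.
t2w_grey  = spin_echo_signal(tr=4000, te=100, **grey)
t2w_white = spin_echo_signal(tr=4000, te=100, **white)
assert t2w_grey > t2w_white
```

The same model shows why weighting is a matter of degree: every image retains some dependence on all three tissue properties.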
The spatial and temporal resolution of MRI are limited by the biological properties of the tissue (including proton density and T1) and the characteristics of the scanner and the particular imaging sequence (including field strength and TE); in general, high spatial resolution must be traded against speed of image acquisition.

2.1.2 Constructing the MR image

Three-dimensional NMR images are created by systematically and repeatedly varying field strength in a gradient along each of the spatial dimensions, so that resonant frequency becomes a function of spatial position. The NMR signal is then a complex superposition of different frequencies; the contribution at each spatial position is resolved using Fourier analysis. The phase difference of the NMR signal (and thus the ability to resolve adjacent points in the image) depends on the gradient amplitude and duration. The difference in frequencies between adjacent points represents the bandwidth of the image. In order to construct a complete three-dimensional image, all combinations of spatial coordinates must be sampled along the different (x, y and z) axes. A planar image is composed on a two-dimensional grid in the Fourier spatial frequency domain or k-space, using two orthogonal gradients: a read-out gradient (Gx) that encodes the NMR signal as a function of position along the x-axis; and a phase-encode gradient (Gy) that advances the phase of the signal acquisition line by line along the y-axis using a train of incrementing RF pulses. The pathway through k-space by which data are acquired is the k-space trajectory. In conventional MRI, each phase-encoding pulse corresponds to a single excitation, and the time between successive phase-encoding pulses corresponds to the time-to-repeat (TR). Successive tissue planes are sampled using a slice-selection gradient (Gz) perpendicular to the plane of the slice and orthogonal to Gx and Gy. The application of the RF excitation pulse in the presence of the slice-selection gradient ensures that only protons within a narrow plane (of thickness determined by the bandwidth of the RF pulse) will be on-resonance and thus undergo rotation.
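The relationship between k-space and the image can be illustrated with a toy example (Python/NumPy, illustrative only; a random 64 × 64 "slice" stands in for acquired data):

```python
import numpy as np

# A toy 2-D "slice": spatial gradient encoding makes the acquired NMR signal a
# superposition of frequencies, i.e. samples of the image's 2-D Fourier
# transform (k-space).
rng = np.random.default_rng(0)
image = rng.random((64, 64))

# Acquisition fills k-space line by line (frequency encoding along x,
# phase-encode steps along y); here we jump straight to the full transform.
k_space = np.fft.fft2(image)

# Image reconstruction is the inverse 2-D Fourier transform.
reconstructed = np.fft.ifft2(k_space).real
assert np.allclose(reconstructed, image)
```

The round trip is exact for fully sampled k-space; real acquisitions differ because k-space is sampled incompletely and with noise.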
The spatial resolution of an MR image is characterised by the size of an individual image volume element (voxel), calculated as the volume imaged (field of view) divided by the number of image points sampled during image acquisition. Voxel size is defined by the number of samples in the read-out and phase-encode dimensions multiplied by the slice thickness; typically, voxel dimensions in fmri datasets are 3–4 mm in-plane with slice thickness 2–10 mm.
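Voxel volume follows directly from these definitions. In the sketch below (Python), the 192 mm field of view and 64-point matrix are hypothetical values; the 2 mm slice thickness matches Experiment 4 (Table 2.1):

```python
def voxel_volume_mm3(fov_mm, matrix, slice_thickness_mm):
    """In-plane voxel size = field of view / number of samples;
    voxel volume additionally includes the slice thickness."""
    in_plane = fov_mm / matrix
    return in_plane * in_plane * slice_thickness_mm

# A hypothetical 192 mm field of view on a 64 x 64 EPI matrix gives 3 mm
# in-plane resolution; with a 2 mm slice, the voxel volume is 18 mm^3.
print(voxel_volume_mm3(192.0, 64, 2.0))  # -> 18.0
```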

2.1.3 Echo-planar imaging

In order to capture physiological processes using fmri, rapid image acquisition is required, and the method of choice is ultrafast echo-planar imaging (EPI) (Mansfield, 1977). Here, a two-dimensional planar image is created using rapidly varying gradients in the transverse plane that multiply refocus the echo from a single excitation pulse (a single-shot technique). In functional applications of EPI, gradient echoes rather than RF pulses (as in conventional MRI) can be used to refocus the NMR signal. Because the RF echo is omitted, the NMR signal is more sensitive to local field inhomogeneities (T2*), including those produced by deoxyhaemoglobin (see below). Gradient EPI is thus well-suited to the detection of metabolic changes. Gradient echoes are generated using an oscillating gradient along the read-out direction (Logothetis, 2002). The EPI pulse sequence typically follows a zigzag trajectory in k-space that oscillates rapidly along the frequency-encode axis and advances continuously along the phase-encode axis in a series of brief gradient pulses. TE is defined as the time from the excitation pulse to the centre of k-space (Logothetis, 2002), and is generally approximately equal to T2*. Since the NMR signal is only available for a brief period (less than T2*), large gradient amplitudes (to generate sufficient contrast) and rapid gradient switching (to allow complete sampling) are required. EPI requires dedicated gradient hardware for phase encoding, and high-speed analogue-to-digital data conversion. Spatial resolution is limited by the maximum gradient amplitude and sampling speed that can be achieved, the reduction in signal-to-noise ratio due to reduced sampling time, and the decay in NMR signal during the read-out phase.
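The zigzag trajectory means that alternate k-space lines are acquired in opposite directions and must be re-reversed before Fourier reconstruction; mis-timing in this step produces the line-by-line phase discrepancies that underlie Nyquist ghosting. A schematic Python sketch (illustrative only, with a small numeric array standing in for k-space data):

```python
import numpy as np

def epi_readout(k_space):
    """Simulate single-shot EPI: k-space lines are read along a zigzag
    trajectory, so every other line arrives time-reversed."""
    acquired = k_space.copy()
    acquired[1::2, :] = acquired[1::2, ::-1]
    return acquired

def epi_row_flip(acquired):
    """Re-reverse alternate lines before Fourier reconstruction; errors in the
    timing of this correction leave residual line-by-line phase discrepancies
    that appear as a ghost displaced by half the field of view."""
    corrected = acquired.copy()
    corrected[1::2, :] = corrected[1::2, ::-1]
    return corrected

k = np.arange(16.0).reshape(4, 4)
assert np.array_equal(epi_row_flip(epi_readout(k)), k)
```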
The power requirements for generating large gradient amplitudes and the sensitivity of signal detection can be optimised by using local head coils, which both reduce the inductance of the gradient coils and conform to the size and shape of the structure of interest.

2.1.4 Physiological basis of BOLD contrast

Origin of the BOLD response. Functional brain imaging techniques measure accessible physiological processes such as cerebral blood flow and metabolic changes in order to make inferences about neuronal processes that cannot usually be observed directly in human subjects under physiological conditions. These inferences are necessarily provisional, since the coupling between neuronal activity and its metabolic

and haemodynamic correlates is complex and still incompletely understood. In addition, the magnitude of the changes associated with neuronal activation is small compared with resting neuronal metabolism, global cerebral blood flow and many physiological and technological noise sources. It was suggested in the late 19th century (Roy & Sherrington, 1890) that regional cerebral activity and local blood flow are functionally coupled, and this relationship was later quantified in animals using deoxyglucose autoradiography (Sokoloff, 1977) and in humans using oxygen-15 radiotracing (Raichle et al., 1976). Under normal aerobic conditions, cerebral oxygen utilisation parallels glucose consumption (Ames, 2000), which supports electrophysiological processes such as transmembrane ion gradients, neurotransmitter economy, and cellular homeostatic processes including membrane maintenance. Via a signalling cascade that has not been completely defined, increases in oxygen and glucose consumption are associated with local vasodilatation and changes in regional cerebral blood flow, volume and oxygenation (Buxton & Frank, 1997): each of these haemodynamic parameters can be measured using fmri. The most common technique exploits the BOLD contrast based on the magnetic susceptibility of haemoglobin. Oxyhaemoglobin contains diamagnetic oxygen-bound iron while deoxyhaemoglobin contains paramagnetic iron: the magnetic susceptibility of haemoglobin is therefore sensitive to oxygen saturation (Pauling & Coryell, 1936). The NMR signal of deoxyhaemoglobin decays more rapidly than that of oxyhaemoglobin, giving rise to magnetic susceptibility differences between the haemoglobin-containing vascular compartment and surrounding tissue (Ogawa et al., 1990). Neuronal activation is therefore associated with changes in T2* (BOLD) signal and corresponding regional intensity changes in T2*-weighted images (Ogawa et al., 1992).
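The link between deoxyhaemoglobin, T2* and measurable signal can be illustrated with a mono-exponential decay model, S = S0·exp(−TE·R2*), where R2* = 1/T2*. The Python sketch below is illustrative only: the activation-related reduction in R2* is a hypothetical value, while TE = 40 ms matches Experiments 1–3 (Table 2.1):

```python
import math

def bold_percent_change(te_ms, delta_r2star_per_s):
    """Percentage increase in T2*-weighted signal when activation lowers the
    apparent relaxation rate R2* (simple mono-exponential model)."""
    te_s = te_ms / 1000.0
    return (math.exp(te_s * delta_r2star_per_s) - 1.0) * 100.0

# With TE = 40 ms and a hypothetical activation-related R2* reduction of
# 0.5 per second, the signal increase is approximately 2%.
print(round(bold_percent_change(40.0, 0.5), 2))  # -> 2.02
```

Note that the model also shows why TE is chosen near T2*: a longer TE increases the fractional BOLD signal change, at the cost of overall signal.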
The relationship between deoxyhaemoglobin concentration and BOLD signal is nearly linear over the physiological range of blood oxygenation (Rees et al., 1997). Based on combined microelectrode and fmri experiments in the visual cortex of anaesthetised monkeys (Logothetis, 2002, 2003; Logothetis et al., 2001), the amplitude of the BOLD haemodynamic response is approximately linearly related to the underlying neural response and reflects mainly neuronal local field potentials rather than spike activity. These field potentials in turn result primarily (though not exclusively) from pre- and post-synaptic currents associated with incoming inputs and local processing within a

cortical area rather than output activity per se. These findings are consistent with the known high energy demands of neurotransmitter recycling and synaptic activity (Logothetis et al., 2001). Synaptic excitation appears to be more metabolically demanding than inhibition, and the BOLD response may therefore reflect predominantly excitatory effects (Hart et al., 2002). Strict linearity between neural response, metabolic indices, blood flow and BOLD signal changes cannot be assumed: for example, a nonlinear relationship between BOLD signal change and cerebral blood flow (as measured using PET) has been demonstrated in human auditory cortex (Rees et al., 1997). This has practical implications for comparisons of activation patterns between imaging modalities. In addition, it has been suggested that neural-vascular coupling is nonlinear at low levels of activation (Bandettini & Ungerleider, 2001).

The haemodynamic response function. The direction of the BOLD signal change in response to neuronal activation is the result of a complex interaction between cerebral blood flow, blood volume and oxygen utilisation changes, in which blood flow is the dominant factor. The time course of the haemodynamic response function (HRF) has several distinct phases (Logothetis, 2002). A small initial dip is inconsistently found and may reflect the initial increase in oxygen consumption that alters the ratio of deoxy- to oxyhaemoglobin (Malonek & Grinvald, 1996). An increase in regional blood flow follows: this increase exceeds requirements for oxygen utilisation, resulting in a decrease in both the oxygen extraction fraction and the proportion of deoxyhaemoglobin in veins and capillaries draining the activated brain region (Fox & Raichle, 1986). This decrease in deoxyhaemoglobin concentration is associated with an increase in spin coherence (T2*) and a correspondingly increased NMR signal intensity in voxels containing capillaries with increased blood flow.
This increased signal corresponds to the peak in the HRF. Increased blood flow causes vasodilatation and an increase in local venous blood volume which produces a post-stimulus undershoot in the HRF (Buxton et al., 1998). These haemodynamic signal changes are dependent on field strength, but the peak BOLD response is typically only 2–5% of baseline at 1.5 T in visual cortex and smaller (of the order of 1–1.5%) in auditory cortex (Talavage et al., 1999). Maximisation of the BOLD contrast-to-noise ratio is therefore an essential requirement of fmri based on this technique. Regional BOLD signal change is generally localised by comparing brain activity in two experimental conditions using a statistical procedure (see below), followed

by superposition on high-resolution structural images that enable anatomical structures to be identified. The BOLD signal change is displaced both in space and time from the neural activation of interest. Spatial displacement results from the geometry of the cerebral microvasculature (Lai et al., 1993), although such displacements are generally not significant under physiological conditions (Turner et al., 1998), and the maximum anatomical resolution of fmri (demonstrated in human visual cortex) is approximately 2 mm (Friston, 1997). A related issue concerns possible contributions to the NMR signal arising from changes in blood flow velocity. Such signal changes may propagate over long distances, and it is therefore essential that the imaging sequence used is insensitive to flow velocity (Turner et al., 1998). More fundamentally, the time course of the BOLD response is at least an order of magnitude slower than that of the underlying neural activation and behaves as a low-pass filter (Friston et al., 1994). Human primary sensory (including auditory) and motor cortices show a latency to peak BOLD signal of approximately 4–8 seconds from stimulus onset and a return to baseline approximately 5–20 seconds after stimulus cessation (Hall et al., 1999; Belin et al., 1999). The maximum temporal resolution for detection of transient neural events can, however, be of the order of 1 second if event-related experimental designs are used (Friston, 1997). Potential sources of error in modelling the HRF include non-linear interactions between stimulus presentations (an issue chiefly at short inter-stimulus intervals: Friston et al., 2000b), variability between subjects (Moelker & Pattynama, 2003) and variability between cortical areas, for which a priori information is usually lacking (Price et al., 1999; Belin et al., 1999).

2.1.5 Image quality and sources of artefact

Non-biological noise sources. A number of different classes of signal variance contribute to noise in fmr images.
These noise sources degrade both the measured BOLD signal and image quality (Cohen, 1996, 1999; Bandettini & Wong, 1998). Inherent in the technique itself are white thermal noise due to ion diffusion in tissues, electromagnetic noise in the receiver coil, preamplifiers and other electronic components, quantisation noise associated with analogue-to-digital conversion, and vibration produced by gradient switching. Several types of spatial image distortion particularly threaten the

quality of EPI images, owing to the low bandwidth of EPI in the phase-encode dimension (a consequence of continuous phase-encoding during single-shot image acquisition) and its relatively long read-out times. In general, these artefacts are more detrimental to individual functional data requiring precise anatomical registration than to spatially normalised group data (Hutton et al., 2002). Chemical shift artefact is a spatial distortion that arises from field inhomogeneities due to variations in magnetic susceptibility between different tissues (especially fat and water); it can be minimised using a fat saturation pulse to suppress the fat signal prior to the read-out phase (Cohen, 1999). Spatial distortion also arises from inhomogeneities of the applied field: this particularly affects image slices that sample a wide range of field variation (generally, transverse slices in the vicinity of the anterior skull base: Ojemann et al., 1997). These magnetic inhomogeneities may manifest as regional signal dropout, especially affecting the inferior frontal lobes and inferior lateral temporal lobes (Ojemann et al., 1997), and interacting with the effects of head movement. Regional inhomogeneities in the applied field can be reduced by software that optimises scanner shimming; however, tissue susceptibility differences may require field mapping and distortion correction during image post-processing (Hutton et al., 2002). Nyquist ghosting may be produced by induced eddy currents in the gradient coils that generate a magnetic field which persists after the primary gradient is switched off. Eddy current induction is minimised by gradient coil design (Cohen, 1999) and by calibration of the timing between signal digitisation and gradient activity to eliminate line-by-line phase discrepancies arising from the oscillating trajectory along the read-out axis.
Slow drifts in NMR signal over the course of an imaging session may arise from scanner instability (such as fluctuations in coil magnetic characteristics: Friston, 2004) or slow temperature changes (Turner et al., 1998). The effect of such drifts can be reduced by normalising global image intensities or by detrending procedures during post-processing.

Biological noise sources. There are a number of additional biological sources of low-frequency noise in fmri time series. These include the cardiac and respiratory cycles, slow endogenous fluctuations in blood oxygenation, low-frequency variations in cerebrovascular tone (Mayer waves) and subject movement (Turner et al., 1998). These noise sources generally scale with field strength, and are not affected by reducing voxel volume. The amplitude spectrum of such noise typically has a 1/f characteristic, with

both low-frequency and wide-band components (Turner et al., 1998; Friston et al., 2000a). Periodic noise sources appear as low-frequency artefactual modulation of the signal; even at long TR durations (over 10 seconds), such noise may be aliased across the measurement frequency spectrum (Turner et al., 1998; Friston et al., 2000a). The impact of these noise sources is influenced by regional anatomical variations (for example, basilar pulsation in the brainstem). Biological noise sources are generally addressed during image analysis: low-frequency physiological noise can be removed by applying a high-pass filter (Friston et al., 2000a), and the effects of complex subject movement are minimised using motion-correction algorithms (see below).

2.1.6 Specific problems in auditory functional imaging

Acoustic scanner noise. Perhaps the most important practical problem confronting fmri studies of human auditory function is the acoustic noise produced by EPI sequences. Peak sound pressure levels over 100 dB are typical at 1.5 T, most of this noise arising from the read-out gradients and associated resonance and reverberation in surrounding structures (Ravicz et al., 2000; Moelker & Pattynama, 2003). Gradient noise has a complex, broadband spectrum: most energy is concentrated between 250 Hz and 4 kHz (Chambers et al., 2001), and spectral peaks occur at the gradient switching frequency and its harmonics in the range 0–3 kHz at 1.5 T (Hall et al., 1999; Ravicz et al., 2000). Additional sources of continuous low-level, low-frequency acoustic noise include the magnet cooling pump and room air-handling systems. At frequencies below 500 Hz, the ear canal is the major route of conduction of environmental noise, whereas at higher frequencies, direct transmission through bone becomes the major route when ear protection is worn (Ravicz & Melcher, 2001). Conventional ear defenders provide approximately dB passive attenuation of environmental noise.
Active noise cancellation systems may achieve an additional benefit (Chambers et al., 2001); however, this benefit remains limited by bone conduction of noise. Scanner noise elicits a BOLD response of variable magnitude ( % in different studies: Moelker & Pattynama, 2003) in PAC and (to a lesser extent) in non-primary auditory cortex. The BOLD response to scanner noise in PAC has a similar time course to that elicited by the auditory stimuli used in activation studies (Bandettini et al., 1998; Talavage et al., 1999), and the magnitude of the response increases non-linearly with the duration of the acquisition sequence (Talavage et al., 1999).

Studies in which TR is manipulated to vary the period of scanner silence (Hall et al., 1999; Shah et al., 2000) provide direct evidence that acoustic noise does indeed affect the quality and interpretation of auditory fmri data. There are several mechanisms by which such effects could occur (Moelker & Pattynama, 2003). The presence of background noise will increase auditory activation during the baseline condition (and precludes the use of a silent baseline), making additional auditory activation in response to the experimental manipulation of interest more difficult to detect (Bandettini et al., 1998; Talavage et al., 1999). The relative intensities of the activation and baseline conditions cannot simply be scaled, since the BOLD response varies with the absolute intensity of the sound stimulus (Jäncke et al., 1998), partial saturation of the haemodynamic response may occur (Bandettini et al., 1998; Talavage et al., 1999), and intermittent masking of the activation stimulus will alter that stimulus unpredictably (Edmister et al., 1999; Belin et al., 1999). In addition, auditory processing itself may be altered by the presence of background noise. Noise may induce inhibition or habituation of the response to the stimulus, and the relative magnitude of this effect may in turn vary between cortical regions (Di Salle et al., 2001), leading to an altered spatial distribution of activation (Edmister et al., 1999). Auditory activity may be modulated by the increased attentional demands of the listening task (Moelker & Pattynama, 2003), and the nature of that task is fundamentally altered to a form of auditory foreground-background decomposition (Scheich et al., 1998). Furthermore, there may be an interaction between task and background noise that is not modelled in the experimental design (Hall et al., 1999).

Silent and sparse imaging protocols. Several different strategies have been used in order to reduce interference from acoustic scanner noise (Moelker & Pattynama, 2003).
These include the development of quiet acquisition sequences (Belin et al., 1999; Sander et al., 2003) and improved passive and active noise attenuation techniques (Ravicz & Melcher, 2001; Baumgart et al., 1998; Chambers et al., 2001). Alternatively, the interaction between scanner noise and the auditory stimulus can be minimised. This can be achieved by reducing image acquisition time (Belin et al., 1999; Moelker & Pattynama, 2003); however, in practice this may require a reduced number of slices. In most conventional ('distributed') EPI sequences, brain slices within a volume are acquired at equal intervals throughout the TR period. If all brain slices in the volume are instead acquired toward the end of the TR ('clustered' acquisition sequences: Edmister et

al., 1999; Talavage et al., 1999; Shah et al., 2000), a period of silence exists during which the auditory stimulus will be relatively uncontaminated by scanner noise. However, such techniques have a common limitation: if the TR is short (less than approximately 10 seconds) there may still be significant contamination due to the decay phase of the haemodynamic response to scanner noise from the previous acquisition. Methods to compensate for this contamination by subtraction of the BOLD response to scanner noise have not been adequately validated (Moelker & Pattynama, 2003). An alternative approach employs a long TR (inter-scan) interval to achieve maximal separation between the haemodynamic response to the auditory stimulus and the response to acoustic scanner noise from the previous acquisition (Figure 2.1: overleaf). Such sparse temporal sampling techniques (Hall et al., 1999; Belin et al., 1999; Eden et al., 1999; Yang et al., 2000; Di Salle et al., 2001) employ TR intervals longer than the estimated rise time (to within 10% of maximum) and decay time (to within 10% of minimum) of the HRF for human auditory cortex. Sound stimuli are presented during the period between volume acquisitions and all images are acquired at the end of the TR, so that the stimulus and image acquisition cycles are synchronised. Increasing stimulus duration (unlike intensity) does not significantly affect the magnitude of the BOLD response (Jäncke et al., 1999); however, the shape of the HRF may be stimulus-dependent (Hall et al., 1999). Typical TR values have been between seconds; these values attempt to account for the range of individual variation in time to peak BOLD response (Hall et al., 1999). A plateau (or slight decline) in the HRF occurs with longer stimulus epochs (Friston, 1997). Practical limitations on the use of very long TRs include the overall imaging time, subject movement, fatigue and loss of attention.
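The timing argument behind sparse sampling can be checked numerically. The Python sketch below (illustrative only; the thesis analyses used the canonical HRF in SPM99 under Matlab) uses a double-gamma approximation to the HRF with commonly used default parameters, which are assumptions rather than values fitted to auditory cortex, and confirms that at the 12 s inter-scan interval of Experiments 1–3 (Table 2.1) the response to the previous acquisition has decayed close to baseline:

```python
import numpy as np
from math import gamma as gamma_fn

def hrf(t, peak=6.0, under=16.0, ratio=6.0):
    """Double-gamma approximation to the canonical HRF (default-style
    parameters: positive lobe shape 6, undershoot shape 16, ratio 1:6)."""
    t = np.asarray(t, dtype=float)
    pos = t ** (peak - 1) * np.exp(-t) / gamma_fn(peak)
    neg = t ** (under - 1) * np.exp(-t) / gamma_fn(under)
    return pos - neg / ratio

t = np.arange(0.0, 30.0, 0.1)   # seconds after stimulus onset
h = hrf(t)

# The response peaks a few seconds after onset...
t_peak = t[np.argmax(h)]
assert 4.0 <= t_peak <= 8.0

# ...and by the 12 s inter-scan interval it has decayed close to baseline, so a
# stimulus presented in the silent gap is sampled near its own response peak.
idx_12s = int(round(12.0 / 0.1))
assert abs(h[idx_12s]) < 0.2 * h.max()
```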
Sparse temporal sampling optimises signal-to-noise ratio by maximising the difference between the haemodynamic response to the stimulus and to acquisition noise, and by maximising T2* signal due to greater recovery of magnetisation between excitation pulses (Hall et al., 1999). Compared with continuous acquisition, sparse sampling achieves similar activation maps and generally improved mean signal change using fewer images in the same timeframe (Hall et al., 1999). Although fine temporal resolution is generally lost in sparse sampling protocols (since only a small portion of the HRF near its peak is sampled), the technique can be extended to event-related applications by varying the delay between stimulus presentation and image acquisition (Buckner et al., 1996; Belin et al., 1999; Moelker & Pattynama, 2003). However, sparse protocols generally employ a fixed


Figure 2.1. Principles of sparse fmri data acquisition [modified from Hall et al., 1999]

The diagram illustrates the rationale for the use of sparse image acquisition protocols in auditory fmri. Here, a long inter-scan ('TR') interval (typically seconds) is used to achieve maximal separation between the haemodynamic response to the auditory stimulus (red) and the response to acoustic scanner noise from the previous acquisition (blue). Such inter-scan intervals are longer than the estimated rise time (to within 10% of maximum) and decay time (to within 10% of minimum) of the haemodynamic response function for human auditory cortex. Sound stimuli are presented during the period between volume acquisitions and all images are acquired at the end of the interval; the stimulus and image acquisition cycles are synchronised.

temporal relationship between image acquisition and stimulus presentation. In such protocols, phasic components of the haemodynamic response will not be detected (Price et al., 1999): the importance of this limitation in auditory fmri remains uncertain.

Brainstem motion. An additional specific issue arises in fmri of the ascending auditory pathways. Brainstem motion due to basilar artery pulsation poses a major technical challenge that has been overcome using cardiac triggering to ensure that images are always acquired at the same point in the cardiac cycle (Guimaraes et al., 1998; Griffiths et al., 2001). Using such triggering, activation can be demonstrated in the cochlear nucleus, lateral lemniscus, inferior colliculus and medial geniculate body (Griffiths et al., 2001).

2.2 Stimulus delivery

The delivery of high-fidelity auditory stimuli in the fmri environment presents a number of challenges. The stimulus delivery apparatus cannot contain ferromagnetic materials, both for reasons of safety and in order to preserve field homogeneity. Pneumatic sound delivery tubes can be used but may distort the amplitude, frequency and phase characteristics of the stimuli (Hall et al., 1999; Chambers et al., 2001): this issue is particularly relevant in auditory spatial paradigms (see below). Custom-built systems in which piezo-electric transducers or electrostatic drivers from commercial headphones are placed into commercial ear defenders can achieve flat, stable frequency responses to the region of 20 kHz and linear intensity responses without distortion over the usual range of (non-aversive) stimulus intensities less than 100 dB (Palmer et al., 1998; Baumgart et al., 1998). The process of incorporating the headphones into ear defenders should not compromise the headphone frequency response or the attenuation characteristics of the ear defenders.
Local magnetic field distortions caused by the headphones should be minimal (Baumgart et al., 1998), and RF isolation minimises any interaction between the sound delivery apparatus and the scanner RF coils. A further issue of particular relevance to sparse imaging protocols is the requirement for precise synchronisation between stimulus presentation and image acquisition. Typically this requires a software application that counts digitised RF pulses from the MR scanner and uses these to trigger stimuli

according to specified timing criteria. Absolute sound pressure levels between 70 and 100 dB are optimal for stimulus delivery (Hart et al., 2002). The study of auditory spatial processing using fmri makes particular demands on stimulus design and delivery. The MR environment imposes obvious limitations on the use of free-field speaker arrays such as those used in PET studies of auditory spatial processing (Zatorre et al., 2002b). Auditory motion can be simulated using a variety of binaural cues, including interaural phase delays (Griffiths et al., 2000) and binaural beat stimuli (Griffiths et al., 1994); however, such motion is perceived within the head rather than localised in external space. Externalised sound sources can be simulated in fmri experiments using a virtual acoustic space technique based on the use of head-related transfer functions (HRTFs) that replicate the spatially varying filtering properties of the external ears (Wightman & Kistler, 1989, 1998). Optimal spatial acuity is obtained using individual HRTFs (Hofman et al., 1998), where stimuli are recorded using microphones placed in the external ear canal and played back to the subject during scanning. Alternatively, generic HRTFs can be used (Wightman & Kistler, 1989): these have lower spatial acuity than individual HRTFs, but still reliably produce a percept of an externalised virtual sound source. Technical characteristics of the stimulus delivery system used in these experiments are presented in Table 2.1. The sound delivery system is shown schematically in Figure 2.2 (overleaf). All stimuli were synthesised digitally in Matlab6; examples of scripts used to create the stimuli are presented in Appendix I.
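The simplest binaural cue mentioned above, an interaural time (phase) delay, can be sketched as follows. This is an illustrative Python example only (the thesis stimuli themselves were synthesised in Matlab, as listed in Appendix I), and all parameter values are hypothetical:

```python
import numpy as np

def itd_stereo_tone(freq_hz=500.0, dur_s=0.5, itd_s=0.0005, fs=44100):
    """Generate a binaural sine tone with an interaural time difference (ITD):
    the right channel is delayed relative to the left, which listeners perceive
    as a lateralised (intracranial) source rather than an externalised one."""
    t = np.arange(int(dur_s * fs)) / fs
    left = np.sin(2 * np.pi * freq_hz * t)
    right = np.sin(2 * np.pi * freq_hz * (t - itd_s))
    # The delay corresponds to an interaural phase difference of
    # 2*pi*f*ITD radians (here 0.5 ms at 500 Hz -> pi/2).
    return np.stack([left, right], axis=1)

stereo = itd_stereo_tone()
assert stereo.shape == (22050, 2)
```

Externalised virtual sources require the further step of convolving each channel with the appropriate HRTF impulse responses, which is not shown here.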
During scanning, stimuli were delivered using in-house software (Cogent2000 and Pulsecount). The design and delivery of stimuli in acoustic virtual space paradigms are described further in Chapters 3 and 4.

2.3 Image pre-processing

Extensive pre-processing of fMRI images is necessary before the distribution and magnitude of any regional brain activation can be assessed. The various pre-processing stages are designed to facilitate the statistical comparison of different brain images by locating those images within a common spatial reference frame. In general, the sources of


Figure 2.2. Diagram of stimulus delivery system. This diagram illustrates the key components of the stimulus delivery system used for the experiments described in this thesis. The console formed part of the customised sound delivery system used in Experiments 1, 2 and 3. Technical characteristics of the system components are presented in Table 2.1.

signal variance between fMRI images can be partitioned into intensity differences that exist when the images are spatially congruent (due either to true physiological activation or to noise sources), and unwanted differences due to spatial non-congruence (Friston et al., 1995a). Within a time-series, spatial non-congruence is attributable to the effects of motion, whereas intrinsic shape variations also contribute when comparing individuals or imaging modalities. These sources of spatial non-congruence are addressed in the pre-processing steps of image realignment and spatial normalisation. Pre-processing in all the experiments described in this thesis was conducted using the spatial pre-processing suite of the statistical parametric mapping software SPM99, implemented in Matlab6. The pre-processing parameters used are presented in Table 2.1. The steps involved in the pre-processing and analysis of functional imaging data are summarised in Figure 2.3 (overleaf).

Realignment

All fMRI time series are affected to a greater or lesser extent by subject movement. Motion-related voxel signal changes may be produced by a change in position of the image per se, by interactions between intrinsic signal and static field inhomogeneities (Jiang et al., 1995), or (if TR is short) by variations in magnetisation recovery due to movement between slices in a volume (the 'spin history' effect: Grootoonk et al., 2000). Motion correction is particularly important in EPI, since movement greater than a small fraction of a voxel at an object boundary can cause signal changes that are greater than the activation response. If movement is not correlated with the experimental task, the detection of true brain activation will generally be impaired. Conversely, if movement is correlated with the task, signal changes due to motion may be misattributed to activation.
These movement-related effects are minimised using procedures that realign sequential images within a time series to a common reference image (usually the first image in the time series). Conventional image realignment methods are based on raw voxel intensity values, which are driven chiefly by anatomical information (since signal changes due to functional activation typically represent only a small fraction of the raw NMR signal). A rigid-body affine spatial transformation (a combination of three translations and three rotations) that


Figure 2.3. Steps in pre-processing and analysis of fMRI data. A, Raw brain images are first realigned to correct for subject movement and session effects, using an algorithm that minimises variance between images. B, Realigned images are normalised to a brain template to transform them into a common stereotactic space and to correct for individual anatomical differences. C, Normalised images are smoothed using a Gaussian filter of specified full-width at half maximum. This step improves signal-to-noise ratio by increasing overlap between adjacent voxels, with a corresponding reduction in spatial resolution. D, Data from smoothed images are analysed using a specified model: this includes convolution with a haemodynamic response function to account for the time course of cerebral blood flow in relation to neuronal activity. E, A design matrix is generated based on the general linear model, with rows corresponding to scan number and columns to trials (effects or covariates of interest), plus additional columns corresponding to effects or covariates of no interest (e.g., global cerebral blood flow for each subject). A software package (such as SPM99) is used to estimate statistics on the design matrix. The parameter estimates in the column vectors are adjusted mean least-squares estimates of the effects of interest (discounting effects of no interest); a contrast between experimental conditions is defined by a vector that represents a weighted sum of parameter estimates. Based on the null hypothesis that the effect of interest does not account for more signal variance than could be explained by chance (according to the assumptions of Gaussian random field theory), a t statistic can be derived at each voxel as the ratio of the contrast-weighted parameter estimates to the estimated standard error term for that voxel. The t statistics across brain voxels together constitute a statistical parametric map of brain activation for that contrast. Activations are thresholded at a specified significance level, typically p < 0.05 corrected for the effects of multiple comparisons across the brain volume or for the false discovery rate (Genovese et al., 2002; see text). F, A statistical parametric map (SPM) of the statistic can be plotted as glass brain projections in axial, coronal and sagittal planes or rendered onto a structural template (a canonical brain, group mean MRI, or the subject's own structural MRI) to indicate relationships of activation to brain anatomy.

will map each image onto the reference image is first computed by minimising a function of the difference between the original and the target image. The transformation problem can be shown to have a unique least-squares solution if the problem is linearised using the general constraints (which are reasonable for motion less than a few millimetres) that the transformation function is smooth and does not depend on position (Friston et al., 1995a). This least-squares solution generates a six-parameter estimate of the motion associated with each image (Grootoonk et al., 2000). The image must then be resampled ('resliced') to the new grid coordinates determined by the transformation. This requires an interpolation procedure to generate target image intensity values at locations that do not correspond to data points in the original image. Various realignment protocols have been proposed on theoretical grounds (Woods et al., 1992; Hajnal et al., 1995; Friston et al., 1995a), based on different basis functions that can be used to fit the interpolation. Sinc interpolation offers theoretical advantages because it uses every voxel in the image to determine the new value at an individual voxel and preserves frequency structure across the bandwidth of the original image (Grootoonk et al., 2000). The truncated sinc function, which is windowed over a limited number of voxels (a 9 × 9 × 9 voxel Hanning-windowed sinc function is used in SPM99), is a useful compromise between accuracy and processing time (Hajnal et al., 1995; Grootoonk et al., 2000). Residual artefacts of the interpolation procedure can be minimised by incorporating the realignment parameters as covariates in the design matrix which models the time-series at each voxel (Grootoonk et al., 2000).
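The six-parameter rigid-body transformation can be made concrete with a short sketch. This is illustrative Python, not SPM99 code; in particular, the order in which the three rotations are composed is a convention and may differ from SPM99's internal one.

```python
import numpy as np

def rigid_body_matrix(tx, ty, tz, rx, ry, rz):
    """4x4 homogeneous transform from the six realignment parameters:
    three translations (mm) and three rotations (radians) about x, y, z."""
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    M = np.eye(4)
    M[:3, :3] = Rz @ Ry @ Rx   # composition order is a convention
    M[:3, 3] = [tx, ty, tz]
    return M

# A pure 2 mm translation along x maps the origin to (2, 0, 0).
M = rigid_body_matrix(2.0, 0.0, 0.0, 0.0, 0.0, 0.0)
point = M @ np.array([0.0, 0.0, 0.0, 1.0])
```

Because the rotation block is orthonormal with determinant 1, the transformation preserves distances and volumes, which is what distinguishes realignment (rigid-body) from the affine and non-linear warps used in spatial normalisation.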
This has particular relevance for the removal of task-correlated motion effects.

Spatial normalisation

Due to anatomical variation in the shape of individual brains, a normalisation procedure using non-linear warping is necessary to bring functional data into a common anatomical space. This allows brain activations to be coregistered with a high-resolution structural brain image, enabling functional information to be anatomically localised in an individual brain. It also allows individual data to be grouped for statistical analysis, and assists the anatomical interpretation of functional differences between individuals. A number of different neuroanatomical models have been proposed. These models generate alternative canonical brains based on registered sections of representative individual brains (Talairach & Tournoux, 1988; Toga et al., 1994) or averaged imaging data derived from

many individuals (Evans et al., 1993; Roland & Zilles, 1994; Mazziotta et al., 1995). Typically the common anatomical brain space is described using Cartesian coordinates; the alternative coordinate systems are related by affine transformations. All coordinate-based neuroanatomical models can be criticised on the grounds that they do not relate functional activation to local structural features. However, alternative representations based on macroscopic landmarks or flattened and unfolded cortical surfaces also pose theoretical and practical problems (Fox, 1995).

Various non-linear local deformation algorithms can be used to normalise individual brains to a canonical template. Compared with landmark-based (Bookstein, 1995) or contour-based (Joshi et al., 1995) algorithms, approaches using volumetric intensity values (Friston et al., 1995a) offer advantages of generality and maximal exploitation of the large amount of shape information in each volume while remaining computationally efficient. Such algorithms correct for global shape differences between brains, rather than precisely matching cortical features (Ashburner & Friston, 2000). The linearising constraints described above for image realignment can be extended to transformations that incorporate local spatial deformations in three dimensions (Friston et al., 1995a): operationally, the spatial deformations are constrained to consist of a linear combination of smooth basis functions. The estimated deformation parameters are the coefficients of the basis functions that minimise the residual least-squares difference between the image and the reference template. A Bayesian framework can be used to guide estimation of the spatial transformation, by incorporating prior knowledge of the normal variability in brain size and shape (Ashburner & Friston, 2000).
These considerations form the basis for the normalisation method (SPM99) used in the present experiments, in which each image in a time-series was matched to the reference template using a two-stage procedure: an initial 12-parameter affine transformation (translations, rotations, zooms and shears), followed by an iterative non-linear estimation of spatial deformation parameters.

Spatial smoothing

In the final stage of pre-processing, normalised brain images are conventionally smoothed by convolution with an isotropic Gaussian kernel of full-width at half-maximum (FWHM) typically 2–3 times the voxel size. This procedure has several

objectives (Friston, 2004). Signal-to-noise ratio is improved, since most noise is independent for each voxel and therefore has a higher spatial frequency than the haemodynamic changes of interest. Convolution with a Gaussian kernel of FWHM larger than voxel size and approximately the same as the spatial scale of cortical activation (Worsley & Friston, 1995) improves the fit between imaging data and the assumptions of Gaussian field theory used in the subsequent statistical assessment of regional activations (see below). Residual errors are rendered more normal in their distribution, ensuring the validity of parametric statistical tests. Inter-subject averaging is also facilitated, since individual functional variations at spatial scales smaller than the smoothing kernel FWHM are removed.

2.4 Statistical analysis

The experiments described in this thesis were all analysed using statistical parametric mapping implemented in SPM99 software. As a method for assessing regional brain activation, statistical parametric mapping offers the advantages of conceptual simplicity and generality. The signal values at every voxel in a brain image are assumed to have an approximately Gaussian distribution under the null hypothesis of no regionally specific effects. This hypothesis is tested at each voxel using a univariate statistical parametric test, and these statistics are used to construct a statistical parametric map (SPM) representing the regionally specific probability of the effect of interest. Effects of interest are estimated using the general linear model, and the statistical map is modelled as a spatially extended process using Gaussian random field theory.
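The spatial smoothing step described above can be sketched as follows (illustrative Python with SciPy, not the SPM99 implementation). The quoted FWHM relates to the Gaussian standard deviation as FWHM = σ√(8 ln 2) ≈ 2.355σ, so an 8 mm kernel on 3 mm voxels corresponds to σ ≈ 1.13 voxels.

```python
import numpy as np
from scipy import ndimage

FWHM_TO_SIGMA = 1.0 / np.sqrt(8.0 * np.log(2.0))  # sigma = FWHM / 2.3548

def smooth_volume(vol, fwhm_mm, voxel_mm):
    """Convolve a volume with an isotropic Gaussian kernel of the given
    FWHM in mm, converting first to kernel width in voxels."""
    sigma_vox = fwhm_mm * FWHM_TO_SIGMA / voxel_mm
    return ndimage.gaussian_filter(vol, sigma=sigma_vox)

# An impulse smoothed with an 8 mm FWHM kernel on a 3 mm grid spreads
# into a Gaussian blob whose values still sum to the original intensity.
vol = np.zeros((21, 21, 21))
vol[10, 10, 10] = 1.0
sm = smooth_volume(vol, fwhm_mm=8.0, voxel_mm=3.0)
```

The conservation of the total (the kernel integrates to one) is why smoothing trades peak amplitude for spatial extent: a focal activation is attenuated but spread over neighbouring voxels, which is the mechanism behind both the signal-to-noise gain and the loss of spatial resolution noted in the Figure 2.3 caption.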
The theoretical foundations and assumptions of statistical parametric mapping and its implications for the design of functional imaging experiments are discussed in this section.

The general linear model

The statistical tests (for example, Student's t test) used to analyse functional imaging data are specific instances of a very general method for comparing variance due to an experimental effect with error variance (Friston et al., 1995b). This method, the general linear model, partitions voxel signal intensity values into effects of interest, effects of no interest (or confounds) and error terms.

The equation for the model describing the time-series for each voxel can be expressed in matrix form (Friston et al., 1995b; Friston, 2004) as

SX = SGβ + SHγ + Se

where X is the data matrix of signal intensity values, G is a matrix representing the experimental conditions as a linear combination of regressors (or covariates), β is a matrix of parameter estimates for the effects of interest, H is a matrix representing confounds or covariates of no interest (such as movement parameters, global activity, or subject identity in group studies), γ is a matrix of parameter estimates for the effects of no interest, e is a matrix of error terms, and S is a convolution matrix that models the effects of intrinsic serial correlations (the HRF) and extrinsic filtering to remove low frequency noise. G, H and S are specified in the design matrix representing the experimental paradigm. Effects of interest are modelled as functions (most simply, fixed boxcars) of the presence of each condition. The parameter estimates in the column vectors of β are adjusted mean least-squares estimates of the effects of interest (discounting effects of no interest). A contrast between experimental conditions is defined by a vector that represents a weighted sum of parameter estimates. A t statistic can be derived as the ratio of the contrast-weighted parameter estimates to the estimated standard error term for that voxel.

In general, it is necessary to take account of serial correlations in fMRI time-series due to the latency of the HRF and the presence of low frequency noise sources (Friston et al., 2000a), in order to avoid over-estimating the number of degrees of freedom and biasing estimates of error variance (Petersson et al., 1999). The canonical HRF is modelled as a combination of two gamma functions (Friston et al., 2000a,b). In SPM99, this function is based on empirical parameters derived from functional imaging studies of human visual and auditory cortex.
More general formulations of the HRF that accommodate nonlinearities are based on Volterra series with combinations of temporal basis functions (Friston et al., 2000b); however, such nonlinearities are chiefly relevant at short interstimulus intervals. Convolution with the HRF is equivalent to smoothing the time-series by applying a low-pass filter. Removal of high and low frequency noise by band-pass filtering (Friston et al., 2000a) optimises both the validity and sensitivity of signal detection in the frequency range imposed by the HRF.
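The path from design matrix to t statistic can be sketched as follows. This is a deliberately simplified illustration in Python, not SPM99's implementation: a single boxcar regressor plus an intercept, with HRF convolution, filtering and serial-correlation modelling all omitted.

```python
import numpy as np

def glm_contrast_t(Y, X, c):
    """Ordinary least-squares fit of the general linear model Y = X @ beta + e
    at one voxel, returning the t statistic for contrast vector c:
    t = (c @ beta) / sqrt(sigma^2 * c @ (X'X)^-1 @ c)."""
    beta, _, _, _ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ beta
    dof = X.shape[0] - np.linalg.matrix_rank(X)
    sigma2 = resid @ resid / dof                    # error variance estimate
    var_c = sigma2 * c @ np.linalg.pinv(X.T @ X) @ c
    return (c @ beta) / np.sqrt(var_c)

# Two-column design: a 10-scans-on / 10-scans-off boxcar and an intercept.
rng = np.random.default_rng(1)
n = 100
box = np.tile(np.r_[np.ones(10), np.zeros(10)], 5)
X = np.column_stack([box, np.ones(n)])
Y = 2.0 * box + rng.standard_normal(n)              # simulated voxel, effect size 2
t = glm_contrast_t(Y, X, np.array([1.0, 0.0]))
```

The contrast vector [1, 0] tests the boxcar effect while ignoring the intercept; in a multi-condition design the same mechanism expresses differences between conditions as weighted sums of the parameter estimates.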

In sparse acquisition protocols, the long TR effectively removes high-frequency correlations so that low-pass filtering is not relevant; however, low frequency drifts and aliased biological noise sources may still create low frequency correlations. These correlations will tend to reduce sensitivity by inflating estimates of residual error variance (Friston et al., 2000a). However, caution is needed in applying a high-pass filter in this situation, since the signal of interest may have appreciable power in the same frequency range as low frequency noise sources (below 1/64 Hz). The effect of low frequency drifts may be reduced by proportionate scaling of signal changes by the mean global signal at each image in the time-series (Eden et al., 1999; Petersson et al., 1999). Randomisation of the sequence of conditions (so that low frequency effects should influence each condition equally) will tend to protect the validity of estimates of effect size in sparse protocols (Belin et al., 1999). Serial correlations in fMRI time-series do not affect statistical inference at the second level (random-effects models; see below), since autocorrelation effects are subsumed within the individual subject variance (Friston et al., 2000a).

Gaussian random field theory

The assessment of the significance of signal variance at each voxel in a brain image requires an appropriate correction for multiple comparisons in order to constrain the number of false positive activations observed. In general, individual voxels do not represent truly independent observations, due to spatial correlations among contiguous voxels imposed by both neural architecture and the geometry of the haemodynamic response. Accordingly, a conventional Bonferroni correction (in which the desired false positive rate is simply divided by the number of independent observations) would be inappropriately conservative.
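The behaviour of the Bonferroni correction, and of the false discovery rate procedure of Genovese et al. (2002) discussed later in this section, can be sketched with the Benjamini–Hochberg step-up rule. This is an illustrative Python sketch over simulated p values, not the SPM99 implementation.

```python
import numpy as np

def bonferroni_mask(p, alpha=0.05):
    """Familywise control: reject only p values below alpha / (number of tests)."""
    return p < alpha / p.size

def fdr_mask(p, q=0.05):
    """Benjamini-Hochberg step-up rule: controls the expected proportion of
    false positives among the tests declared significant."""
    order = np.argsort(p)
    ranked = p[order]
    thresh = q * np.arange(1, p.size + 1) / p.size
    below = np.nonzero(ranked <= thresh)[0]
    mask = np.zeros(p.size, dtype=bool)
    if below.size:
        mask[order[: below[-1] + 1]] = True   # reject all up to the largest passing rank
    return mask

# 990 null "voxels" (uniform p) plus 10 strongly activated ones.
rng = np.random.default_rng(2)
p = np.concatenate([rng.uniform(size=990), np.full(10, 1e-6)])
n_bonf = bonferroni_mask(p).sum()
n_fdr = fdr_mask(p).sum()
```

Because the step-up threshold rises with rank, the FDR procedure rejects at least as many tests as Bonferroni at the same nominal level, which is the sense in which it "adapts to the strength of the signal".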
Gaussian random field theory provides a method for assessing the significance of spatially extended stochastic processes such as SPMs based on cerebral blood flow changes (Friston et al., 1995b; Worsley et al., 1996; Friston, 2004). The false positive rate for such a process is related to the number of activated regions (where each region is a volume composed of many voxels) exceeding some threshold, α. Applying this threshold, truly non-activated regions will be erroneously declared active with probability at most α (Genovese et al., 2002). At high thresholds, the number of activated regions is given by a topological measure, the Euler characteristic, which is related to the smoothness (or effective resolution) of the SPM; the p value of

local maxima of the Gaussian field approximates the expected Euler characteristic (Worsley et al., 1996; Friston, 2004). The significance of an activated region can be assessed in terms of its voxel-level maximum value or its spatial extent (Friston et al., 1995b), both of which have an associated probability of occurring by chance (the false positive threshold). The voxel-level peak height has been more widely used in functional brain imaging applications, reflecting both its relative statistical transparency and its greater localising value. It is expressed as a Z score after an appropriate transformation of the SPM t field (it can be shown that this transformation is valid at high degrees of freedom: Friston et al., 1995b). Where a prior anatomical hypothesis exists, the multiple comparisons correction can be restricted to a discrete volume specified by that hypothesis: small volume correction.

Concern that the imposition of an arbitrary fixed threshold based on Gaussian random field theory may be inappropriately conservative when applied to functional imaging data has led to the development of alternative statistical procedures for controlling false positives. One such is the false discovery rate correction (Genovese et al., 2002), which controls the proportion of false positives among those tests for which the null hypothesis is rejected (in contrast to alternative procedures that control the rate of false positives among all tests, whether or not the null hypothesis is rejected). The false discovery rate correction therefore adapts to the strength of the signal.

Assumptions

The framework of statistical parametric mapping (as implemented in SPM99) makes several basic assumptions (Friston et al., 1995b; Friston, 2004). The general linear model assumes that the error terms are approximately normally distributed with constant variance between conditions (sphericity).
Departures from normality are constrained by the convolution procedures during pre-processing, while non-sphericity is constrained by the use of Gaussian random field theory (Friston, 2004); however, both assumptions may be violated if the statistical model is mis-specified (Friston et al., 2000a; Friston, 2004). Gaussian random field theory assumes that the number of degrees of freedom is large, and that correlations between different parts of the SPM are constant (stationarity): the former is assured in fMRI studies by the large number of replications of each

experimental condition, while the latter can be shown to hold empirically (Poline et al., 1995).

2.5 Issues in the design of fMRI experiments

Type of design

The framework of statistical parametric mapping can accommodate various types of experimental design. The simplest type of design is cognitive subtraction, which attempts to isolate a particular cognitive, sensory or motor process by creating two or more conditions that differ with respect to a separable component. Differences in brain activation for the contrast between conditions are attributed to this component process of interest. Cognitive subtraction rests on the assumption of pure insertion, which holds that the components of a process can be added without interaction between them. While this assumption has proved useful in establishing the neuroanatomical correlates of certain elementary sensory processes (Friston, 2004), in general it will not hold for more complex or multi-component processes (Friston et al., 1996; Friston & Price, 2001). Alternative strategies to circumvent the assumption of pure insertion include cognitive conjunctions (which attempt to isolate common components across pairs of conditions that are assumed to share only the component of interest: Price & Friston, 1997; Friston, 2004) and parametric designs (which seek the neuroanatomical correlates of a parametrically varied level of processing, including nonlinear responses: Friston, 2004). However, perhaps the most flexible alternative is the factorial design. Here, two or more experimental factors or manipulations are combined in different permutations across conditions. Factorial designs retain the fundamental simplicity of the subtractive approach, but in addition allow the detection of an interaction between the factors: such an interaction will appear in the contrast between simple main effects (Friston et al., 1996). A further issue concerns the sequence of trial presentation.
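The logic of the factorial design can be made explicit with contrast vectors over the four cells of a 2 × 2 design. This is an illustrative sketch (the cell ordering and values are arbitrary, chosen only to show a pure interaction).

```python
import numpy as np

# 2 x 2 factorial design with one regressor per cell.
# Column order: A1B1, A1B2, A2B1, A2B2.
main_A      = np.array([ 1,  1, -1, -1])
main_B      = np.array([ 1, -1,  1, -1])
interaction = np.array([ 1, -1, -1,  1])   # (A1B1 - A1B2) - (A2B1 - A2B2)

# Cell means showing a pure crossover interaction with no main effects:
cell_means = np.array([2.0, 0.0, 0.0, 2.0])
```

Applied to these cell means, both main-effect contrasts evaluate to zero while the interaction contrast does not: the factors matter only in combination, which is exactly the pattern that a pair of simple subtractions would miss.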
From a signal-processing perspective, the most efficient design for detecting differences between conditions (yielding the most reliable parameter estimates) is periodic presentation of blocked stimuli with frequency comparable to the HRF and variable inter-stimulus interval (Worsley & Friston, 1995; Friston et al., 2000a; Friston, 2004). However, there are

compelling neurophysiological and cognitive reasons to prefer non-blocked (randomised or stochastic, event-related) designs in fMRI paradigms. These advantages include the minimisation of temporal order effects between conditions, a uniform context for stimulus presentation, avoidance of habituation, preparatory or expectancy responses, and maintenance of attention (Buckner et al., 1996; Friston, 2004). Sparse designs typically have a fixed inter-stimulus interval comparable in length to the HRF. In signal-processing terms, they are relatively inefficient (Friston et al., 1999c) and may be subject to expectancy effects due to the fixed stimulus duration (this can be offset somewhat by jittering the stimulus duration over a small range), although randomisation is easily introduced at the level of trial order.

Fixed- and random-effects analyses

An important issue in both the analysis and design of fMRI experiments is the level of inference that can be drawn concerning the observed effects (Friston et al., 1999a). A fundamental statistical distinction exists between fixed- and random-effects analyses, relating to the extent to which the sample of subjects studied is representative of the population at large, where several observations are available for each subject. In a fixed-effects analysis, it is assumed that the variability in activation for the effect of interest is fixed and only scan-to-scan error variance is explicitly modelled (here, the degrees of freedom are approximately equal to the total number of brain volumes). In a random-effects analysis, the variability in activation between subjects (and scanning sessions) is itself treated as a random variable, and error variance incorporates the variability between subjects (here, the degrees of freedom are equal to n − 1, where n is the number of subjects).
In general, therefore, a fixed-effects analysis restricts the inferences that can be drawn to the particular subjects included in the experiment, whereas a random-effects analysis enables inferences to be drawn concerning the population from which the subjects were drawn. While the typicality of a certain activation response at the level of the general population may be estimated from a fixed-effects model (for example, using a procedure such as a conjunction analysis: Friston et al., 1999a), only random-effects analyses provide quantitative information about the average characteristics of the population (Friston et al., 1999a). The implementation of a random-effects model involves the estimation of contrasts between parameters of interest for each subject at the first level, and the analysis of each of these contrasts as single observations at the second

level (for example, using a single-sample t test). This procedure assumes that the same first-level design matrix is used for each subject (Friston, 2004). A random-effects analysis requires a minimum number of subjects (typically 8–16) sufficient to estimate inter-subject variability reliably. The level of inference must therefore be considered in designing the experiment. Random-effects models have gained wide currency in the analysis of fMRI experiments (Nature Neuroscience Editorial, 2001). Subject (and session) variability is relatively more important in fMRI than in other functional imaging modalities (notably PET), due to the higher sensitivity of fMRI for detecting small scan-to-scan signal changes, the many sources of variability between fMRI sessions (including both subject and instrumentation effects) and the typically high scan-to-subject ratio (Friston et al., 1999b). However, an exclusive emphasis on the detection of functional characteristics that are invariant across the group may sacrifice information due to neuroanatomical variability between individuals. This is a particular consideration in human auditory cortex, in which substantial individual variability exists both in the anatomical locations of auditory areas and in the correlation between macroscopic features and cytoarchitecture (Rademacher et al., 2001; Morosan et al., 2001). Even if a particular activation profile is highly consistent between individuals at the level of functional architecture, it might not survive transformation into a fixed macroscopic coordinate system under a random-effects model. On the other hand, activation that is distributed within a large functional region might not achieve a local maximum in an individual subject, but might be detected in a random-effects analysis if the average effect at a particular voxel is sufficiently robust across the population.
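The second-level step of a random-effects analysis reduces to a very simple test once each subject's first-level contrast estimate is in hand. The sketch below is illustrative Python over simulated values, not SPM99 code; the effect size and between-subject spread are invented for the example.

```python
import numpy as np

def second_level_t(contrasts):
    """Random-effects inference: each subject's first-level contrast estimate
    is one observation, and the group mean is tested against zero with a
    one-sample t test on n - 1 degrees of freedom."""
    n = contrasts.size
    mean = contrasts.mean()
    se = contrasts.std(ddof=1) / np.sqrt(n)
    return mean / se, n - 1

# Simulated contrast estimates at one voxel for 12 subjects
# (true population effect 1.5, between-subject standard deviation 0.8).
rng = np.random.default_rng(3)
subject_contrasts = 1.5 + 0.8 * rng.standard_normal(12)
t, dof = second_level_t(subject_contrasts)
```

Note that the error term here is the between-subject spread of the contrast estimates, not the scan-to-scan variance: this is precisely why the degrees of freedom drop to n − 1, and why the resulting inference generalises to the population rather than to the particular subjects scanned.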
In the experiments described in this thesis, these issues are addressed using analyses of both group and individual subject data. Analysis at the group level provides information about anatomical regions and functional mechanisms that may be general properties of the human auditory brain. The findings of the group analysis may be strengthened or qualified appropriately by examining the individual data for each subject: the individual SPMs indicate the consistency of activation relative to particular anatomical features in different individuals. Group level random-effects analyses are used in Experiments 1 and 4, which investigate organisational principles of auditory information processing (spatial versus non-spatial processing and generic aspects of auditory object processing,

respectively) that are likely to be instantiated in anatomical regions that are common to all individuals. Group level fixed-effects analyses are used in Experiments 2 and 3, which investigate specific aspects of auditory spatial (motion) and non-spatial (pitch dimension) processing that might be predicted to have variable anatomical substrates within the broad functional networks identified in Experiment 1. In each case, individual subject data are considered in order to assess group level mechanisms at the level of individual auditory cortical anatomy.

Chapter 3. ANALYSIS OF SPATIAL AND NON-SPATIAL SPECTROTEMPORAL PATTERNS

Summary

Perception of the acoustic world requires the simultaneous processing of the locations of sound sources and the non-spatial information associated with those sources. In this fMRI experiment (Experiment 1), the human brain areas engaged in the analysis of pitch (a non-spatial sound property) and the analysis of spatial location were investigated in a paradigm where pitch and location could be varied simultaneously and independently. Subjects were presented with sequences of sounds in which the individual sounds were regular interval noises with variable pitch. Locations of individual sounds were varied using a virtual acoustic space paradigm. Sound sequences with changing pitch specifically activated lateral Heschl's gyrus, anterior planum temporale, planum polare and superior temporal gyrus anterior to Heschl's gyrus. Sound sequences with changing spatial locations specifically activated postero-medial planum temporale. These results demonstrate that distinct mechanisms for the analysis of non-spatial and spatial information exist in the human auditory brain. This functional differentiation is evident as early as planum temporale: within planum temporale, pitch pattern is processed antero-laterally and spatial location postero-medially. These areas may represent human homologues of macaque lateral and medial auditory belt, respectively.

94 3.1 Background The localisation of sound sources and the computation of sound source properties that are independent of spatial location are fundamental processes in auditory scene analysis (Bregman, 1990), however the brain mechanisms that support these processes are poorly understood. The existence of What and Where auditory processing streams in humans and other species is a central issue in contemporary auditory neuroscience (Cohen & Wessinger, 1999; Belin & Zatorre, 2000; Romanski et al., 2000; Zatorre et al., 2002b; Middlebrooks, 2002; Hall, 2003; Cohen et al., 2004). In non-human primates, distinct ventral What and dorsal Where auditory streams have been proposed on electrophysiological grounds (Kaas & Hackett, 2000; Rauschecker & Tian, 2000; Tian et al., 2001; reviewed in Chapter 1, Section , p. 14; see Figure 1.2). In humans, anatomical (Galaburda & Sanides, 1980; Rivier & Clarke, 1997; Galuske et al., 1999; Tardif & Clarke, 2001), functional imaging (Alain et al., 2001; Maeder et al. 2001), electrophysiological (Alain et al., 2001; Anourova et al., 2001) and lesion (Clarke et al., 2000) data are consistent with an antero-lateral auditory cortical What network that analyses non-spatial information streams and sound object features, and a posterior Where network that analyses spatial information (reviewed in Chapter 1, Section , p. 37). However, the extent and functional basis of any such separation of processing remain contentious (Cohen & Wessinger, 1999; Belin & Zatorre, 2000; Romanski et al., 2000; Middlebrooks, 2002; Hall, 2003; Cohen et al., 2004). Both animal electrophysiology (Middelbrooks, 2002; Cohen et al., 2004) and human functional imaging observations (Alain et al., 2001; Zatorre et al., 2002b) suggest that auditory spatial processing is distributed, that regional selectivity for spatial properties is relative rather than absolute, and that spatial processing mechanisms are modulated both by sound object features and by task demands. 
Representative previous human functional imaging studies of auditory 'What' and 'Where' processing are summarised in Table 3.1 (overleaf). This evidence implicates human PT in the processing of many different types of sound patterns, including both intrinsic spectrotemporal features of sound objects and auditory spatial information. Functional differentiation within such a large and histologically diverse anatomical region is plausible a priori (Recanzone, 2002; Wallace et al., 2002b). The studies summarised in Table 3.1 suggest that distinct postero-medial and antero-lateral functional

Table 3.1. Representative functional imaging studies of 'What' and 'Where' processing in human auditory areas

Regions assessed in each study, by side (L, R): PTO/IPL; PT (p-m, a-l); HG (PAC, lat); STG (post, ant); STS (post, ant).

Spectrotemporal 'What' patterns

Simple patterns:
- Binder et al., 1996 (fMRI): tone sequences minus words
- Griffiths et al., 1998a (PET): melodic over non-melodic pitch sequences
- Binder et al., 2000 (fMRI): tone sequences minus noise
- Giraud et al., 2000 (fMRI): AM minus unmodulated tones [R ear presentation]
- Thivard et al., 2000 (PET): spectral motion minus stationary stimuli
- Zatorre & Belin, 2001 (PET): spectral minus duration change in tone sequences; duration minus spectral change in tone sequences
- Hall et al., 2002 (fMRI): harmonic complexes minus pure tones; FM minus unmodulated tones
- Patterson et al., 2002 (fMRI): melodies minus fixed pitch
- Experiment 1 (fMRI): changing minus fixed pitch of IRN burst

Environmental sounds:
- Maeder et al., 2001 (fMRI): recognition minus localisation of environmental sounds
- Thierry et al., 2003 (PET): semantic decision on environmental sounds over words
- Lewis et al., 2004 (fMRI): recognised minus unrecognised environmental sounds

Voices:
- Belin et al., 2000 (fMRI): vocal minus non-vocal sounds
- Von Kriegstein et al., 2003 (fMRI): identification of voices minus speech envelope noise

Music:
- Zatorre et al., 1994 (PET): melodies minus noise
- Zatorre et al., 1996 (PET): imagery for familiar songs minus visual baseline
- Tervaniemi et al., 2000 (PET): chord sequences containing deviant chords minus standard chord sequences

Speech:
- Zatorre et al., 1992 (PET): speech minus noise
- Binder et al., 1996 (fMRI): words minus tones
- Scott et al., 2000 (PET): intelligible speech minus complex non-speech; complex pitch variation minus pitchless stimuli
- Vouloumanos et al., 2001 (fMRI): speech minus complex non-speech
- Wise et al., 2001 (PET): linear response to word rate but not signal-correlated noise; speech production minus silent rehearsal

Auditory space

Stationary sound:
- Weeks et al., 1999 (PET): localisation of noise (virtual space) minus passive listening
- Bushara et al., 1999 (PET): localisation of noise (virtual space) minus passive listening
- Alain et al., 2001 (fMRI): localisation of noise (virtual space) minus pitch judgment
- Maeder et al., 2001 (fMRI): discrimination of noise position (interaural time cues) minus recognition of environmental sounds
- Zatorre et al., 2002b (PET): covariation with spatial distribution (free field) of simultaneous noises; explicit localisation (free field) minus discrimination of noise positions
- Experiment 1 (fMRI): changing minus fixed position of IRN burst

Sound movement:
- Griffiths et al., 1998b (fMRI): moving (interaural phase, intensity cues) minus stationary sound
- Baumgart et al., 1999 (fMRI): moving (FM, interaural intensity cues) minus stationary sound
- Griffiths et al., 2000 (fMRI): moving (interaural phase, intensity cues) minus stationary sound
- Pavani et al., 2002b (fMRI): conjunction of azimuthal and vertical sound movement minus stationary sound (free field)
- Hart et al., 2004 (fMRI): moving tones (interaural AM) minus stationary tones
- Experiment 2* (fMRI): moving minus stationary sound (virtual space)

Studies employing contrasts against a silence baseline have been excluded. * see Chapter 4 of this thesis.
Key: a-l, antero-lateral; AM, amplitude modulation; ant, anterior; FM, frequency modulation; fMRI, functional magnetic resonance imaging; HG, Heschl's gyrus; IPL, inferior parietal lobe; IRN, iterated rippled noise; L, left; lat, lateral; NI, not imaged; PAC, primary auditory cortex; PET, positron emission tomography; p-m, postero-medial; post, posterior; PP, planum polare; PT, planum temporale; PTO, parieto-temporal operculum; R, right; STG, superior temporal gyrus (convexity); STS, superior temporal sulcus.

subregions within human PT may process spatially determined and non-spatially determined spectrotemporal patterns. Such a functional parcellation might be predicted if this region is involved in the early disambiguation of spatial and object properties of sound sources (Zatorre et al., 2002b). However, the functional architecture of human PT has not been established.

3.2 Experimental hypotheses

1. Distinct human brain mechanisms analyse the spatial locations of sound sources and the non-spatial information associated with those sources.
2. These mechanisms are instantiated in distinct antero-lateral and posterior auditory cortical networks.

In this fMRI experiment (Experiment 1), sequences of broadband sounds were presented in virtual acoustic space to test the hypothesis that patterns of pitch (a non-spatial sound property) and patterns of spatial locations are processed by distinct cortical mechanisms. Like natural sound sources, such broadband stimuli can be accurately localised in external acoustic space, and their associated pitch and spatial characteristics can be independently varied in a factorial experimental design. It was specifically hypothesised that pitch sequences are processed in a network of areas including lateral HG, PT and PP (Griffiths et al., 1998a; Patterson et al., 2002), while spatial information is processed in a posterior network that includes PT and IPL (Baumgart et al., 1999; Pavani et al., 2002b; Zatorre et al., 2002b; Hunter et al., 2003). It was further predicted that distinct subregions of PT are engaged in processing these different types of information.

3.3 Methods

3.3.1 Stimuli

Stimuli (see Appendix I) were created digitally in Matlab6 at a sampling rate of 44.1 kHz and were based either on fixed amplitude, random phase noise or on equivalent-passband (1 Hz–10 kHz) regular delay-and-add noise: iterated rippled noise (IRN) (Griffiths et al., 1998a; Yost et al., 1996).
Stimuli based on IRN have an associated pitch that can be understood in terms of a temporal model of pitch perception, in which the neural activity pattern in the auditory periphery is correlated with a delayed version of

itself to extract a pitch corresponding to the reciprocal of the delay in the neural autocorrelogram. The strength of the perceived pitch increases with the number of iterations of the delay-and-add process (Yost et al., 1996). Such stimuli have been used in previous functional imaging studies (Griffiths et al., 1998a, 2001; Patterson et al., 2002) to manipulate pitch salience and to create pitch sequences with structure extending over hundreds of milliseconds. In order to externalise these broadband IRN sounds in virtual acoustic space, the sounds were convolved with generic HRTFs for the left and right ears (University of Wisconsin: Wightman and Kistler, 1989, 1998; see Chapter 1, Section , p. 37 and Chapter 2, Section 2.2, p. 57). Such generic HRTFs do not confer the same spatial acuity as individual HRTFs (Hofman et al., 1998) but nevertheless reliably produce the required percept of a sound source located outside the head, especially in the azimuthal plane. Sounds were combined in sequences in which the duration of each individual element was fixed at 250 ms with an inter-sound pause of 75 ms; sequences contained either 25 or 23 elements (total sequence duration 8.05 or 7.4 seconds). The pitch of the IRN stimuli either remained fixed throughout the sequence or was varied randomly and continually among the first six elements of a 10-note octave spanning Hz. Sounds were located at one of four initial spatial positions: 0, 90, 180 or −90 degrees in azimuth. The spatial location of the sound either remained fixed or was varied randomly and continually from element to element. Sequences with changing spatial location were generated from four different combinations of azimuthal positions: the step between successive azimuthal positions could be ±20, 30 or 40 degrees in size, and the order and direction (clockwise or anticlockwise) of the steps was randomised.
The pitch and the spatial location of the first and last elements were constrained to be identical in any given sequence. The experimental paradigm is represented schematically in Figure 3.1 (overleaf).
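The delay-and-add construction of IRN, and the temporal account of its pitch described above, can be illustrated with a short sketch. This is a minimal Python/NumPy stand-in (the original stimuli were generated in Matlab6); the delay and iteration count below are illustrative values, not the experimental parameters.

```python
import numpy as np

def make_irn(delay_s, n_iterations, duration_s, fs=44100, seed=0):
    """Iterated rippled noise: repeatedly delay a noise and add it back to itself.
    The result has a pitch at 1/delay_s whose salience grows with the iterations."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(int(duration_s * fs))
    d = int(round(delay_s * fs))
    for _ in range(n_iterations):
        x = x + np.concatenate([np.zeros(d), x[:-d]])  # delay-and-add step
    return x / np.max(np.abs(x))

fs = 44100
irn = make_irn(delay_s=1 / 125, n_iterations=16, duration_s=0.25, fs=fs)

# The pitch appears as a peak in the waveform autocorrelation at the delay lag,
# mirroring the delayed self-correlation in the neural autocorrelogram.
spec = np.fft.rfft(irn, 2 * len(irn))
ac = np.fft.irfft(np.abs(spec) ** 2)[:len(irn)]
lag = np.argmax(ac[50:fs // 50]) + 50  # search lags up to ~20 ms, excluding lag 0
pitch_hz = fs / lag                    # close to the 125 Hz implied by the 8 ms delay

# To externalise the sound in virtual acoustic space, the waveform would then be
# convolved with left- and right-ear HRTFs (not included in this sketch).
```

The autocorrelation peak at the delay lag is what the temporal pitch model extracts; increasing `n_iterations` deepens this peak, which is how pitch salience was manipulated.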


Figure 3.1. Experiment 1: Schematic representation of paradigm

During scanning, four different combinations of sound sequences with fixed pitch or randomly changing pitch (Δpitch) and fixed azimuthal location or randomly changing location (Δlocation) were presented. For each sequence, the first and last elements were identical in both pitch and spatial location (0 degrees in azimuth illustrated here; in the experiment, randomised 0, 90, 180 or −90 degrees in azimuth). Each combination of sound sequences corresponded to a different condition with a distinct percept: A, fixed pitch, fixed spatial location; B, changing pitch, fixed spatial location; C, fixed pitch, changing spatial location; D, changing pitch, changing spatial location. Additional conditions used during scanning (not shown) were broadband noise sequences with fixed or changing spatial location, and silence. The use of musical notation here is purely symbolic; pitch variations were random and based on a 10-note octave rather than the Western musical scale. For ease of illustration, short sound sequences with large spatial steps are shown; however, the actual sequences used during scanning comprised 23 or 25 elements with steps of ±20, 30 or 40 degrees between successive locations.
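The changing-location sequences can be thought of as a random walk in azimuth, constrained so that the first and last positions coincide. The sketch below is a hypothetical Python reconstruction (step sizes and sequence length are taken from the text; the rejection-sampling strategy for closing the walk is an assumption, not the authors' documented method).

```python
import random

STEP_SIZES = (20, 30, 40)  # degrees, as stated in the text
N_ELEMENTS = 25            # sequences contained 23 or 25 elements

def azimuth_walk(start, n_elements=N_ELEMENTS, seed=None):
    """Random azimuthal walk with randomised step size and direction,
    constrained so that the first and last positions coincide.
    Uses simple rejection sampling (an assumed strategy)."""
    rng = random.Random(seed)
    while True:
        walk = [start]
        for _ in range(n_elements - 1):
            step = rng.choice(STEP_SIZES) * rng.choice((-1, 1))
            walk.append((walk[-1] + step) % 360)
        if walk[-1] == walk[0]:  # retry until the walk closes on itself
            return walk

seq = azimuth_walk(start=90, seed=1)
print(len(seq), seq[0] == seq[-1])  # 25 True
```

Because step direction is drawn independently of step size, the clockwise/anticlockwise randomisation described in the text falls out of the same draw.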

3.3.2 Subjects

Twelve right-handed subjects (five males, seven females) aged 23 to 38 participated. None had any history of a hearing disorder or neurological illness and all had normal structural MRI scans.

3.3.3 fMRI protocol

Seven stimulus conditions, each corresponding to a different type of sound sequence and a distinct percept, were presented during scanning (Figure 3.1): 1) IRN with fixed pitch and fixed spatial position (fixed-pitch notes with fixed location in azimuth); 2) IRN with changing pitch and fixed spatial position (changing-pitch notes at a fixed azimuthal location); 3) IRN with fixed pitch and changing spatial position (fixed-pitch notes at a sequence of azimuthal locations); 4) IRN with changing pitch and changing spatial position (changing-pitch notes at a sequence of azimuthal locations); 5) fixed amplitude random phase noise with fixed spatial position (a noise burst at a fixed azimuthal location); 6) fixed amplitude random phase noise with changing spatial position (a noise burst at a sequence of azimuthal locations); and 7) silence. Subjects were pre-tested prior to scanning with examples of stimuli based on each generic HRTF in order to select the HRTF that gave the most reliable percept of an external sound source during scanning. All subjects perceived the stimuli as originating from locations outside the head. In sequences where spatial location varied, the percept was an abrupt jump between consecutive positions. During scanning, subjects were asked to attend to the sound sequences and to fixate a cross piece at the midpoint of the visual axes; there was no active discrimination task. To help maintain alertness, subjects were required to make a single button press at the end of each sequence using a button box positioned beneath the right hand. Stimuli were delivered in randomised order using a custom electrostatic system at a fixed sound pressure level of 70 dB (see Table 2.1 and Figure 2.2).
Following each sound sequence, brain activity was estimated from the BOLD response at 2 T (Siemens Vision, Erlangen) using gradient EPI in a sparse acquisition protocol (inter-scan interval 12 s; TE 40 ms) (Hall et al., 1999; see Chapter 2, Section , p. 54). Each brain volume comprised 48 transverse

slices covering the whole brain with an inter-slice gap of 1.4 mm and in-plane resolution of 3 × 3 mm. 224 brain volumes were acquired for each subject (16 volumes for each condition, in two sessions).

3.3.4 Behavioural data

Each subject's ability to detect changes in pitch pattern, changes in spatial pattern or simultaneous changes in both types of pattern was assessed psychophysically immediately following scanning using a two-alternative forced-choice procedure. Subjects listened to pairs of sound sequences in which each sequence contained seven elements that varied either in pitch or in spatial location or in both simultaneously. The task was to detect a single difference in pitch or spatial pattern produced by changing one element between the members of each pair. Psychophysical test sequences were based on the same pitch and spatial parameters as those used during scanning, and noise-based versions were also included. All subjects could easily detect sequences that differed only in pitch pattern (mean correct response rate 84%), sequences that differed only in spatial pattern (mean correct response rate 78%), and sequences that differed in both pitch and spatial pattern (mean correct response rate 78%). One-way analysis of variance did not show any effect of trial type on performance at the p < 0.05 significance threshold.

3.3.5 Image analysis

Image data were analysed for the entire group and for each individual subject using statistical parametric mapping implemented in SPM99 software (see Chapter 2, Sections 2.3 and 2.4 and Figure 2.3). Scans were first realigned and spatially normalised (Friston et al., 1995a) to the MNI standard stereotactic space (Evans et al., 1993). Data were spatially smoothed with an isotropic Gaussian kernel of 8 mm FWHM.
SPMs were generated by modelling the evoked haemodynamic response for the different stimuli as boxcars convolved with a synthetic haemodynamic response function in the context of the general linear model (Friston et al., 1995b). Population-level inferences concerning BOLD signal changes in the contrasts of interest (changing spatial location minus fixed location, and changing pitch minus fixed pitch)

were based on a random-effects model. The second-level t statistic was estimated for each voxel at a significance threshold of p < 0.05 after false discovery rate correction for multiple comparisons (Genovese et al., 2002; see Chapter 2, Section 2.4.2, p. 64). Brain areas activated specifically in the spatial contrast, brain areas activated specifically in the pitch contrast, and brain areas activated in common in both contrasts were identified using masking procedures that applied logical operators across all voxels in the SPMs corresponding to each contrast. Exclusive masking was used to identify voxels activated by one contrast but not the other, and inclusive masking was used to identify voxels activated in both contrasts. Hemispheric laterality effects for the spatial and pitch contrasts were assessed using a second-level paired t test comparing each contrast image with its counterpart flipped about the antero-posterior axis. Local maxima were assessed using a voxel significance threshold of p < 0.05, after small volume correction taking the prior anatomical hypotheses into account. For the main effect of pitch, anatomical small volumes included left and right lateral HG, PP and PT (derived from the group mean normalised structural MRI brain volume) and 95% probability maps for left and right human PT (Westbury et al., 1999). For the main effect of space, anatomical small volumes were based on 95% probability maps for left and right human PT (Westbury et al., 1999). Individual subject data were analysed in order to further assess the anatomical variability of pitch and auditory spatial processing within the group. In the analysis of each individual subject, BOLD signal changes between conditions of interest were assessed by estimating the t statistic for each voxel at a significance threshold of p < 0.05 after small volume correction taking the a priori anatomical hypotheses into account.
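The false discovery rate correction applied to the voxel-wise statistics (Genovese et al., 2002) rests on the Benjamini-Hochberg step-up procedure. The following is a minimal illustrative sketch of that procedure, not SPM99's implementation; the p values shown are invented for illustration.

```python
import numpy as np

def fdr_threshold(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure: return the largest p value threshold
    such that declaring all tests with p <= threshold significant controls the
    expected false discovery rate at level q."""
    p = np.sort(np.asarray(p_values, dtype=float))
    m = len(p)
    # compare each ordered p value with its BH critical value (i/m) * q
    below = p <= (np.arange(1, m + 1) / m) * q
    if not below.any():
        return 0.0  # nothing survives correction
    return p[np.where(below)[0].max()]

# illustrative voxel-wise p values: a few strong effects among nulls
pvals = [0.0001, 0.0004, 0.002, 0.009, 0.03, 0.2, 0.4, 0.6, 0.8, 0.9]
thresh = fdr_threshold(pvals, q=0.05)
print(thresh)  # 0.009: voxels at or below this p value are reported as active
```

Unlike a Bonferroni correction, the resulting threshold adapts to the observed p value distribution, which is why FDR control retains more sensitivity in whole-brain voxel-wise maps.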
The anatomical small volumes applied for the spatial and pitch contrasts in the individual analyses were identical to those used in the group analysis.

3.4 Results

3.4.1 Group data

In the group random-effects analysis, significant activation was demonstrated in each of the contrasts of interest at the p < 0.05 voxel level of significance after false discovery rate correction for multiple comparisons. SPMs for the main effect contrasts are shown in

Figure 3.2, and local maxima of activation in the STP are listed in Table 3.2 (overleaf). Broadband noise (without pitch) compared with silence produced extensive bilateral superior temporal activation including medial HG (Figure 3.2B, centre: yellow). The contrasts between conditions with changing pitch and fixed pitch (main effect of pitch change) and between all conditions (both pitch and noise) with changing spatial location and fixed location (main effect of spatial change) produced activations involving distinct anatomical regions of the STP. Masking (see Methods, Section 3.3.5, p. 75) was applied to identify voxels activated specifically by change in pitch and change in spatial location, and voxels activated in both contrasts. Pitch changes (but not spatial location changes) produced bilateral activation involving lateral HG, anterior PT and PP anterior to HG, extending into STG (Figure 3.2: blue). Lateral HG activation lay outside the 95% probability boundaries for PAC as defined by Rademacher et al. (2001). In contrast, spatial location changes (but not pitch changes) produced bilateral activation involving posterior PT (Figure 3.2: red). Within PT (Figure 3.2B, centre), activation due to pitch change occurred antero-laterally, whereas activation due to spatial change occurred postero-medially. Local maxima in PT for spatial change were clearly posterior to those for pitch change bilaterally (Table 3.2). For pitch change, additional local maxima occurred anteriorly in right PP and left lateral HG. Although no local maxima occurred in left PP and right lateral HG, these regions were clearly also activated by pitch change (Figure 3.2: blue). Only a small number of voxels within PT were activated both by pitch changes and spatial location changes (Figure 3.2A,B: magenta). No interactions were observed between the pitch and spatial change conditions.
The distributions of activation did not differ significantly between cerebral hemispheres for either pitch or spatial processing.

3.4.2 Individual data

Individual subject analyses (using a voxel significance threshold of p < 0.05 after small volume correction) showed activation patterns similar to the group analysis. Pitch change produced local maxima within the pre-specified region (contiguous areas in each hemisphere comprising lateral HG, PT and PP) in 10 of 12 individual subjects. Changing spatial location produced local maxima within the pre-specified region (PT in each hemisphere) in all individual subjects.

Table 3.2. Experiment 1: Local maxima of activation for group

Pitch change only: planum temporale (L, R); planum polare (R); lateral Heschl's gyrus (L)
Spatial change only: planum temporale (L, R)

Data are derived from a random-effects analysis of the group (12 subjects). All local maxima in the superior temporal plane are shown for voxels activated by pitch change but not by change in spatial location (pitch change only) and by change in spatial location but not by pitch change (spatial change only). For all experiments described in this thesis, coordinates of local maxima are in mm after transformation into standard MNI stereotactic space (Evans et al., 1993). Here, a Z score > 3.50 corresponds to p < 0.05 after false discovery rate correction for multiple comparisons.


Figure 3.2. Experiment 1: Statistical parametric maps for group

A, Statistical parametric maps (SPMs) of group data for the contrasts of interest are shown as glass brain projections in sagittal, coronal and axial planes. B, SPMs based on group data have been rendered on the group mean normalised structural brain MRI. For all experiments described in this thesis, brain images are normalised to the MNI standard stereotactic space (Evans et al., 1993). Tilted axial sections are shown at three levels parallel to the superior temporal plane: 0 mm (centre), +2 mm and −2 mm (insets). The 95% probability boundaries for left and right human planum temporale (PT) are outlined in black (Westbury et al., 1999). Sagittal sections of the left (x = −56 mm) and right (x = +62 mm) cerebral hemispheres are displayed below. All voxels shown are significant at the p < 0.05 level after false discovery rate correction for multiple comparisons. Broadband noise (without pitch) compared with silence activates extensive bilateral superior temporal areas including medial Heschl's gyrus (HG) (B, centre: yellow). In the contrasts between conditions with changing pitch and fixed pitch and between conditions with changing spatial location and fixed location, a masking procedure has been used to identify voxels activated only by pitch change (blue), only by spatial change (red) and by both types of change (magenta). The contrasts of interest activate distinct anatomical regions on the superior temporal plane: pitch change (but not spatial location change) activates lateral HG, anterior PT and planum polare (PP) anterior to HG, extending into the superior temporal gyrus, while spatial change (but not pitch change) produces more restricted bilateral activation involving posterior PT. Within PT (B: axial sections), activation due to pitch change occurs antero-laterally, whereas activation due to spatial change occurs postero-medially.
Only a small number of voxels within PT are activated both by pitch change and by spatial change.

3.5 Discussion

3.5.1 Auditory 'What' and 'Where' mechanisms in the human brain

Using a single fMRI paradigm that engaged both types of information processing simultaneously, this experiment has shown that auditory non-spatial 'What' information (pitch sequences) and spatial 'Where' information (spatial sequences) are analysed by distinct auditory cortical networks in the human brain. Pitch sequences are analysed by a bilateral antero-lateral network that includes lateral HG, antero-lateral PT, PP and STG, while spatial sequences are analysed by a bilateral posterior network that includes postero-medial PT. These findings are consistent with the dual 'What' and 'Where' processing pathways proposed in the macaque (Kaas & Hackett, 2000; Rauschecker & Tian, 2000; Tian et al., 2001) and with evidence for distinct anterior and posterior auditory networks emerging from human anatomical (Galaburda & Sanides, 1980; Rivier & Clarke, 1997; Galuske et al., 1999; Tardif & Clarke, 2001), functional imaging (Alain et al., 2001; Maeder et al., 2001), electrophysiological (Alain et al., 2001; Anourova et al., 2001) and lesion (Clarke et al., 2000) studies. In humans, the anterior network (including PP, anterior STG, STS and middle temporal gyrus) has been implicated in the analysis of many different types of spectrotemporal pattern ('What'), including simple spectral and temporal patterns (Griffiths et al., 1998a; Binder et al., 2000; Thivard et al., 2000; Zatorre & Belin, 2001; Hall et al., 2002; Patterson et al., 2002), musical melodies (Zatorre et al., 1994, 1996), vocal sounds (Belin et al., 2000) and speech (Zatorre et al., 1992; Scott et al., 2000; Vouloumanos et al., 2001; Wise et al., 2001). The posterior network including IPL is active in the spatial ('Where') analysis of both stationary (Alain et al., 2001; Zatorre et al., 2002b; Hunter et al., 2003) and moving (Baumgart et al., 1999; Pavani et al., 2002b) sounds.
The extent of activation of the posterior cortical network has shown considerable variability between studies (Table 3.1). This variability is likely to reflect distinct levels of processing by different components of the network. Engagement of frontal and superior parietal areas is likely to depend on task demands including attention to particular aspects of the stimulus and (covert) motor preparation (Griffiths et al., 1998b; Griffiths & Green, 1999; Baumgart et al., 1999; Bushara et al., 1999; Weeks et al., 1999; Griffiths et al., 2000; Lewis et al., 2000; Alain et al., 2001; Maeder et al., 2001; Zatorre

et al., 2002b; Pavani et al., 2002b; Hart et al., 2004). A common fronto-parietal network has been recruited in a variety of studies in which modality-specific and supramodal attentional demands were manipulated (see Chapter 1, Section , p. 39). The absence of activation in these areas in the present experiment is therefore likely to reflect the absence of an active discrimination task. Engagement of the posterior superior temporal lobe, and in particular PT, is likely to be modulated by computational demands associated with obligatory perceptual processing (for example, the presence of monaural versus binaural cues, or of multiple discrete sound sources). The present experiment was predicated on the use of a broadband sound source and the availability of both binaural and monaural spatial cues; however, no further assumptions were made concerning the encoding of spatial position by human auditory cortex. The use of a virtual space technique here has allowed the simulation of a discrete external sound source with independently varying intrinsic spectrotemporal properties that were not subject to a directed attentional or behavioural task. The present work therefore allows more specific conclusions regarding the respective roles of the antero-lateral and posterior auditory cortical networks in the perceptual analysis of sound source properties: these networks instantiate distinct mechanisms that are simultaneously and specifically engaged in processing non-spatial and spatial properties of sound sequences. A further issue concerns the extent of hemispheric asymmetry in the processing of spatial and non-spatial auditory information. Bilateral activation of the hemispheric networks that process auditory spatial and pitch sequences is evident in the present study (Figure 3.2). For both pitch sequence and spatial sequence processing, the distributions of activation did not differ significantly between the left and right cerebral hemispheres.
Previous functional imaging studies of auditory spatial processing in humans have shown bilateral (Pavani et al., 2002b; Zatorre et al., 2002b) or right-lateralised (Baumgart et al., 1999) activation of PT and parietal lobe areas. While the processing of pitch sequences and chords is more frequently right-lateralised in non-primary superior temporal lobe areas including PT, PP and STG (Zatorre et al., 1994; Tervaniemi et al., 2000; Patterson et al., 2002), the extent of such rightward asymmetry varies between studies (Table 3.1). Similar caveats apply to the human lesion literature. Deficits of auditory spatial processing have been reported following lesions of either hemisphere (for example, Griffiths et al., 1996, 1997; Zatorre & Penhune, 2001; Clarke et al., 2000), while deficits in various aspects of pitch processing more commonly follow right hemisphere lesions

(for example, Zatorre, 1988; Griffiths et al., 1997; Johnsrude et al., 2000). However, the frequently superadditive effects of second lesions in the opposite hemisphere (Peretz et al., 1994) illustrate that this hemispheric asymmetry is relative rather than absolute. The pattern of hemispheric lateralisation for both spatial and non-spatial processing might be modulated by computational demands (temporal integration over short versus longer time windows), by the behavioural relevance of that information (spatial attention, motor preparation, working memory) and by the availability of both spatial and non-spatial information in the acoustic stimulus.

3.5.2 The role of the planum temporale

The present experiment has demonstrated that the analysis of both pitch sequences and acoustic spatial sequences involves PT, and that these different types of auditory information are analysed at different sites within PT: pitch information is processed antero-laterally, while spatial information is processed postero-medially. This functional differentiation is consistent with the anatomical features of human PT (its histological diversity and extensive cortical connections; see Chapter 1, Section 1.3.1, p. 18) and with its demonstrated involvement in diverse forms of spatial and non-spatial spectrotemporal pattern processing (Table 3.1). Similar functional parcellation is not evident in medial HG, the site of PAC (Rademacher et al., 2001). The present findings are nevertheless compatible with electrophysiological and lesion evidence in animals (Heffner & Heffner, 1986a, 1990b; Ahissar et al., 1992; Toronchuk et al., 1992) indicating that PAC contains neurons sensitive to spatial parameters: particular neuronal subpopulations in PAC or subcortical structures (Rauschecker & Tian, 2000) might provide inputs to higher auditory cortical areas involved in spatial or non-spatial processing.
In the macaque, the posterior temporal plane has been implicated in the analysis of sound source location (Leinonen et al., 1980; Recanzone, 2000b), and proposed as the origin of an auditory dorsal stream for processing spatial information (Rauschecker & Tian, 2000). Furthermore, functionally distinct medial (CM) and lateral (CL) belt areas have been described in the macaque posterior STP (Tian et al., 2001; see Figures 1.1 and 1.2). The present work demonstrates that in humans the processing of spatial and non-spatial sound properties diverges beyond PAC and as early as PT, and suggests an analogous functional differentiation of the human posterior temporal plane. These functionally distinct subregions may correspond to the cytoarchitecturally distinct regions Te2 (medial) and

Te3 (lateral) identified in human anatomical studies (Morosan et al., 2001; Figure 1.1), and to the segregated N1m source generators identified in the human posterior STP during spatial and pitch working memory tasks (Anourova et al., 2001). Human PT may represent the cortical origin of a posteriorly directed auditory spatial processing pathway, homologous to the dorsal auditory stream proposed in the macaque. The functional subregions of human PT identified here have distinct connections with other cortical areas. Acoustic spatial patterns are processed in a well-defined region of the posterior STP, while the areas that process non-spatial (pitch) patterns are distributed along the antero-posterior axis of the superior temporal lobe, including both the postero-lateral temporal plane and anterior auditory areas. This anterior-posterior distribution of non-spatial spectrotemporal processing is also consistent with macaque electrophysiology (Tian et al., 2001). Specifically, in the macaque, object selectivity (defined using a range of spectrotemporally complex animal calls) is present in both anterior and posterior belt areas, but is shown by a smaller proportion of neurons in the posterior belt area CL. A subpopulation of neurons in the macaque caudal belt area CL responds both to the spatial location of complex sounds and to specific call sounds. Taken together with this emerging picture in the macaque, the present findings therefore suggest that, in humans as in non-human primates, different subregions of the posterior STP contain specific mechanisms for processing the spatial and non-spatial properties of complex sounds, and that these mechanisms access distinct higher cortical areas. However, any anterior-posterior (ventral-dorsal) demarcation of processing on anatomical grounds must be qualified.

3.5.3 A synthesis?
The controversy surrounding the existence of dual 'What' and 'Where' human auditory processing streams (Middlebrooks, 2002) was a major motivation for the present work. No account has satisfactorily reconciled the evidence, on the one hand, for a duality of auditory information processing streams and on the other, for their mutual interdependence (Middlebrooks, 2002; Zatorre et al., 2002b). The processing of auditory spatial information imposes a large computational burden on the auditory system, since acoustic space must be reconstructed by combining cues arriving at both ears and modified both by the complex shape of the external ears and by intrinsic spectrotemporal

attributes of the sound source. The present experiment provides evidence that this critical computational step occurs in human PT, and suggests that human PT plays a central role in gating spatial and non-spatial auditory information for further processing in distinct posterior and antero-lateral cortical networks. This work suggests a separation of human cortical mechanisms for processing spatial and non-spatial auditory information. Such a separation must be reconciled with other human functional imaging and electrophysiological data suggesting interaction or cross-talk between the two processing streams (Alain et al., 2001; Zatorre et al., 2002b) and with evidence that sound localisation in humans may be disrupted by lesions involving the anterior temporal lobe (Zatorre & Penhune, 2001). This experiment has identified distinct brain substrates for the perceptual processing of auditory spatial and non-spatial information. However, these findings do not establish the behavioural relevance of those substrates, nor how the two mechanisms interact to allow a behavioural response (as in an active discrimination task). Such an interaction is of course compatible with dual processing streams, where a unified behavioural response is produced as a result of cross-talk between the streams (Alain et al., 2001): the response might depend on the integrity of both cortical networks (Zatorre & Penhune, 2001). More fundamentally, the differential activation of the spatial network will depend on differential sensitivity to the spatial cues available. If, for example, any given location in auditory space is coded by a widely distributed neuronal population, activation may not vary with spatial distribution per se (Zatorre et al., 2002b).
In addition, the processing of spatial information may be influenced by auditory object features (Zatorre et al., 2002b): potential higher-order interactions between different types of spatial and non-spatial auditory information were not addressed in the present experiment, since spatial cues and auditory object features were not manipulated. The regional selectivity of the human auditory brain for spatial and non-spatial stimulus attributes might be most parsimoniously considered as relative, rather than absolute. This position is supported by electrophysiological data in non-human primates (Tian et al., 2001; Cohen et al., 2004) and in other mammalian species such as the cat (Stecker et al., 2003). This experiment raises a number of further issues concerning the functional organisation of human auditory cortex: these issues are addressed in the subsequent experiments described in this thesis. The spatial specificity of the posterior cortical mechanism and the

functional parcellation of human PT are further investigated by comparing moving and stationary sound source processing (Experiment 2; Chapter 4). The nature of auditory non-spatial ('What') processing and the possibility of separate mechanisms within the distributed antero-lateral 'What' network are examined by comparing the processing of source-dependent and source-independent pitch information (Experiment 3; Chapter 5), and different levels of sound object analysis (Experiment 4; Chapter 6).

Chapter 4. ANALYSIS OF SOUND SOURCE MOTION

Summary

In this fMRI experiment (Experiment 2), the human brain network responsible for processing sound source movement was examined using a virtual acoustic space paradigm. Simulated sound source motion was contrasted against two baseline conditions: externalised sound sources at fixed locations in virtual space, and non-externalised sounds with similar spectrotemporal structure to the moving sound. Sound source motion produced additional bilateral activation in the postero-medial planum temporale and adjoining parieto-temporal operculum. This posterior network is specifically implicated in the perceptual processing of sound source motion in space rather than dynamic spectrotemporal complexity per se. The findings support the existence of a posterior processing pathway in the human brain that specifically analyses auditory spatial (motion) information, and further define postero-medial and antero-lateral functional subdivisions within human planum temporale that analyse auditory spatial and non-spatial information, respectively. Based on the evidence of Experiments 1 and 2, it is proposed that human planum temporale behaves as a computational hub that disambiguates spatial and non-spatial auditory information for further processing in distinct cortical pathways.

4.1 Background

Sound movement is an important aspect of our perception of the environment, and is the only sensory cue available for the perception of movement of objects in the large region of space behind the head. Human functional imaging studies using moving sound stimuli (Baumgart et al., 1999; Griffiths et al., 1994; Griffiths et al., 1998b; Griffiths & Green, 1999; Griffiths et al., 2000; Lewis et al., 2000; Bremmer et al., 2001; Pavani et al., 2002b; Hart et al., 2004; see Table 3.1) have demonstrated activation in bilateral inferior and superior parietal areas, ventral premotor areas and the frontal eye fields. Several previous functional imaging studies have shown activation of PT during sound movement processing (Baumgart et al., 1999; Lewis et al., 2000; Pavani et al., 2002b; Hart et al., 2004; Table 3.1). Such observations provide further evidence that spatial sound attributes are analysed in a posterior auditory pathway that passes from PAC to PT and thence to IPL, representing the human homologue of the dorsal auditory spatial processing stream in the macaque (Rauschecker, 1998; Rauschecker & Tian, 2000; Figure 1.2). However, no previous functional imaging study in which a comparison was made between a moving stimulus and the appropriate control stimulus (stationary sound) has shown activation in the region of PAC in medial HG (Penhune et al., 1996; Rademacher et al., 2001). This argues against the specific involvement of PAC in sound movement perception in humans. Demonstration of neurons sensitive to cues for auditory motion in primary auditory cortex of cats and monkeys (Ahissar et al., 1992; Toronchuk et al., 1992) does not invalidate this conclusion, since such neurons may provide part of the input to movement-specific areas. A priori there are a number of computational steps necessary for a sound to be perceived as moving in space.
The filter functions of the two external ears impose dynamic spectrotemporal changes that must be disambiguated both from the properties associated with a particular location in external space and from the intrinsic spectrotemporal properties of the sound source (Wightman & Kistler, 1989; Hofman et al., 1998). The computational role of human auditory areas in these processes has not been clarified by previous functional imaging studies. Baumgart et al. (1999) used a limited number of slices that did not allow a demonstration of the entire motion analysis system, while Lewis et al. (2000) used a silent reference condition that does not allow conclusions to be drawn about specific movement analysis mechanisms. While Pavani et al. (2002b) used

individual free-field recordings from a speaker array to simulate sound source motion in the vertical and horizontal planes, Baumgart et al. (1999), Lewis et al. (2000) and Hart et al. (2004) used acoustic stimuli that relied on the manipulation of binaural cues to produce the perception of movement of a sound source between the ears, rather than acoustic stimuli that would be produced by actual sounds in space. The present fMRI experiment (Experiment 2) incorporated two novel strategies not employed in previous functional imaging studies of sound motion processing. A virtual acoustic space technique (Wightman & Kistler, 1989, 1998) was used to produce the percept of a single sound source moving around the head, and the dynamic spectrotemporal properties of the stimulus were manipulated to assess the spatial specificity of sound motion analysis mechanisms.

4.2 Experimental hypotheses

1. The spatial analysis of moving sound sources engages a posterior cortical mechanism in the human brain: the substrate for this mechanism is similar to the brain substrate for the analysis of stationary sounds in space, but distinct from mechanisms that analyse non-spatial dynamic sound properties.

2. The mechanisms that process sound motion and non-spatial dynamic sound properties are instantiated by distinct functional subregions in human PT.

Based on previous human functional imaging data (see Table 3.1) and the results of Experiment 1 (Chapter 3, Section 3.4, p. 75), it has been proposed that a posterior human brain network is engaged in acoustic spatial analysis. In this fMRI experiment (Experiment 2), specific brain correlates of sound motion analysis were investigated by comparing the processing of continuously moving and stationary virtual sound sources. It was predicted that continuous sound motion would activate the posterior network engaged in the processing of static sources (Experiment 1).
In addition, the present experiment assessed the spatial specificity of this posterior network by comparing the processing of the moving sound source with the processing of a spectrotemporally matched control sound that was not externalised. Since the computation of a sound source trajectory depends on the analysis of dynamic spectrotemporal patterns, it was hypothesised that moving sounds should specifically activate the network when compared with non-spatial spectrotemporal patterns of comparable dynamic complexity.

Based on the findings of Experiment 1 and previous observations in the human functional imaging literature (Table 3.1), it was further specifically predicted that distinct postero-medial and antero-lateral subregions of PT are engaged in processing sound motion and non-spatial dynamic patterns, respectively.

4.3 Methods

Stimuli

Stimuli were created digitally in MATLAB 6 at a sampling rate of 44.1 kHz and were based on fixed-amplitude, random-phase noise (passband 1 Hz to 20 kHz). The noise was sinusoidally amplitude-modulated at 80 Hz (modulation depth 80%) to provide an additional cue for spatial location (Henning, 1973) and to reduce any habituation to the stationary stimuli. Noise was then convolved with generic HRTFs for the left and right ears (University of Wisconsin; Wightman & Kistler, 1989, 1998; see Chapter 1, Section , p. 37 and Chapter 2, Section 2.2, p. 57) to simulate a single sound source in virtual acoustic space. The use of fixed HRTFs corresponding to one spatial location allowed the simulation of a sound source at a fixed point in space, and the use of dynamically updated HRTFs allowed simulation of movement of the sound source in azimuth. While the generic HRTFs used here do not allow the same spatial acuity as individual HRTFs (Hofman et al., 1998), they nevertheless reliably produce the required percept of a stationary sound source located outside the head or rotating about the head with fixed angular velocity. Here, the moving sound stimulus rotated about the head in the azimuthal plane with a constant positive angular velocity of 320 degrees/second. Stationary external control sounds were located either in the midline ('midline' stimuli, azimuth = 0 or 180 degrees) or to the right or left of the head ('lateralised' stimuli, azimuth = 90 or 270 degrees).
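The stimulus construction described above (random-phase broadband noise, 80 Hz sinusoidal amplitude modulation at 80% depth, convolution with per-ear HRTFs) can be sketched in Python. This is an illustrative reconstruction, not the original MATLAB code: `am_noise` and `spatialise` are hypothetical names, and the short impulse responses a caller would pass in stand in for the generic Wisconsin HRTF measurements, which are not reproduced here.

```python
import numpy as np

FS = 44100        # sampling rate (Hz), as used in the experiment
AM_RATE = 80      # sinusoidal amplitude-modulation rate (Hz)
AM_DEPTH = 0.8    # 80% modulation depth

def am_noise(duration_s, seed=0):
    """Fixed-amplitude, random-phase broadband noise with 80 Hz sinusoidal AM."""
    rng = np.random.default_rng(seed)
    n = int(duration_s * FS)
    n -= n % 2  # keep an even length for simple Hermitian indexing
    # flat-magnitude spectrum with random phases -> random-phase noise
    spectrum = np.zeros(n, dtype=complex)
    phases = rng.uniform(0.0, 2.0 * np.pi, n // 2 - 1)
    spectrum[1:n // 2] = np.exp(1j * phases)
    spectrum[n // 2 + 1:] = np.conj(spectrum[1:n // 2][::-1])  # Hermitian symmetry
    noise = np.fft.ifft(spectrum).real
    noise /= np.max(np.abs(noise))
    t = np.arange(n) / FS
    envelope = 1.0 + AM_DEPTH * np.sin(2.0 * np.pi * AM_RATE * t)
    return noise * envelope

def spatialise(sound, hrir_left, hrir_right):
    """Simulate a static external source: convolve the mono sound with the
    head-related impulse responses (HRIRs) for the left and right ears."""
    left = np.convolve(sound, hrir_left)[:len(sound)]
    right = np.convolve(sound, hrir_right)[:len(sound)]
    return left, right
```

A stationary source at a given azimuth then corresponds to one fixed HRIR pair, while the rotating stimulus requires the filters to be updated over time.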
A spectrotemporal control sound was generated by taking the mean of the waveforms at each ear after convolution with the HRTF in the rotation condition and presenting this stimulus diotically. This stimulus has a similar spectrotemporal structure to the rotating stimulus but produces a midline percept within the head. The mean waveform rather than the waveform at either ear alone was used to avoid monaural cues for movement perception (Zakarauskas & Cynader, 1991). The spectrotemporal control stimulus was perceived by all subjects as a sound with varying intensity over time that did not localise to a point in external space. It was easily

distinguishable from the moving and fixed external sounds. The stimuli and associated percepts are summarised in Table 4.1 (overleaf).

Subjects

Twelve subjects (eight males, four females; 11 right-handed, one left-handed) aged 22 to 38 participated. None had any history of a hearing disorder or neurological illness and all had normal structural MRI scans.

fMRI protocol

Four stimulus conditions, corresponding to six different percepts, were presented during scanning (Table 4.1): 1) amplitude-modulated broadband noise delivered binaurally in the midline, either at azimuth = 0 degrees (source stationary in front of head) or 180 degrees (source stationary behind head); 2) amplitude-modulated broadband noise delivered binaurally to the side of the head, either at azimuth = 90 degrees (source stationary opposite right ear) or 270 degrees (source stationary opposite left ear); 3) amplitude-modulated noise convolved with dynamically updated HRTFs to simulate a fixed mean positive angular velocity in azimuth (source rotating clockwise around head with constant speed); and 4) the spectrotemporal control stimulus (a sound with varying intensity over time, not localising to a point in external space). Before scanning, subjects were questioned about the stimuli to ensure that the different percepts were reliably experienced by all subjects. Presentation of the moving stimuli produced a percept of sound movement around the head at a distance of approximately 0.5 m. During scanning, subjects were asked to attend to the sound sequences and to fixate a cross piece at the midpoint of the visual axes; there was no active discrimination task. Stimuli were delivered in randomised order using a custom electrostatic system at a fixed sound pressure level of 70 dB (see Table 2.1 and Figure 2.2).
Each stimulus was presented for a period of eight seconds, after which brain activity was estimated by the BOLD response at 2T (Siemens Vision, Erlangen) using gradient EPI in a sparse acquisition protocol (inter-scan interval 12 s; TE 40 ms) (Hall et al., 1999; see Chapter 2, Section , p. 54). Each brain volume comprised 48 transverse 1.8 mm slices covering the whole brain with an

Table 4.1. Experiment 2: Conditions and percepts

external front midline
  Sound: binaural amplitude-modulated broadband noise, azimuth = 0 degrees
  Percept: object stationary in front of head

external rear midline
  Sound: binaural amplitude-modulated broadband noise, azimuth = 180 degrees
  Percept: object stationary behind head

external R lateralised
  Sound: binaural amplitude-modulated broadband noise, azimuth = 90 degrees
  Percept: object stationary opposite right ear

external L lateralised
  Sound: binaural amplitude-modulated broadband noise, azimuth = 270 degrees
  Percept: object stationary opposite left ear

rotating*
  Sound: binaural amplitude-modulated broadband noise, convolved with dynamically updated HRTFs
  Percept: object rotating clockwise around head with constant speed

spectrotemporal control
  Sound: diotically presented mean of the monaural HRTF-convolved waveforms
  Percept: sound with varying intensity over time, not localising to a point in external space

*mean positive angular velocity = 320 degrees/second
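The rotating condition and its spectrotemporal control (Table 4.1) can be approximated as follows. This is a hedged sketch: the thesis does not state how the HRTFs were updated over time, so a simple snapshot scheme is assumed in which each 10 ms block is filtered with the HRIR pair nearest the instantaneous azimuth, and `hrirs` is a hypothetical mapping from azimuth in degrees to (left, right) impulse-response pairs.

```python
import numpy as np

def rotate_source(sound, hrirs, fs=44100, angular_velocity=320.0):
    """Snapshot approximation to a source rotating in azimuth: each short
    block of the signal is convolved with the HRIR pair nearest the
    instantaneous azimuth (a real implementation would interpolate or
    crossfade between filters)."""
    azimuths = np.array(sorted(hrirs))
    block = int(0.01 * fs)  # 10 ms blocks
    left = np.zeros(len(sound))
    right = np.zeros(len(sound))
    for start in range(0, len(sound), block):
        az = (angular_velocity * start / fs) % 360.0
        # nearest available azimuth, measured around the circle
        diff = np.abs((azimuths - az + 180.0) % 360.0 - 180.0)
        hl, hr = hrirs[azimuths[np.argmin(diff)]]
        seg = sound[start:start + block]
        left[start:start + len(seg)] = np.convolve(seg, hl)[:len(seg)]
        right[start:start + len(seg)] = np.convolve(seg, hr)[:len(seg)]
    return left, right

def spectrotemporal_control(left, right):
    """Diotic control: the mean of the two ear waveforms from the rotating
    condition, presented identically to both ears; this preserves the dynamic
    spectrotemporal structure while removing binaural spatial cues."""
    mono = 0.5 * (left + right)
    return mono, mono
```

The diotic control is exactly the construction described in the Methods: averaging the two ear signals removes the interaural differences that carry the spatial percept while leaving a dynamically varying waveform.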

inter-slice gap of 1.4 mm and in-plane resolution of 3 x 3 mm. 256 brain volumes were acquired for each subject (32 volumes for each condition, in two sessions).

Image analysis

Image data were analysed for the entire group and for individual subjects using statistical parametric mapping implemented in SPM99 software (see Chapter 2, Sections 2.3 and 2.4 and Figure 2.3). Scans were first realigned and spatially normalised (Friston et al., 1995a) to the MNI standard stereotactic space (Evans et al., 1993). Data were spatially smoothed with an isotropic Gaussian kernel of 8 mm FWHM. SPMs were generated by modelling the evoked haemodynamic response for the different stimuli as boxcars convolved with a synthetic haemodynamic response function in the context of the general linear model (Friston et al., 1995b). Group-level inferences concerning BOLD signal changes in the contrasts of interest (sound motion minus stationary sound, sound motion minus the spectrotemporal control condition, and the spectrotemporal control minus stationary sound) were based on a fixed-effects model. The t statistic was estimated for each voxel at a significance threshold of p < 0.05 after correction for multiple comparisons according to Gaussian random field theory (see Chapter 2, Section 2.4.2, p. 63).

4.4 Results

In order to compare the neuroanatomical substrates for the spatial analysis of moving sounds and stationary externalised sounds, activation in the rotating sound condition was contrasted with the external midline and external lateralised conditions. In addition, in order to assess the spatial specificity of sound motion processing, activation in the sound motion condition was contrasted with the non-externalised spectrotemporal control condition, and activation in the spectrotemporal control condition was contrasted with the external lateralised condition. SPMs of group data for each of the contrasts of interest are shown in Figure 4.1 (overleaf).
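The regressor construction described under Image analysis, a condition boxcar convolved with a synthetic haemodynamic response function before entering the general linear model, can be sketched as follows. The double-gamma parameters below are common defaults for a canonical SPM-style HRF, not values taken from the thesis, and the function names are illustrative.

```python
import numpy as np
from math import gamma

def hrf(t):
    """Canonical double-gamma haemodynamic response (peak near 6 s, with a
    late undershoot near 16 s); t is in seconds."""
    return (t ** 5 * np.exp(-t) / gamma(6)
            - (1.0 / 6.0) * t ** 15 * np.exp(-t) / gamma(16))

def boxcar_regressor(onsets, duration, total_time, dt=0.1):
    """Boxcar (1 during stimulation, 0 otherwise) convolved with the HRF,
    sampled on a grid of width dt seconds."""
    t = np.arange(0.0, total_time, dt)
    box = np.zeros_like(t)
    for onset in onsets:
        box[(t >= onset) & (t < onset + duration)] = 1.0
    h = hrf(np.arange(0.0, 32.0, dt))
    reg = np.convolve(box, h)[:len(t)] * dt  # dt scales the discrete convolution
    return t, reg
```

For this experiment the boxcars would mark the 8 s stimulus periods of each condition, and one such regressor per condition would enter the design matrix.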
SPMs were thresholded at p < 0.05, corrected for multiple comparisons across the whole brain volume. Local maxima of activation for each of the contrasts of interest for the group are listed in Table 4.2 (overleaf). The contrast between rotating and external lateralised stimuli showed bilateral activation of PT, extending into the parieto-temporal operculum (PTO) (Figure 4.1A). Although activation spread into

Table 4.2. Experiment 2: Local maxima of activation for group

Rotation minus external lateralised
  Planum temporale: L (Z > 8); R (Z > 8)

Rotation minus external midline
  Planum temporale: L (Z > 8); R (Z > 8)

Rotation minus spectrotemporal control
  Planum temporale: L (Z > 8); R
  Inferior parietal lobe: R

Spectrotemporal control minus external lateralised
  Planum temporale: L (Z > 8); R (Z > 8)

Data are derived from a fixed-effects analysis of the group (12 subjects). All local maxima are shown for the contrasts between the rotating condition and each of the baseline conditions (stationary externalised in the midline, stationary externalised to the right or left, spectrotemporally matched non-externalised control), and between the spectrotemporal control and the stationary external lateralised condition. A Z score > 4.5 corresponds to p < 0.05 after correction for multiple comparisons across the whole brain volume.


Figure 4.1. Experiment 2: Statistical parametric maps for group

Statistical parametric maps based on group data have been rendered on axial sections of a canonical structural brain template (A-C, E), tilted to run parallel to the superior temporal plane, and on a whole brain canonical template (D). Contrasts are indicated above each panel. Activation within planum temporale in E occurs more laterally than in A. Additional activation located superiorly within the right inferior parietal lobe is indicated in D. All voxels significant at the p < 0.05 level (corrected for multiple comparisons) are shown.

HG on the right, all local maxima were posterior to HG on both sides (Table 4.2). The contrast between rotating and external midline conditions (Figure 4.1B) demonstrated very similar bilateral involvement of PT and PTO. The contrast between rotating and spectrotemporal control conditions (Figure 4.1C and D) showed more localised activation of postero-medial PT and PTO with an additional discrete focus located more superiorly in right IPL (Figure 4.1D). The contrast between spectrotemporal and external lateralised stimuli (Figure 4.1E) produced bilateral activation in PT that was more restricted and more laterally distributed than the contrast between the rotating and external lateralised conditions (Table 4.2; Figure 4.1A and E). No activation was observed in superior parietal or premotor areas. Individual subject data were not systematically analysed. However, inspection of individual SPMs (thresholded at p < uncorrected) for five subjects in the contrast between rotating and external lateralised conditions revealed consistent bilateral activation of PT, but considerable individual variation in the extent and distribution of activation within PT.

4.5 Discussion

A human brain mechanism for processing sound source motion

This experiment employed a virtual acoustic space technique, the convolution of broadband noise with a generic HRTF, to produce a stable percept of an external sound source moving in azimuth. Using a conservative criterion for significance that did not take the prior anatomical hypotheses into account, bilateral activation of PT and PTO was demonstrated during sound movement processing. Engagement of this posterior cortical network cannot be attributed simply to spectrotemporal processing of dynamic sound sources since the network was activated by sound motion contrasted against a dynamic spectrotemporal baseline.
Although it might be argued that the moving sound stimulus activates more azimuthally tuned neurons than the fixed stimulus, previous functional imaging evidence suggests that any such effect is small (Pavani et al., 2002b; Zatorre et al., 2002b) and unlikely to account for the differential activation between moving and stationary conditions here. It is most plausible that the posterior cortical mechanism demonstrated here specifically encodes the motion of a sound source in auditory space.

Activation of IPL has been demonstrated in all previous fMRI studies of auditory motion that have imaged this region (Table 3.1). Engagement of the parieto-temporal junction here provides further evidence that human IPL is involved in the analysis of discrete sound source movement in space. Activation of superior parietal and prefrontal areas has been inconsistently observed in previous functional imaging studies of sound motion processing (Table 3.1). Based on a similar argument to that advanced for Experiment 1 (see Chapter 3, Section 3.5.1, p. 77), the absence of activation in these areas in the present experiment may reflect the absence of an active auditory attentional or motor task. Previous human functional imaging evidence supports the modulation of superior parietal activity by task requirements during auditory motion or FM detection (Hart et al., 2004). Activation of PT has also been an inconsistent finding in previous studies (Table 3.1): however, this may reflect differences in the stimuli employed. PT was activated in previous fMRI studies using vertical and horizontal sound source motion recorded in free field (Pavani et al., 2002b) and interaural AM (Baumgart et al., 1999; Hart et al., 2004), a cue that could be used in the analysis of auditory motion. These previous studies and the present experiment used stimuli with intrinsic spectrotemporal structure: sawtooth FM (Baumgart et al., 1999), sinusoidal FM (Hart et al., 2004), and amplitude-modulated noise (Pavani et al., 2002b and the present experiment). Taken together, these findings suggest that PT is involved in the disambiguation of binaural spectrotemporal sound properties from the intrinsic spectrotemporal properties of the source. Such binaural spectrotemporal processing in PT is necessary to create a neural correlate of the perception of movement in acoustic space.
Although the current experiment does not allow localisation of this neural correlate to either PT or PTO, the data show that it occurs at or before PTO. It is proposed that the disambiguation of spatial and non-spatial spectrotemporal cues in PT enables the subsequent formation of a spatial percept at the level of the parieto-temporal junction. Limited human functional imaging evidence (Pavani et al., 2002b) suggests that the encoding of sound motion by this posterior network is multidimensional: no specific neuroanatomical substrates have been identified for the percepts of vertical and horizontal sound source movement. The present study does not support a rightward hemispheric asymmetry of sound movement processing in the human posterior superior temporal lobe, as suggested by Baumgart et al. (1999). Deficits in the detection of sound motion cues have been described in association with both right (Griffiths et al., 1996) and left (Clarke et al.,

2000) posterior temporo-parietal lesions in humans, implying that both hemispheres normally participate in the analysis of sound motion. However, such deficits are rarely clinically significant, and hemispheric lateralisation might emerge at subsequent processing stages that mediate cross-modal integration and orienting behavioural responses to moving sound sources.

A common spatial analysis pathway?

The direct comparison between moving and static externalised conditions here demonstrates greater activation of the network by moving than by stationary sounds. Experiments 1 and 2 together demonstrate that human postero-medial PT is engaged in the analysis of the spatial attributes of both static and moving sound sources. The pathways of information transfer within brain networks such as those identified here cannot be established using any single modality. Such pathways can only be elucidated through an improved understanding of the connectivities between PAC, PT and higher order cortices and the spatio-temporal sequence of activation in these different cortical areas (Tardif & Clarke, 2001; Goncalves et al., 2001; Anourova et al., 2001; Alain et al., 2001). However, several lines of evidence support the existence of a true posteriorly directed processing stream in the human auditory brain that extends from PT through PTO into IPL. A potential anatomical substrate for such a stream has been demonstrated in cytoarchitectonic studies of human auditory cortical areas, which show that auditory parakoniocortex extends contiguously from the STP into PTO (Galaburda & Sanides, 1980). Moreover, MEG (Yvert et al., 2001) and intra-operative cortical stimulation (Howard et al., 2000) latency data indicate sequential activation of primary followed by more posteriorly sited auditory cortices. Based on the present data and previous human functional imaging (Alain et al., 2001; Maeder et al.,
2001), electrophysiological (Alain et al., 2001; Anourova et al., 2001) and lesion (Clarke et al., 2000) evidence, it appears likely that this posteriorly directed pathway in human auditory cortex is homologous to the posterior/dorsal processing stream for auditory spatial information in the macaque (Rauschecker, 1998; Kaas & Hackett, 1999, 2000; Kaas et al., 1999). It remains possible that the cortical substrates for sound localisation and sound motion processing are not identical. The present experiment does not establish which properties of moving sounds (for example, angular displacement or velocity) are critical for human

sound motion perception. In principle, the perception of sound motion might depend on the analysis of dynamic properties such as velocity or the serial snapshot analysis of changing spatial location (Middlebrooks & Green, 1991; Ducommun et al., 2002). This issue is further complicated by the possibility that intrinsically greater computational resources are required to localise moving versus stationary sound sources (Ducommun et al., 2002): indeed, increased computational load may account for the differential activation of posterior auditory cortical areas in the contrast between moving and stationary conditions in the present experiment. Animal work has supplied evidence both for common (Ahissar et al., 1992) and for distinct (Toronchuk et al., 1992) electrophysiological mechanisms that process sound motion and sound position. In humans, psychophysical studies do not conclusively support a mechanism that analyses dynamic sound source properties independently of changing spatial location (Grantham, 1986; Perrott & Marlborough, 1989; Middlebrooks & Green, 1991); however, limited electrophysiological evidence suggests that partially segregated fronto-parietal networks may process sound motion and location (Ducommun et al., 2002). By analogy with the MT/MST complex in the primate visual system (Tootell et al., 1995), additional cortical areas beyond the temporo-parietal pathway may also be involved in the processing of auditory motion. While the posterior temporal plane areas demonstrated here analyse spatially determined spectrotemporal patterns that form the basis for the perception of stationary and moving sound sources, putative additional cortical areas might be specialised for the perceptual processing of moving sounds including the perceptual integration of motion in acoustic and other sensory modalities (Howard et al., 1996b).
Experiments 1 and 2 here were designed to identify neuroanatomical substrates for the processing of sound location and motion in space. However, it is possible that the specific perceptual correlates that distinguish static and moving sounds are established by specific patterns of activation or connectivity between PT, PTO and IPL, rather than by anatomically discrete processing mechanisms.

The planum temporale as a computational hub

Experiment 1 and the present experiment together demonstrate that human PT plays a specific role in the computation of sound source location and motion in external space. The computation of sound source location might be achieved a priori by segregating the intrinsic spectrotemporal structure of the sound from spectrotemporal patterns determined

by its position in space (Wightman & Kistler, 1998; Hofman et al., 1998). Along with previous functional imaging studies that have demonstrated activation of PT in the spatial analysis of sounds with intrinsic spectrotemporal structure (Baumgart et al., 1999; Pavani et al., 2002b; Zatorre et al., 2002b; Hunter et al., 2003; Hart et al., 2004), Experiments 1 and 2 suggest that this disambiguation of spectrotemporal patterns is achieved in human PT. In each experiment, spatially determined spectrotemporal patterns could be segregated using binaural and monaural information: binaural spatial information is contained in the difference between spectrotemporal patterns arising at the two ears, while monaural spatial information is contained in the pinna filter function (the HRTF). Experiment 2 suggests that the analysis of sound movement may be more computationally demanding than the analysis of static location (as reflected in the additional activation of the posterior cortical mechanism by moving compared to static externalised sound sources). This would follow if the mechanisms engaged in spectrotemporal segregation must be dynamically updated to track the spatial trajectory of a moving sound source. This experiment does not assess the relative contributions made by binaural and monaural cues to the analysis of sound source trajectories. Experiments 1 and 2 further suggest distinct neuroanatomical correlates of this segregation of spatial and non-spatial sound source information within PT. Both experiments implicate postero-medial PT in auditory spatial analysis and antero-lateral PT in non-spatial spectrotemporal analysis (Figures 3.2 and 4.1).
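The binaural information referred to here, the difference between the spectrotemporal patterns arriving at the two ears, can be illustrated by estimating the two classic binaural cues from an ear-signal pair. This sketch is illustrative only and not part of the experiments: the function name and its sign convention are assumptions, with the interaural level difference taken from the energy ratio and the interaural time difference from the lag of peak cross-correlation.

```python
import numpy as np

def interaural_cues(left, right, fs=44100):
    """Estimate interaural level difference (ILD, dB; positive = left louder)
    and interaural time difference (ITD, s; positive = left ear leads)."""
    ild = 10.0 * np.log10(np.sum(left ** 2) / np.sum(right ** 2))
    xcorr = np.correlate(left, right, mode="full")
    lag = np.argmax(xcorr) - (len(right) - 1)  # negative when the right ear lags
    itd = -lag / fs
    return ild, itd
```

A localisation mechanism of the kind discussed above would have to separate such interaural differences from spectrotemporal variation that is intrinsic to the source itself.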
Such an organisational scheme is broadly supported by previous human functional imaging evidence (see Table 3.1) implicating PT in the spatial (Baumgart et al., 1999; Pavani et al., 2002b; Hunter et al., 2003; Hart et al., 2004) and non-spatial (Binder et al., 1996; Giraud et al., 2000; Thivard et al., 2000; Hall et al., 2002) analysis of many categories of complex sounds. Data for activation maxima within human PT derived from functional imaging studies are displayed in Table 4.3 and Figure 4.2 (overleaf). The antero-lateral and postero-medial subdivisions of PT proposed here are defined with respect to one another, rather than any mapping into an absolute coordinate space: the local maxima of activation in studies of auditory spatial analysis cluster postero-medially to the maxima for non-spatial spectrotemporal pattern processing. The definition of absolute boundaries for these clusters would require a more complete understanding of the cytoarchitecture of the human posterior temporal plane than is currently the case (Rivier & Clarke, 1997; Tardif & Clarke, 2001; Wallace et al., 2002b). Moreover, it is likely that such boundaries would

Table 4.3. Human planum temporale in spectrotemporal processing

Spatial analysis
- Zatorre et al., 2002b: covariation with spatial distribution (free field) of simultaneous noises; R PT; co-activated: bilat PTO / IPL
- Pavani et al., 2002b: conjunction of azimuthal and vertical sound motion minus stationary sound (free field); L PT; co-activated: bilat PTO, IPL, bifrontal areas, bilat cerebellum
- Hart et al., 2004: moving (interaural AM) minus stationary FM tones; R and L PT
- Experiment 1: changing minus fixed position of IRN burst; R and L PT
- Experiment 2: sound source rotation minus stationary sound; R PT; co-activated: bilat PTO, L IPL

Simple sound patterns
- Griffiths et al., 1999a: duration sequences minus silence; L and R PT; co-activated: cerebellum, bilat HG, bilat STG, bilat IPL, bifrontal areas
- Giraud et al., 2000: amplitude-modulated minus unmodulated noise; L and R PT; co-activated: bilat HG, L STS / STG, L IPL
- Thivard et al., 2000: spectral motion versus stationary stimuli; L and R PT; co-activated: bilat STG
- Hall et al., 2002: harmonic complex minus pure tones; L and R PT; co-activated: R HG, bilat STG

- Experiment 2: frequency-modulated minus unmodulated tones, and spectrotemporal minus fixed external sound; L and R PT; co-activated: bilat HG / STG

Pitch sequences
- Binder et al., 1996: tone sequences minus words (active task); L PT
- Griffiths et al., 1999a: pitch sequences minus silence; L and R PT; co-activated: cerebellum, bilat HG, bilat STG, bilat IPL, bifrontal areas
- Binder et al., 2000: tone sequences minus noise; R PT; co-activated: bilat STG, R STS
- Experiment 1: changing minus fixed pitch of IRN burst; L and R PT; co-activated: bilat HG, bilat STG

Environmental sounds
- Engelien et al., 1995: passive listening minus rest; L PT; co-activated: bilat HG, R inf frontal, R insula, R IPL
- Thierry et al., 2003: semantic decision on environmental sounds over words; R PT; co-activated: R post STG

Voices
- Belin et al., 2000: vocal minus non-vocal sounds; L and R PT; co-activated: bilat STS, R MTG

Music
- Zatorre et al., 1994: melodies minus noise; R PT; co-activated: R STG, R fusiform gyrus
- Zatorre et al., 1996: listening to familiar songs minus visual baseline; L PT; co-activated: bilat HG, bilat STG, bifrontal areas, L IPL, R SMA
- Perry et al., 1999: maintenance of pitch while singing minus complex pitch perception; R PT; co-activated: R HG, bifrontal areas, bilat insula, bilat IPL
- Halpern & Zatorre, 1999: musical imagery (imagining the continuation of a tune minus listening); R PT; co-activated: bifrontal areas, L SMA
- Tervaniemi et al., 2000: deviant minus standard chords (preattentive); R PT; co-activated: R STG
- Patterson et al., 2002: diatonic melodies minus fixed pitch; L PT; co-activated: bilat STG, PP, bilat occipital areas, cerebellum

Speech and speech-like sounds
- Zatorre et al., 1992: speech minus noise; L PT; co-activated: bilat STG, L MTG, L inf frontal
- Vouloumanos et al., 2001: speech minus complex non-speech; L PT; co-activated: bilat MTG / STG, R inf frontal
- Vouloumanos et al., 2001: speech minus tones; L PT; co-activated: bilat MTG / STG, R insula
- Vouloumanos et al., 2001: complex non-speech minus tones; L PT; co-activated: bilat STG, R MTG
- Jäncke et al., 2002: consonant-vowel syllables minus vowels; L and R PT; co-activated: R STG, R STS
- Jäncke et al., 2002: unvoiced minus voiced consonants; L PT; co-activated: L HG

- McGuire et al., 1996: verbal self-monitoring (reading aloud with distorted feedback minus reading aloud); L PT; co-activated: L STS, R STG, L insula

Dichotic listening
- Hashimoto et al., 2000: listening to dichotic minus diotic speech; L PT; co-activated: bilat STG / STS, bilat inf frontal, R insula

Active listening
- Hall et al., 2000: active target detection minus passive listening; L PT; co-activated: L STS, L IPL, L frontal areas, L thalamus, L insula

Cross-modal processing
- Howard et al., 1996b: coherent visual motion minus stationary stimulus; L PT; co-activated: bilat V5, V3
- Howard et al., 1996b: optical flow minus randomised motion; R PT
- Calvert et al., 1997: lip-reading minus watching meaningless facial movements; L and R PT; co-activated: L post temporal lobe, R IPL

All local planum temporale (PT) maxima fall within the 95% probability anatomical boundaries for human PT of Westbury et al. (1999). Studies have been selected to illustrate the variety of types of pattern processing in PT and the different cortical areas co-activated in each case. Activation peaks are plotted in Figure 4.2. Key: AM, amplitude modulation; bilat, bilateral; FM, frequency modulation; inf, inferior; HG, Heschl's gyrus; IPL, inferior parietal lobe; IRN, iterated rippled noise; L, left; MTG, middle temporal gyrus; post, posterior; PP, planum polare; PTO, parieto-temporal operculum; R, right; STG, superior temporal gyrus; STS, superior temporal sulcus; SMA, supplementary motor area.


Figure 4.2. Human planum temporale as a computational hub
A, Axial section through the superior temporal plane of the human brain. The planum temporale (PT) lies posterior to Heschl's gyrus (HG), the site of the primary auditory cortex, and is contiguous posteriorly with the parieto-temporal operculum (PTO). 95% probability maps for the boundaries of left and right PT in humans (derived from Westbury et al., 1999) are outlined (red). B, Insets centred on left and right PT showing functional activation peaks within PT associated with different types of complex sound processing (see Table 4.3). Symbols are coded below. The functional relationships between PT and higher cortical areas that are co-activated (see Table 4.3) in processing simple sound patterns (green), music (yellow), speech (red) and auditory space (blue) are indicated schematically. According to this model, PT performs the initial segregation of spectrotemporal information into spatial and non-spatial auditory properties. Arrows indicate the postulated flow of information from PT to higher cortical areas; in many cases, however, exchange of information is likely to be reciprocal. In this scheme, the computational mechanism in PT uses information about sound properties (including stored templates) derived from higher cortical areas linked to PT, and the output of PT is used to update stored information in these same areas. Key: STG, lateral superior temporal gyrus; STS, superior temporal sulcus; MTG, middle temporal gyrus; PTO, parieto-temporal operculum; IPL, inferior parietal lobe.

show considerable variability between individuals (Westbury et al., 1999; Morosan et al., 2001; Rademacher et al., 2001). As further information concerning structure-function relationships within human PT becomes available, it is likely that more robust and more detailed definitions of the functional subdivisions will be established. Such definitions might for example be based on probability gradients similar to those proposed recently for PAC (Rademacher et al., 2001). The present experiments do not completely define the functional specificities of the antero-lateral and postero-medial PT subdivisions. This is unlikely to be an exclusively spatial versus non-spatial distinction. For example, postero-medial PT is activated by low frequency FM sweeps (Schönwiesner et al., 2002) and during speech production (Wise et al., 2001). Experiments 1 and 2 define the spatial and non-spatial specificities of human posterior STP areas in relation to each other. However, alternative functional parcellations based on other forms of spectrotemporal pattern segregation are possible (one such alternative segregation, based on non-spatial properties, is examined in Experiment 3, described in the following chapter). Taking these caveats into account, the proposed scheme nevertheless accords well with emerging anatomical and electrophysiological data on the medio-lateral functional parcellation of the macaque caudal auditory belt (see Chapter 1, Section 1.2.1, p. 4 and Figures 1.1 and 1.2). Inspection of Figure 4.2 indicates that the anatomical segregation of spatial and non-spatial auditory processing within human PT is not absolute. This is consistent with macaque evidence that a certain subpopulation of neurons in area CL responds both to the spatial location of complex sounds and to specific call sounds (Tian et al., 2001).
Taken together, these observations suggest that non-primary auditory cortex may have a similar functional organisation in humans and non-human primates: there is relative (rather than absolute) selectivity of medial belt areas for processing spatial information, and lateral belt areas for processing non-spatial (object) information. However, the electrophysiological properties of the medial portion of the posterior STP are technically difficult to study directly in both humans and non-human primates. The true extent of functional or anatomical homology between macaque CM and CL, human Te2 and Te3 (Morosan et al., 2001) and the postero-medial and antero-lateral PT functional subregions proposed here therefore remains uncertain. Both Experiments 1 and 2 show that the proposed PT subregions are involved in distinct functional and anatomical networks that also include distinct higher cortical areas:

postero-medial PT is part of a network that also includes IPL, superior parietal and premotor areas, while antero-lateral PT is part of an antero-posteriorly distributed network of areas including lateral HG, PP and STG. These observations are consistent with a model of human PT function in which spectrotemporal information is segregated into spatial and non-spatial patterns and gated into distinct cortical pathways by PT. In this model, PT represents a generic computational hub that accesses distinct cortical mechanisms for sound object identification and localisation and directs further processing in higher cortical regions. The mechanisms involved are instantiated in the distinct cortical networks identified in Experiments 1 and 2 and in previous human functional imaging studies (Tables 3.1 and 4.3). Spatial properties are extracted and analysed in the posterior network that links postero-medial PT with IPL, superior parietal and prefrontal areas, while non-spatial (object) properties are extracted and analysed in the distributed temporal lobe network that links antero-lateral PT with lateral HG, PP, STG and STS. The involvement of human PT in the integration of cross-modal and multimodal information (Howard et al., 1996b; Calvert et al., 1997; Downar et al., 2000; Price et al., 2003) provides further evidence that this region is sited at the hub of a distributed network of cortico-cortical connections. Various classes of spatial and non-spatial spectrotemporal patterns (for example, the spatially determined HRTF, human voices, environmental sounds, and speech phonemes) might establish stored templates based on past experience of the acoustic world (Wightman & Kistler, 1998; Hofman et al., 1998). An important aspect of any generic role played by PT in the segregation of spectrotemporal patterns might involve the comparison of incoming (novel) patterns with such stored spectrotemporal templates.
Such a template-matching mechanism might underlie sensory memory in human auditory cortex based on auditory experience over short timescales (Näätänen et al., 2001); however, analogous mechanisms might operate over much longer timescales. These template-matching algorithms could involve both feed-forward connections conveying information about mismatch between incoming patterns and templates, and feed-back connections from higher areas that provide information about updated templates to the template-matching algorithm. Generative models of this type, incorporating reciprocal bottom-up and top-down connections and the capacity for plasticity based on experience, are likely to be widely represented among sensory systems (Friston & Price, 2001). The model is represented schematically in Figure 4.2.
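The template-matching idea can be caricatured in a few lines. In this toy sketch (all names, sizes and parameters are hypothetical, not drawn from the experiments), an incoming spectrotemporal pattern is compared against stored templates by normalised correlation, and the best-matching template is then nudged towards the input, a crude stand-in for the experience-dependent, feed-back template updating described above.

```python
import numpy as np

def match_and_update(pattern, templates, lr=0.1):
    """Compare an incoming spectrotemporal pattern against stored templates
    (normalised correlation), then move the best-matching template a little
    towards the input: a toy version of experience-dependent updating."""
    def norm(v):
        v = v - v.mean()
        return v / (np.linalg.norm(v) + 1e-12)
    scores = np.array([norm(pattern).ravel() @ norm(t).ravel() for t in templates])
    best = int(np.argmax(scores))
    templates[best] = (1 - lr) * templates[best] + lr * pattern  # feed-back update
    return best, scores

rng = np.random.default_rng(1)
templates = [rng.standard_normal((8, 8)) for _ in range(3)]   # stored "spectrograms"
incoming = templates[2] + 0.1 * rng.standard_normal((8, 8))   # noisy copy of template 2
best, scores = match_and_update(incoming, templates)
```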

Experiments 1 and 2 do not establish the nature of the computational algorithms that might support the proposed segregation of spectrotemporal information. However, these algorithms might share certain formal features with independent component analysis as modelled in artificial neural networks (Bell & Sejnowski, 1995; Attias & Schreiner, 1998; Stone, 2002). A convolutive mixture (incoming spectrotemporal information) might be deconvolved by PT using stored information about signal characteristics (such as sound object properties) and mixing filter characteristics (such as the HRTF). This computational model suggests that human PT plays a generic role in auditory scene analysis based on the segregation of spectrotemporal patterns that determine spatial location and other sound source properties. This formulation predicts that PT participates in sound source segregation based on non-spatial as well as spatial source properties, and that source information extracted by PT forms a basis for the abstraction of sound object features by higher cortical areas. These predictions are tested in the experiments described in the following two chapters.
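A minimal illustration of the deconvolution idea: if the mixing filter is already known (stored), as hypothesised here for the HRTF, then recovering the source reduces to regularised inverse filtering. This is far simpler than blind independent component analysis, and everything in the sketch (the toy filter taps, signal length, regularisation constant) is an arbitrary assumption for demonstration only.

```python
import numpy as np

def deconvolve(mixture, filt, eps=1e-6):
    """Recover a source from a convolutive mixture when the mixing filter is
    known, via regularised spectral division: a crude stand-in for the
    'stored filter characteristics' invoked in the text."""
    n = len(mixture)
    M = np.fft.rfft(mixture, n)
    H = np.fft.rfft(filt, n)
    # Wiener-style regularisation avoids blow-up where |H| is small
    S = M * np.conj(H) / (np.abs(H) ** 2 + eps)
    return np.fft.irfft(S, n)

rng = np.random.default_rng(2)
source = rng.standard_normal(256)                 # hypothetical source waveform
hrtf_like = np.array([1.0, 0.6, 0.3, 0.1])        # toy 4-tap "HRTF" mixing filter
mixture = np.convolve(source, hrtf_like)          # convolutive mixture
recovered = deconvolve(mixture, hrtf_like)[:len(source)]
```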

Chapter 5. ANALYSIS OF SOURCE-DEPENDENT AND SOURCE-INDEPENDENT PITCH PROPERTIES

Summary

Auditory "what" information can be broadly categorised into information about particular sound sources and information patterns that do not depend on any particular source. These source-dependent and source-independent types of "what" information are exemplified in the two pitch dimensions recognised by musicians. On the keyboard, these dimensions are illustrated by the octave and the cycle of notes within the octave; in perception, the two dimensions are referred to as pitch height and pitch chroma respectively. Pitch height provides a basis for the segregation of notes into streams corresponding to separate sound sources (such as different voices or musical instruments). In contrast, pitch chroma provides a basis for representing spectrotemporal patterns (melodies) that do not depend on the particular sound source. In this fMRI experiment (Experiment 3), pitch height and pitch chroma were manipulated independently in order to identify the substrates for these two types of pitch change in human auditory cortex. The results show that height change is specifically represented posterior to primary auditory cortex, while chroma change is specifically represented anterior to primary auditory cortex. It is proposed that distinct posterior and anterior brain mechanisms process the pitch dimensions of height and chroma. These mechanisms may support different aspects of auditory scene analysis: the segregation of sound sources depends on posterior auditory cortical areas, while the tracking of acoustic information streams associated with individual sound sources occurs in anterior areas.

5.1 Background

Physicists define pitch as the perceptual correlate of acoustic frequency: a single physical dimension along which musical notes can be ordered from low to high (American Standards Association, 1960). However, the perception of pitch is both complex and controversial. Human listeners perceive relations between frequencies that are not captured by a simple linear ordering (Krumhansl, 1990; Deutsch, 1999). This is illustrated by the fundamental perceptual principles of interval equivalence and octave equivalence. The perceived equality of intervals between tones with fundamental frequencies in the same ratio (interval equivalence: Deutsch, 1999) forms the basis for the traditional Western musical scale, in which musical pitch is a logarithmic function of frequency. However, tones separated by octaves have a perceptual similarity (octave equivalence) that is well established in humans, including young infants (Demany & Armand, 1984), and may generalise to other mammalian species (Deutsch, 1999; Wright et al., 2000). This octave effect demonstrates the inadequacy of unidimensional models of pitch perception. The logarithmic pitch ordering of the musical scale defines one perceptual pitch dimension, "pitch height"; the circular ordering of notes within each octave defines another perceptual pitch dimension, "pitch chroma". Various alternative geometric models have been proposed in order to represent the relation between different perceptual dimensions of pitch (Shepard, 1982). The simplest of these is the helix model of musical pitch, in which the notes of the scale lie along an ascending spiral (Figure 5.1A, overleaf) with a circular dimension of pitch chroma and a vertical dimension of pitch height (Shepard, 1982; Krumhansl, 1990; Deutsch, 1999). The range of pitches produced by a sound source is a basic attribute that can be used to segregate that source from other sources in the acoustic environment.
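The helix model is easy to state as geometry. In the sketch below (an illustrative parameterisation, not taken from Shepard's work), a note's chroma is its angle around the circle and its height its position along the axis, so notes an octave apart share (x, y) coordinates but differ in z.

```python
import numpy as np

def helix_coords(semitone):
    """Place a note (in semitones above an arbitrary reference) on the pitch
    helix: chroma is the angle around the circle, height the position along
    the vertical axis (one octave = one full turn = one unit of height)."""
    angle = 2 * np.pi * (semitone % 12) / 12.0
    return np.cos(angle), np.sin(angle), semitone / 12.0  # (x, y) chroma, z height

c_ref = helix_coords(0)     # reference note
c_oct = helix_coords(12)    # one octave up: same chroma, height + 1
g_mid = helix_coords(7)     # a fifth up: different chroma
```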
In addition, pitch is a basic building block for acoustic patterns such as musical melodies and prosodic contours that convey information that does not depend on a particular pitch range (a particular sound source). These different types of pitch information may be conveyed by the distinct perceptual dimensions of height and chroma, respectively. The different functions of the two pitch dimensions are illustrated when the same melody is sung by a male or a female voice, or played by a violin or a cello. The vocal cords of women vibrate faster than those of men, and the strings of a violin vibrate faster than those of a cello. These physical differences correspond to differences in pitch height that contribute


Figure 5.1. Experiment 3: Basis for stimulus manipulations
A, The pitch helix. The musical scale is wrapped around so that each circuit (red) is an octave. The equivalent change in pitch height with fixed chroma is shown (blue). B, Examples of sounds with changing pitch height. Each of these harmonic complexes (h1, h2, h3) has a flat spectral envelope in the frequency band 0 to 4 kHz. In h1 (top), all harmonics of the fundamental, f0, have equal amplitude; in h2 (middle), the odd harmonics are attenuated by 10 dB, producing a large increase in pitch height without changing pitch chroma; in h3 (bottom), the odd harmonics are completely attenuated, producing a one-octave rise in pitch height with the same chroma (2f0).

to our perception that women's voices are higher than men's and violins are higher than cellos. This distinction is not based on pitch chroma, since both voices and instruments can produce the full range of chroma; a given chroma pattern, or melody, might be produced by any of these different sources. It is an average pitch height difference that is more properly associated with the perception that one source is higher than another. Pitch height is a source-dependent property that is used in the segregation of sound sources. In contrast, pitch chroma is a source-independent property that is used in tracking the information conveyed by different sound sources. It has been shown in previous human functional imaging studies that pitch strength or salience (as indexed by increasing temporal regularity at the scale of milliseconds in delay-and-add noise) is represented in the ascending auditory pathways up to and including lateral HG (Griffiths et al., 1998a, 2001), whereas pitch sequences embodied in musical melodies (Zatorre et al., 1994; Griffiths et al., 1998a; Patterson et al., 2002) and speech information (Scott et al., 2000; Mitchell et al., 2003; Scott & Johnsrude, 2003) engage cortical areas beyond HG. The dimensions of the pitch helix describe relations between pitch values. Accordingly, it is likely a priori that distinct brain substrates for these dimensions involve areas beyond HG. However, the chroma and height dimensions of pitch have not been separated in previous functional imaging studies. Many of the pitch patterns used in these previous studies (such as musical melodies or prosodic contours) could be considered as information streams based on sequences of chroma values. Such patterns engage an anteriorly directed network of cortical areas extending from antero-lateral PT and lateral HG into PP, STG and STS.
Such studies have not addressed the existence of a specific brain substrate for pitch height analysis using a paradigm where pitch height could be manipulated independently of chroma. However, both previous functional imaging evidence (Zatorre et al., 2002b) and the findings of Experiments 1 (Chapter 3) and 2 (Chapter 4) here suggest that human PT is involved in the segregation of sound sources using spatial information (see Chapter 4, Table 4.3 and Figure 4.2). It is therefore logical to ask whether PT may also be involved in the computation of pitch height, a non-spatial sound source property that could be used in the decomposition of auditory scenes (Bregman, 1990).

5.2 Experimental hypotheses

1. Distinct human brain mechanisms analyse the source-dependent property of pitch height and the source-independent property of pitch chroma.
2. These mechanisms are instantiated in distinct cortical networks located posterior and anterior to primary auditory cortex.

The present fMRI experiment (Experiment 3) was designed to determine whether the helical model of pitch perception is reflected in the organisation of the human brain. It was predicted that specific brain mechanisms are engaged in the analysis of pitch height and pitch chroma. Based on the findings of Experiments 1 (Chapter 3) and 2 (Chapter 4) here and previous functional imaging evidence (Zatorre et al., 2002b) implicating human PT in the spatial segregation of sound sources, it was hypothesised that pitch height changes (changes in sound source identity) are specifically processed in cortical regions posterior to PAC. Based on previous functional imaging studies of pitch sequence processing (Zatorre et al., 1994; Griffiths et al., 1998a; Patterson et al., 2002; Mitchell et al., 2003), it was hypothesised that pitch chroma changes (source-independent auditory information streams) are specifically processed in cortical regions anterior to PAC. These predictions were tested using a factorial experimental design in which the two pitch dimensions could be manipulated independently.

5.3 Methods

5.3.1 Bases for the manipulation of pitch dimensions

The stimuli were harmonic complexes in which height and chroma could be varied continuously and independently while the total energy and spectral region remained fixed (Figure 5.1B; Patterson, 1990; Patterson et al., 1993; see Appendix I). The standard stimulus with all harmonics of the fundamental frequency, f0 (Figure 5.1B, top), had the pitch f0, by definition.
The pitch chroma of the stimuli was altered by varying f0 in semitone steps as in the chromatic musical scale; this corresponds to motion around the pitch helix (Figure 5.1A, red line). The pitch height was varied independently of chroma by reducing the amplitude of all odd harmonics in the complex (Figure 5.1B, middle). If the odd harmonics of a harmonic complex are attenuated between 0 and approximately 40 dB relative to baseline intensity, the new tone has the same pitch chroma, f0, but is

perceived as higher in pitch; this change occurs along the pitch height dimension (Figure 5.1A, blue line). When the odd harmonics are attenuated completely, pitch height reaches the octave, 2f0 (Figure 5.1B, bottom). Repeating the attenuation process over successive octaves produces continuous pitch height changes without concomitant chroma changes. Using this procedure, sequences of notes were synthesised in which chroma and/or height could be independently varied between notes in the sequence.

5.3.2 Relationship between pitch height, tone height and timbre

Complex perceptual relations exist between the tone height, pitch height and timbre of harmonic sounds. Tone height differences can be used in the perceptual segregation of harmonic sounds with the same chroma value. Tone height can be manipulated by changing the spectral envelope of the sound or its spectral fine structure (the intensity or the phase of alternate harmonics) (Patterson, 1990). Changes in spectral envelope convey size information in natural sounds (Irino & Patterson, 2002): women have shorter vocal tracts than men and, as a result, their formant frequencies are higher; violins are smaller than cellos and, as a result, their resonant peaks occur at higher frequencies. Such changes in spectral envelope represent changes in tone height that are related to a dimension of timbre perception, where timbre is operationally defined as "that property which distinguishes two sounds of identical pitch, intensity, duration and location" (American Standards Association, 1960). It follows that this dimension is separate from the dimensions of pitch, and the manipulations in this experiment do not involve such tone height changes based on spectral envelope variation. Tone height can also be altered by changes in spectral fine structure, which are not related to the size information in natural sounds.
These changes can be produced by attenuation of the odd harmonics (as in this experiment), or a fixed shift in the phase of either the odd or the even harmonics that does not involve an alteration in the power spectrum of the stimulus (Patterson, 1990). Such tone height changes specifically represent changes in pitch height, corresponding to an alteration in the perceived pitch of the sound source. This experiment was designed to establish neuroanatomical correlates of the perception of pitch height, and the location of the auditory cortical mechanism for the analysis of pitch height rather than tone height in its more general sense.
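The odd-harmonic attenuation described above can be sketched directly. The parameters below (f0 = 80 Hz, 200 ms notes, cosine-phase components up to 4 kHz) follow the stimulus descriptions elsewhere in this chapter, but the peak normalisation is a simplification: the real stimuli were constrained to fixed total energy.

```python
import numpy as np

def harmonic_complex(f0, odd_atten_db, fs=44100, dur=0.2, fmax=4000.0):
    """Cosine-phase harmonic complex from f0 up to fmax, with the odd
    harmonics attenuated by odd_atten_db: the manipulation that raises
    pitch height while leaving pitch chroma (f0) unchanged."""
    t = np.arange(int(fs * dur)) / fs
    odd_amp = 10.0 ** (-odd_atten_db / 20.0)     # dB attenuation -> amplitude
    tone = np.zeros_like(t)
    n = 1
    while n * f0 <= fmax:
        amp = odd_amp if n % 2 == 1 else 1.0
        tone += amp * np.cos(2 * np.pi * n * f0 * t)
        n += 1
    return tone / np.max(np.abs(tone))           # simple peak normalisation

low = harmonic_complex(80.0, odd_atten_db=0.0)    # all harmonics equal: pitch f0
high = harmonic_complex(80.0, odd_atten_db=40.0)  # odd harmonics ~gone: pitch ~2*f0
```

With 40 dB attenuation the odd harmonics are effectively absent, so the spectrum of `high` is dominated by multiples of 2f0, which is why its pitch height approaches the octave.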

Manipulations of the fine spectral structure of sounds can, however, also affect the perceived timbre. For example, the characteristic timbre of a clarinet in part reflects an attenuation of the even harmonics relative to the odd harmonics, imparting a hollowness to the sound of the instrument. In the current experiment, odd harmonics were attenuated relative to the even harmonics: this manipulation has less effect on the perceived timbre of the sound. Manipulations of the fine spectrotemporal structure of a harmonic sound may therefore affect the perceived pitch or the perceived timbre. However, it is the effect on pitch height that is the focus of the current experiment, and the perceptual ordering demonstrated in the next section is most parsimoniously explained in terms of a pitch dimension.

5.3.3 Psychophysical effect of pitch height manipulation

The effect on pitch height of attenuating the odd components of a harmonic series was originally investigated in scaling experiments (Patterson, 1990; Patterson et al., 1993). Here a new pitch height discrimination experiment was performed to confirm that the stimulus manipulations used in the fMRI experiment could be ordered continuously along the dimension of pitch height and to determine the resolution of discrimination along this dimension. In a two-interval, two-alternative, forced-choice task, three normal subjects were presented with a standard stimulus and a test stimulus in which the odd harmonics were attenuated more than in the standard: the task was to choose the stimulus with the higher pitch height. Notes all had the same pitch chroma (80 Hz), duration (200 ms), frequency region (f0 to 4 kHz) and energy (sound pressure level 70 dB). Three different standard stimuli were used for each listener, in which the odd harmonics had fixed attenuation of 0, 6 or 12 dB. The results are presented in Figure 5.2 (overleaf).
The left-hand psychometric functions show how discrimination performance increased with attenuation relative to the standard with 0 dB attenuation of the odd harmonics. All three listeners had thresholds of less than 1 dB of attenuation, and achieved near-ceiling performance at just over 2 dB of attenuation. The central and right-hand psychometric functions show that pitch height discrimination remained excellent and essentially uniform when the standard stimulus had fixed attenuation of the odd harmonics of 6 or 12 dB. The experiment was performed without feedback and required essentially no training, indicating that the pitch height cue is stable and the direction of "higher" is consistent across listeners.


Figure 5.2. Experiment 3: Psychometric functions for pitch height
The psychometric function shows the dependence of the proportion of correct responses on the size of the change in pitch height. Psychometric functions for three subjects were derived from a two-interval, two-alternative, forced-choice experiment in which subjects were asked to detect "which note is higher?" The ordinate shows the proportion of correct subject responses, where 50% is equivalent to performance at chance; the abscissa shows the attenuation of odd harmonics in the test stimulus relative to baseline intensity. The fundamental frequency, f0, was fixed at 80 Hz throughout the experiment (see Methods, Section 5.3.1, p. 100). The psychometric functions are based on the change in attenuation of odd harmonics relative to standards in which the odd harmonics have fixed attenuation of 0 dB (left), 6 dB (centre) and 12 dB (right). Each data point is based on at least 60 trials. Weibull functions were fitted using maximum likelihood estimation implemented in Matlab6. The 75% threshold is defined as the attenuation value at which subjects achieve a score of 75% correct (halfway between chance and ceiling performance): the 75% thresholds and 95% confidence intervals for each threshold shown (+) were derived using bootstrapping with 999 simulations (Wichmann & Hill, 2001a,b). In order to obtain 95% confidence intervals, the bootstrapping procedure employed a Monte Carlo method for estimating the variability of the fitted psychometric function. All subjects heard an increase in pitch height from the standard when the odd harmonics were attenuated by approximately 1 dB in the test stimulus, and the pitch height threshold is uniform along the dimension.
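For a two-alternative forced-choice task, the 75% point of a Weibull psychometric function can be written in closed form, which is what "halfway between chance and ceiling" amounts to. The parameter values below are hypothetical, not the fitted values from Figure 5.2.

```python
import numpy as np

def weibull_2afc(x, alpha, beta):
    """Two-alternative forced-choice Weibull psychometric function:
    chance (0.5) at x = 0, rising towards 1 as the stimulus difference grows."""
    return 0.5 + 0.5 * (1.0 - np.exp(-(x / alpha) ** beta))

def threshold_75(alpha, beta):
    """Attenuation at which performance reaches 75% correct, obtained by
    inverting the Weibull in closed form: x75 = alpha * ln(2)^(1/beta)."""
    return alpha * np.log(2.0) ** (1.0 / beta)

alpha, beta = 1.2, 2.0        # hypothetical scale (dB) and slope parameters
x75 = threshold_75(alpha, beta)
p = weibull_2afc(x75, alpha, beta)
```

In practice alpha and beta would be estimated by maximum likelihood from the trial data, and the threshold's confidence interval by bootstrap resampling, as in the figure.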

5.3.4 Stimulus details

In the fMRI experiment, sounds were either pitch-producing harmonic complexes or broadband Gaussian noise. All sounds were created digitally at a sampling rate of 44.1 kHz and 16 bit resolution. Total energy and passband (0 to 4 kHz) were fixed for all stimuli, and the harmonic complexes were in cosine phase with components ranging from f0 to 4 kHz (Figure 5.1B). Sounds were combined in sequences in which the duration of each individual element was fixed at 200 ms, and each sequence contained 40 elements (total sequence duration eight seconds). Pitch chroma was varied randomly across one octave of the chromatic musical scale by varying f0 from note to note in the sequence, in semitone steps. Pitch height was also varied randomly across approximately one octave, by attenuating the amplitude of all odd harmonics (Figure 5.1B, middle) from note to note, in 2 dB steps. The note with the lowest pitch height had fundamental f0 with 10 dB attenuation of the odd harmonics of f0, and the note with the highest pitch height had fundamental 2f0 with 10 dB attenuation of the odd harmonics of 2f0. The intervening pitch height steps had 12, 14, 16, 18 and 20 dB attenuation of the odd harmonics of f0, and 0, 2, 4, 6 and 8 dB attenuation of the odd harmonics of 2f0. In the stimuli with changing pitch height, chroma values were lowered by half an octave. The total range of subjective pitch change across a sequence was therefore approximately one octave with either chroma variation or height variation, and the maximum overall pitch range was one and a half octaves if both chroma and height variation occurred together.

5.3.5 Subjects

Ten subjects (six males, four females; nine right-handed, one left-handed) aged 21 to 38 participated in the fMRI experiment.
None had any history of a hearing disorder or neurological illness and all had normal structural MRI scans.

5.3.6 fMRI protocol

Six stimulus conditions, each corresponding to a different type of sound sequence and a distinct percept, were presented during scanning: 1) fixed pitch chroma, fixed pitch height (a single sound source with fixed pitch); 2) changing pitch chroma, fixed pitch height (a single sound source with changing pitch); 3) changing pitch height, fixed pitch

chroma (a disrupted auditory stream); 4) changing pitch chroma, changing pitch height (a disrupted auditory stream with changing pitch); 5) broadband noise without pitch (a series of noise bursts with the same cadence as the pitch sequences); 6) silence. The disrupted auditory stream perceived in the conditions with changing pitch height was compared by some subjects to the perception of distinct instrumental lines in polyphonic music. During scanning, subjects were asked to pay attention to the sound sequences. To help maintain alertness, they were required to make a single button press at the end of each broadband noise sequence using a button box positioned beneath the right hand, and to fixate a cross in the middle of the visual field. There was no active auditory discrimination task. Stimuli were delivered in randomised order using a custom electrostatic system at a fixed sound pressure level of 70 dB (see Table 2.1 and Figure 2.2). Following each sound sequence, brain activity was estimated from the BOLD response at 2 T (Siemens Vision, Erlangen) using gradient EPI in a sparse acquisition protocol (inter-scan interval 12 s; TE 40 ms) (Hall et al., 1999; see Chapter 2, p. 54). Each brain volume comprised 48 transverse 1.8 mm slices covering the whole brain with an inter-slice gap of 1.4 mm and in-plane resolution of 3 x 3 mm. 192 brain volumes were acquired for each subject (32 volumes for each condition, in two sessions).

5.3.7 Behavioural data on imaged subjects

Following scanning, all subjects underwent two-alternative, forced-choice psychophysics to determine thresholds for detection of height and chroma changes in the sound sequences used during image acquisition. During scanning, the minimum pitch height and pitch chroma steps were 2 dB and one semitone, respectively. All subjects in the fMRI experiment could readily detect the changes in pitch height and pitch chroma and distinguish height sequences from chroma sequences.
All subjects had a threshold for detection of pitch height change of less than 2 dB and a threshold for detection of pitch chroma change of less than one semitone.
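The four pitch conditions above require chroma and height to be manipulated independently. One standard way to achieve this (a minimal sketch only, not the thesis's actual stimulus synthesis, and with illustrative parameter values) is a Shepard-tone-like complex: octave-spaced partials, all sharing one chroma class, weighted by a smooth spectral envelope. The position within the octave then sets chroma, while the envelope centre sets height.

```python
import numpy as np

def shepard_like_tone(chroma_semitones, height_center_hz,
                      sr=16000, dur=0.5, f_base=30.0, f_max=8000.0):
    """Octave-spaced partials under a Gaussian (log-frequency) envelope.

    chroma_semitones (0-11) sets the position within the octave (chroma);
    height_center_hz sets the envelope centre (perceived height).
    All names and values are illustrative assumptions.
    """
    t = np.arange(int(sr * dur)) / sr
    f = f_base * 2.0 ** (chroma_semitones / 12.0)
    x = np.zeros_like(t)
    while f < f_max:
        # weight each partial by its distance (in octaves) from the envelope centre
        w = np.exp(-0.5 * (np.log2(f / height_center_hz)) ** 2)
        x += w * np.sin(2 * np.pi * f * t)
        f *= 2.0  # next octave up: same chroma, greater height
    return x / np.max(np.abs(x))

# changing chroma, fixed height: vary chroma_semitones, keep height_center_hz
# changing height, fixed chroma: keep chroma_semitones, vary height_center_hz
```

With such stimuli, a chroma step moves energy around within each octave band while the overall spectral centre of gravity stays put, and a height step does the reverse.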

5.3.8 Image analysis

Image data were analysed for the entire group and for each individual subject using statistical parametric mapping implemented in SPM99 software (see Chapter 2, Sections 2.3 and 2.4 and Figure 2.3). Scans were first realigned and spatially normalised (Friston et al., 1995a) to the MNI standard stereotactic space (Evans et al., 1993). Data were spatially smoothed with an isotropic Gaussian kernel of 8 mm FWHM. SPMs were generated by modelling the evoked haemodynamic response to the different stimuli as boxcars convolved with a synthetic haemodynamic response function in the context of the general linear model (Friston et al., 1995b). Group-level inferences concerning BOLD signal changes in the contrasts of interest (changing pitch height minus fixed pitch height, and changing pitch chroma minus fixed pitch chroma) were based on a fixed-effects model. The t statistic was estimated for each voxel at a significance threshold of p < 0.05 after correction for multiple comparisons across the whole brain volume according to Gaussian random field theory (see Chapter 2, Section 2.4.2, p. 63). Brain areas activated specifically in the pitch height contrast, brain areas activated specifically in the pitch chroma contrast, and brain areas activated in common by both contrasts were identified using masking procedures that applied logical operators across all voxels in the SPMs corresponding to each contrast. Exclusive masking was used to identify voxels activated by one contrast but not the other, and inclusive masking was used to identify voxels activated by both contrasts. Individual subject data were analysed using the same statistical model, in order to assess the anatomical variability of pitch height and pitch chroma processing within the group, and were assessed for each voxel at a significance threshold of p < 0.05 after small volume correction taking the a priori anatomical hypotheses into account.
For the contrast between changing and fixed height conditions, anatomical small volumes were based on 95% probability maps for the left and right human PT (Westbury et al., 1999). For the contrast between changing and fixed chroma conditions, anatomical small volumes, comprising left and right lateral HG and PP, were derived from the group mean normalised structural MRI brain volume.
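The masking procedures described above reduce to voxelwise boolean logic on the thresholded statistical maps. A schematic numpy sketch (not SPM99 code; in SPM the masking contrast is often thresholded more leniently than the main contrast, which this simplified version ignores):

```python
import numpy as np

def exclusive_mask(t_a, t_b, threshold):
    """Voxels significant in contrast A but NOT in contrast B
    (e.g. 'height only' = height contrast AND NOT chroma contrast)."""
    return (t_a > threshold) & ~(t_b > threshold)

def inclusive_mask(t_a, t_b, threshold):
    """Voxels significant in BOTH contrasts (common activation)."""
    return (t_a > threshold) & (t_b > threshold)

# toy t-maps over three voxels (values are illustrative)
t_height = np.array([5.2, 1.0, 5.0])
t_chroma = np.array([5.1, 5.3, 1.2])
height_only = exclusive_mask(t_height, t_chroma, 4.66)  # [False, False, True]
common = inclusive_mask(t_height, t_chroma, 4.66)       # [True, False, False]
```

Applied to whole-brain t-maps, `exclusive_mask` yields the 'height only' and 'chroma only' voxel sets and `inclusive_mask` the common activation reported in the Results.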

5.4 Results

Group data

In the group analysis, significant activation was demonstrated in each of the contrasts of interest at the p < 0.05 voxel level of significance after correction for multiple comparisons across the entire brain volume. The contrast between broadband noise and silence produced extensive, largely symmetric bilateral superior temporal activation, including medial HG and parts of PT (Figure 5.3A, green; overleaf). The contrast between the pitch conditions and noise produced more restricted bilateral activation in the lateral part of HG and PT, extending into PP (Figure 5.3A, lilac). The contrasts between changing and fixed pitch chroma (Figure 5.3B, red) and between changing and fixed pitch height (Figure 5.3C, blue) produced common bilateral activation in lateral HG and antero-lateral PT. Masking (see Methods, Section 5.3.8, p. 105) was then applied to exclude voxels activated both by change in pitch chroma and by change in pitch height (Figure 5.3D). Brain areas activated only by changes in pitch height were distinct from those activated only by changes in pitch chroma. Pitch height change (Figure 5.3D, blue) produced additional activation extending posteriorly in PT, whereas pitch chroma change (Figure 5.3D, red) produced additional activation extending anteriorly from HG into PP. Activation was bilateral and slightly asymmetric: activity in the right hemisphere was slightly more anterior than that in the left hemisphere. Local maxima of activation in the STP for the group are listed in Table 5.1 (overleaf). The relative magnitude of the mean size of effect (change in BOLD signal) in each of the contrasts of interest (Figure 5.3D, right) shows opposite patterns for pitch height and pitch chroma processing in posterior and anterior auditory areas. A significant interaction between pitch height and pitch chroma processing was detected at the p < 0.05 voxel significance level in lateral HG and antero-lateral PT bilaterally.
In these areas, change in either pitch dimension produced significantly greater activation when the other dimension was fixed rather than changing.

Individual data

Individual subject analyses were carried out to determine whether the anatomical distinction between the regions specific for pitch height and chroma processing in the group was also evident in the individual data. Figure 5.4 (follows page 107) presents the

Table 5.1. Experiment 3: Local maxima of activation for group

Region / Side / Coordinates (mm: x, y, z) / Z score
'chroma only': Planum polare, L (Z > 8); Planum polare, R (Z > 8); Heschl's gyrus, L
'height only': Planum temporale, L; Planum temporale, R

Data are derived from a fixed-effects analysis of the group (10 subjects). All local maxima in the superior temporal plane are shown for voxels activated by pitch chroma change but not by pitch height change ('chroma only'), and by pitch height change but not by pitch chroma change ('height only'). A Z score > 4.66 corresponds to p < 0.05 after correction for multiple comparisons across the entire brain volume.


Figure 5.3. Experiment 3: Statistical parametric maps for group

For each contrast (indicated below the panel), statistical parametric maps of group data are rendered on the normalised group mean structural brain MRI in an axial section tilted 0.5 radians to include much of the surface of the superior temporal plane. The statistical threshold for each voxel is p < 0.05 corrected for multiple comparisons across the whole brain volume. The 90% probability boundaries for primary auditory cortex (Rademacher et al., 2001) are outlined (black). A, Broadband noise contrasted with silence (noise minus silence, green) activates extensive bilateral superior temporal areas including both medial and lateral Heschl's gyrus (HG). The pitch-producing stimuli contrasted with noise (pitch minus noise, lilac) produce more restricted bilateral activation in lateral HG, planum polare (PP) and planum temporale (PT). B, Pitch chroma change contrasted with fixed chroma (all chroma, red) activates bilateral areas in lateral HG, PP and antero-lateral PT. C, Pitch height change contrasted with fixed height (all height, blue) activates bilateral areas in lateral HG and antero-lateral PT. D, Voxels in B and C activated both by pitch chroma change and by pitch height change have been exclusively masked. Pitch chroma change but not height change ('chroma only', red) activates bilateral areas anterior to HG in PP; pitch height change but not chroma change ('height only', blue) activates bilateral areas in posterior PT. These areas represent distinct brain substrates for processing the two musical dimensions of pitch. The relative magnitude of the BOLD signal change in anterior and posterior areas is shown for each of the contrasts of interest (right).
The height of the histogram columns represents the mean size of effect (signal change) relative to global mean signal for the contrasts chroma-only (red) and height-only (blue) at the peak voxels for each contrast in the right hemisphere; vertical bars represent the standard error of the mean size of effect. The histograms demonstrate opposite patterns of pitch chroma and pitch height processing in anterior and posterior auditory areas.

individual results for the same axial slice as in Figure 5.3. The pattern of activation in individuals was very similar to that of the group. Small volume correction was performed using anatomical volumes of interest for lateral HG, PP and PT specified a priori (see Methods, Section 5.3.8, p. 105). As in the group analysis, voxels activated both by pitch height change and by pitch chroma change were exclusively masked, to identify voxels specifically activated by pitch height change or by pitch chroma change. There was significant activation in each contrast in all subjects at the p < 0.05 voxel level of significance after correction for the specified volume. For pitch height change, local maxima occurred in the pre-specified volume involving PT in every subject. For pitch chroma change, significant local maxima occurred in the pre-specified volume involving lateral HG and PP in every subject. Coordinates of local activation maxima for all individuals are presented in Table 5.2 (overleaf). Although the same STP regions were consistently activated by pitch chroma change and pitch height change in different individuals, inspection of Figure 5.4 indicates substantial individual variation in the extent and distribution of activation within those functional regions.

5.5 Discussion

5.5.1 Human brain representations of pitch dimensions

In this chapter, new psychophysical evidence has been presented to support the view that pitch has distinct perceptual dimensions of height and chroma. When pitch chroma is kept constant, pitch height can be manipulated and ordered from low to high along a continuous dimension. The fMRI evidence presented here suggests that these perceptual pitch dimensions have distinct representations in human auditory cortex. These cortical representations occur at a logical point in the recently proposed cortical pitch-processing hierarchy (Griffiths et al., 1998a; Patterson et al., 2002).
Medial HG (PAC) is activated similarly whether processing noise or pitch. Lateral HG (non-primary auditory cortex) shows an increase in activity when processing pitch (Figures 5.3 and 5.4) and it is activated both by changing pitch height and by changing pitch chroma. Areas specifically activated by pitch height change exist posterior to HG within PT, while areas specifically activated by pitch chroma change exist anterior to HG within PP. Pitch height and pitch chroma form the basis for extended patterns describing the relations between pitches, rather than the presence or absence or perceptual salience of pitch per se. It is therefore

Table 5.2. Experiment 3: Local maxima of activation for individuals

For each of the ten subjects, voxels representing local peaks of activation (peak coordinates in mm and Z scores) are listed for left and right PP and lateral HG ('chroma only') and for left and right PT ('height only'). The format of the Table corresponds to that of the images in Figure 5.4. Coordinates correspond to voxels activated by pitch chroma change but not by pitch height change ('chroma only'), or by pitch height change but not by pitch chroma change ('height only'). Individual subject data are shown at a significance threshold of p < 0.05, corrected for specified local volumes according to the prior anatomical hypotheses. Empty cells indicate that there was no local maximum of activation in the region specified. Key: HG, lateral Heschl's gyrus; PP, planum polare; PT, planum temporale.


Figure 5.4. Experiment 3: Individual statistical parametric maps

Individual statistical parametric maps (voxel statistical threshold p < uncorrected) are rendered on each subject's normalised structural brain MRI. The axial section is tilted to run along the superior temporal plane as in Figure 5.3, and the contrasts and colour key are the same as in Figures 5.3A and D. Panels show inset views of the left and right superior temporal lobes for each subject. Bilateral areas including medial Heschl's gyrus (HG) are activated in the contrast between broadband noise and silence (green). After exclusive masking of voxels activated both by pitch chroma change and by pitch height change, the two pitch dimensions show distinct activation patterns in most individuals: pitch chroma change (but not pitch height change) activates mainly areas anterior to HG on the planum polare (red); pitch height change (but not pitch chroma change) activates mainly planum temporale (blue).

plausible that the specific substrates for the two pitch dimensions should involve higher-order auditory areas beyond the lateral HG pitch centre (Patterson et al., 2002). In the stimuli used in this experiment, changes in pitch chroma were not fully independent of pitch height: whereas pitch height could be varied while chroma remained fixed, changing chroma always entailed a change in pitch height. In this sense, therefore, the two pitch manipulations were not equivalent. This is also the case for pitch variation in many natural sounds. Pitch height may change independently of chroma when the sound source changes or several sound sources are present, whereas chroma changes independently of pitch height only under certain circumstances (for example, in the tritone illusion: Shepard, 1982). However, the anterior cortical areas activated here by changing chroma but not pitch height represent (by definition) a specific substrate for chroma analysis. These areas can plausibly be proposed as a neuroanatomical correlate of pitch chroma perception that does not depend on pitch height. The existence of such a specific mechanism is supported by the psychophysical principle of octave equivalence (Krumhansl, 1990; Deutsch, 1999). An interaction between pitch height and chroma processing was detected in the present experiment. This interaction is consistent with human psychophysical observations such as the perceptual disruption of melodies by octave scrambling (Deutsch, 1999). However, the direction of the interaction here appears rather counter-intuitive, in that the effect of changing either pitch dimension was greater when the other dimension was fixed rather than changing. This interaction suggests that the detection of change in a single pitch dimension is more computationally demanding than the detection of simultaneous change in both dimensions.
Change in a single parameter of a harmonic series (such as f0, harmonic spacing or intensity) may require a more complete analysis of the auditory spectrum than simultaneous change in different parameters. Such a neural mechanism might be engaged by certain techniques in Western music, such as polyphony and Klangfarbenmelodie, which depend on the perception of a coherent melodic line based predominantly on pitch chroma or pitch height cues. In the absence of direct evidence, however, this interpretation remains speculative.

5.5.2 Pitch height and pitch chroma in auditory scene analysis

The analysis of pitch variation in sound sequences extending over seconds is required to process melodies in music and prosody in speech, and such sequences have previously been shown to activate bilateral auditory areas extending from antero-lateral PT into STG and PP (Griffiths et al., 1998a; Patterson et al., 2002). However, previous human functional imaging studies did not manipulate the two dimensions of pitch separately, and did not address the possibility that distinct brain mechanisms exist for the processing of pitch chroma and height. The pitch height changes in the present experiment were perceived by subjects as a disrupted auditory stream. Here, pitch height provided a non-spatial cue for the segregation of sound sources at an early stage in auditory source analysis (Bregman, 1990). Previous work showing activation in PT where spatial cues were the basis for segregation can be interpreted in a similar way (Zatorre et al., 2002b). A specific mechanism for pitch height processing in PT is therefore in accord with the model proposed in Chapter 4 (Section 4.5.3, p. 92), according to which PT plays a critical early role in the segregation of sound sources in the acoustic environment. In contrast, specific activation of PP anterior to HG by stimuli with changing chroma supports previous work on melody (Zatorre et al., 1994; Griffiths et al., 1998a; Patterson et al., 2002) and speech (Scott et al., 2000; Scott & Johnsrude, 2003) processing. Activation of anterior auditory areas would afford a mechanism for tracking pitch chroma patterns that form coherent information streams which can be analysed independently of the specific sound source. The detection of a target tone sequence (sequential grouping) in the presence of competing acoustic features could be regarded as a form of auditory foreground-background decomposition.
This task has been specifically associated with activation in a non-primary cortical area anterior to HG (Scheich et al., 1998). Such a task would be predicted to engage the chroma analysis mechanism demonstrated here, since it requires the segregation of distinct chroma patterns that might be produced by a single sound source, rather than a segregation of distinct sound sources based on pitch height cues.

5.5.3 Pitch height and pitch chroma as source-dependent and source-independent auditory properties

The findings of the present experiment have implications that extend beyond the cortical architecture of pitch perception. An important motivation of this experiment was to use the perceptual pitch dimensions to probe auditory cortical mechanisms for the analysis of source-dependent and source-independent information. Previous chapters of this thesis have outlined the controversies surrounding the anatomical and functional organisation of the cortical auditory system in both humans and non-human primates (for example, Belin & Zatorre, 2000; Romanski et al., 2000; Zatorre et al., 2002b; see Chapter 1, p. 14; Chapter 3, Section 3.5.3, p. 80; Chapter 4, Section 4.5.3, p. 92). The present findings speak in particular to the auditory 'What'/'Where' debate, and to the specific question of whether distinct cortical mechanisms analyse different forms of auditory 'What' information. In humans, Experiments 1 and 2 here, together with previous functional imaging (Alain et al., 2001; Maeder et al., 2001) and lesion (Clarke et al., 2000) data, support a dual organisation of auditory 'What' and 'Where' processing networks involving cortical areas beyond PAC. Further, the cortical network that supports 'What' processing has been shown to be distributed along the antero-posterior axis of the temporal lobe in both humans and non-human primates (see Figure 1.2). In humans, the analysis of melodies and speech information streams engages temporal lobe areas both anterior and posterior to HG (Griffiths et al., 1998a; Scott et al., 2000; Patterson et al., 2002; Mitchell et al., 2003; Scott & Johnsrude, 2003), while in the macaque, neurons in both the lateral (Rauschecker et al., 1995) and posterior (Tian et al., 2001) temporal lobe demonstrate electrophysiological responses to particular call sounds.
The present experiment allows more specific conclusions regarding the organisation of the distributed 'What' network in the human auditory brain. A mechanism for analysing pitch height exists in the human posterior temporal lobe, while a mechanism for analysing pitch chroma patterns exists in the anterior temporal lobe. If pitch height differences are analysed in the early stages of auditory scene analysis, this implies that the posterior cortical mechanism is engaged in the segregation of sound sources in the environment. This evidence that a source-dependent property (pitch height) is processed selectively by posterior cortical areas in the human brain might appear somewhat at variance with macaque data indicating relative selectivity of rostral lateral belt areas for

conspecific call sounds. However, the demonstrated sensitivity of macaque caudal belt fields to both spatial and object properties (Tian et al., 2001) suggests that posterior auditory areas may be engaged in the segregation of sound sources based on both spatial and non-spatial information in different primate species. The existence of a distinct cortical substrate for pitch chroma processing anterior to HG in the human brain suggests that these anterior cortical areas are engaged in tracking information streams that are associated with single sound sources but do not depend on the identity of the source. It has been shown that macaques can make acoustic discriminations based on pitch direction independently of source-dependent cues such as absolute frequency (Brosch et al., 2004) and perceive octave equivalence between tonal melodies (Wright et al., 2000). Such evidence suggests that generic mechanisms for the extraction of source-independent information in sound sequences may exist in primate species other than humans. The present data provide further evidence for functional differentiation within human PT. The group data indicate that the analysis of pitch height specifically activates posterior PT (Figure 5.3 and Table 5.1), whereas both pitch chroma and pitch height information are analysed in antero-lateral PT. Pitch height serves as a cue that might be used in early auditory scene analysis to segregate sound sources in the environment. Accordingly, the present findings complement those of Experiments 1 (Chapter 3) and 2 (Chapter 4), where spatial spectrotemporal analysis produced activations centred on postero-medial PT.
Together, the three experiments suggest an organisational scheme in which segregation of sound sources (whether based on spatial information, as in Experiments 1 and 2, or on pitch height information, as in the present experiment) is based on computation in functional subregions of PT that lie posterior to those supporting the computation of source-independent spectrotemporal properties. While electrophysiological data support the functional differentiation of spatial and non-spatial processing mechanisms in the primate posterior temporal plane (see Chapter 1), data to support a homologous differentiation for non-spatial source-dependent and source-independent properties have yet to be obtained in non-human primates. However, the computational model of human PT function proposed in Chapter 4 (Section 4.5.3, p. 92) would predict that the spatial and non-spatial (pitch height) information about sound sources extracted in posterior PT should be gated to distinct higher cortical areas that support the perception of auditory spatial location and object identity, respectively. This prediction is examined further in the next chapter (Chapter 6). Chapter 6 describes an

experiment to establish a human brain mechanism for a stage of auditory object analysis beyond the segregation of sound sources. This processing stage involves the extraction of sound object features that might support sound identification.

Chapter 6. ANALYSIS OF SPECTRAL SHAPE

Summary

Spectral envelope is the shape of the power spectrum of a sound. It is an important cue for the identification of sound sources such as voices or instruments, and of particular classes of sounds such as vowels. The spectral envelope establishes a perceptual boundary that the brain might use to categorise a sound as an auditory object distinct from the acoustic background. In everyday life, sounds with similar spectral envelopes are perceived as similar: we recognise a voice or a vowel regardless of pitch and intensity variations, and we recognise the same vowel regardless of whether it is voiced (a spectral envelope applied to a harmonic series) or whispered (a spectral envelope applied to noise). This fMRI experiment (Experiment 4) describes human brain mechanisms for the analysis of the spectral envelope of sounds. Changing either the pitch or the spectral envelope of harmonic sounds produced similar activation within a bilateral network including Heschl's gyrus and adjacent cortical areas in the superior temporal lobe. Changing the spectral envelope of continuously alternating noise and harmonic sounds produced additional rightward-asymmetric activation in the superior temporal sulcus. The findings show that spectral shape is abstracted in the superior temporal sulcus, suggesting that this region may have a generic role in the spectral analysis of sounds. These distinct levels of spectral analysis may represent early computational stages in a putative anteriorly directed pathway for the extraction of sound object features.

6.1 Background

The identification of sounds is a fundamental task of auditory perception. Sound identification is an active process that occurs automatically and efficiently despite marked changes in acoustic detail. For example, humans identify particular human voices or animal calls despite large changes in pitch and loudness, and listeners can recognise particular musical instruments even where transmission is distorted (Samson & Zatorre, 1994). This chapter addresses brain mechanisms that might support sound categorisation prior to the attribution of meaning. These brain mechanisms are likely to involve the extraction of spectrotemporal features that establish the identity of a sound as an object that is distinct from other sounds in the acoustic environment. The theoretical framework for the study of object processing is drawn largely from visual neuroscience, and the definition of an auditory object presents conceptual difficulties. Here, a 'sound object' is defined very generally as a combination of spectrotemporal features that characterises a sound as an auditory entity distinct from the acoustic background. This is one of several plausible alternative definitions (the term 'object' might, for example, be taken to refer to the sound source itself, rather than the acoustic information available about that source). The more general definition of an auditory object is preferred here because it can be applied equally to a sound source (such as an individual speaker) or an auditory event (such as a vowel). This usage implies the existence of a perceptual boundary that allows a particular sound object to be separated from the acoustic background, analogous to the separation of figure from ground in vision. Perceptual boundaries are likely to be a fundamental attribute of objects in all sensory domains (Kubovy & Van Valkenburg, 2000).
The perception of object features that act as boundaries would minimise the computational demands of auditory scene analysis by providing a mechanism for the grouping of elementary sound properties into larger units that correspond to particular sound sources (Bregman, 1990). A sound object feature in this scheme might be analogous to a property such as the shape of a visual object. Here, an object feature corresponds to a level of analysis in which more elementary features are combined to form an object representation that can be used to identify that object despite changes in other elementary attributes. An operationally equivalent process in vision might be the combination of elementary visual attributes such as edges to define an object shape that can be identified despite variation in colour, size or texture. Such object

representations could be considered as templates that can be used to define objects for subsequent semantic and cross-modal processing (Mesulam, 1998). In the auditory domain, one important object feature that may establish a perceptual boundary is the spectral envelope or shape of a sound. The spectral envelope defines the shape of the power spectrum of a complex sound, and is one means by which sound sources such as voices or sound classes such as vowels can be characterised. The spectral envelope of a sound is related to a dimension of timbre, the spectral centroid (Grey, 1977; Krumhansl & Iverson, 1992; McAdams & Cunible, 1992). Timbre is defined as the property that distinguishes two sounds of identical pitch, duration and intensity (American Standards Association, 1960). In addition to spectral envelope, timbre has a dimension determined by temporal envelope (the attack and decay of a sound) and a third dimension that is not consistent between studies (Grey, 1977; Krumhansl & Iverson, 1992; McAdams & Cunible, 1992). Everyday experience suggests that we possess mechanisms for the abstraction of spectral envelope from the detailed spectrotemporal structure of the sound with which it is associated. For example, we perceive the same vowel sound whether the vowel is voiced or whispered. Here, a derived sound feature (the spectral shape) is used to establish a perceptual similarity between two sounds with very different acoustic fine structure, namely a harmonic series or noise. Figure 6.1 (overleaf) illustrates the common spectral envelope of voiced and whispered vowels. The findings of Experiments 1 (Chapter 3), 2 (Chapter 4) and 3 (Chapter 5) show that areas in the human posterior STP including lateral HG and PT are involved in the segregation of sound sources.
Previous human functional imaging studies have shown that temporal lobe areas located anteriorly and inferiorly in STG and STS participate in the identification of natural sounds such as voices, phonemes, musical instruments and animal calls (Engelien et al., 1995; Belin et al., 2000; Humphries et al., 2001; Maeder et al., 2001; Adams & Janata, 2002; Menon et al., 2002; Thierry et al., 2003; Beauchamp et al., 2004; Binder et al., 2004; Zatorre et al., 2004; Lewis et al., 2004). The mechanisms of early auditory scene analysis that support sound source segregation are likely to involve the representation of spectrotemporal fine structure. This fine structure is embodied in cues to spatial location and motion provided by the HRTF (Experiments 1 and 2) and in pitch height cues provided by the energy of individual harmonics in a 115


Figure 6.1. Experiment 4: Spectral shape of sounds

Schematic frequency-domain representations of a vowel (/a/) and a generic sound used in the fMRI experiment. For both the vowel and the generic sound, the same spectral envelope can be applied to two different types of spectral fine structure: a harmonic series (here f0 = 100 Hz) and noise. In the case of the vowel, this manipulation is perceived as the same phoneme, voiced (harmonic) or whispered (noise). In the case of the generic sound, the harmonic and noise versions of the sound are perceived as similar despite the change in spectrotemporal fine structure; this perceptual similarity is based on a derived feature (spectral shape), since the stimulus does not closely resemble any familiar natural sound.
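The Figure 6.1 manipulation, one spectral envelope applied to two kinds of fine structure, can be sketched directly by frequency-domain filtering. The formant-like centre frequencies and bandwidth below are illustrative assumptions, not the experiment's actual stimulus parameters:

```python
import numpy as np

def apply_spectral_envelope(x, sr, centers_hz, bw_hz):
    """Impose a fixed spectral envelope (sum of Gaussian bumps) on any
    carrier by multiplying its spectrum, then transforming back."""
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(len(x), d=1.0 / sr)
    env = sum(np.exp(-0.5 * ((f - c) / bw_hz) ** 2) for c in centers_hz)
    return np.fft.irfft(X * env, n=len(x))

sr = 16000
t = np.arange(sr) / sr                                    # 1 s of signal
harmonic = sum(np.sin(2 * np.pi * k * 100.0 * t)          # harmonic series,
               for k in range(1, 40))                     # f0 = 100 Hz
noise = np.random.default_rng(0).standard_normal(len(t))  # flat fine structure

centers = [700.0, 1100.0, 2500.0]  # illustrative formant-like peaks
voiced = apply_spectral_envelope(harmonic, sr, centers, 100.0)
whispered = apply_spectral_envelope(noise, sr, centers, 100.0)
# voiced and whispered now share the same spectral shape but differ
# completely in spectrotemporal fine structure.
```

The two outputs are the schematic 'voiced' and 'whispered' cases of Figure 6.1: identical envelope, different carriers.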

harmonic complex (Experiment 3). In contrast, the semantic processing of sounds is likely to involve the formation of symbolic object tokens that can be combined with information obtained from other sensory modalities and retrieved from memory (Mesulam, 1998; Griffiths et al., 1999b). Hierarchical models of sensory object processing therefore predict the existence of a critical, uncharacterised intermediate processing step in which object features are analysed to abstract object representations that are manipulated by cross-modal and amodal semantic networks (Mesulam, 1998; Husain et al., 2004). These models are at present based largely on evidence obtained for object analysis and feature extraction in the visual domain (Riesenhuber & Poggio, 2002; Palmeri & Gauthier, 2004). In the case of the auditory system, this intermediate processing stage, in which sound object features such as the spectral envelope are abstracted from spectrotemporal fine structure, has not been established; however, sensitivity to object features before recognition has been demonstrated in right anterior STS using PET in humans (Zatorre et al., 2004). Elucidation of the mechanism for such an intermediate processing stage in auditory object analysis is an issue of considerable importance. If brain mechanisms that abstract sound object features can be identified, this would provide an important link in the processing hierarchy that leads to sound identification, and would support the existence of common organisational principles that govern object processing in different sensory systems. In the present fMRI experiment, brain mechanisms that represent acoustic fine structure were distinguished from mechanisms that specifically extract auditory object features, using stimuli in which spectral composition (spectrotemporal fine structure) and spectral shape (an auditory object feature) could be manipulated independently.
The stimuli had spectral features in common with natural sounds such as spoken vowels and animal calls (Figure 6.1); however, they were not perceived as corresponding to any familiar sound. The stimuli could therefore be used to probe generic brain mechanisms that analyse both the spectrotemporal fine structure and the spectral envelopes of natural sounds such as vowels (Figure 6.1), while avoiding the semantic associations of real vowels or musical instrumental timbres (Mesulam, 1998; Griffiths et al., 1999b). Previous studies of pitch pattern processing (Patterson et al., 2002) suggest that analysis of spectrotemporal fine structure (which might be embodied in generic stimuli or complex natural sounds) is likely to involve a cortical network including non-primary auditory cortex in HG and PT. In contrast, the identification of many natural sounds has been shown to engage STS

(Engelien et al., 1995; Belin et al., 2000; Maeder et al., 2001; Adams & Janata, 2002; Menon et al., 2002; Beauchamp et al., 2004; Binder et al., 2004; Zatorre et al., 2004; Lewis et al., 2004). STS is therefore likely to be involved in a more abstract level of spectrotemporal analysis that is relevant to the processing of sound identity. The common involvement of STS in the analysis of diverse sound categories is consistent with a generic mechanism of sound object analysis in STS, which might underpin any more specific role of this region in the processing of particular sound categories such as human voices.

6.2 Experimental hypotheses

1. Spectral envelope is an auditory object feature and is computed by a brain mechanism distinct from those engaged in the analysis of the spectrotemporal fine structure of complex sounds.

2. This generic pre-semantic mechanism of sound object analysis is instantiated in a cortical network including PT, posterior STG and STS: this network links posterior temporal plane areas involved in spectral representation with inferior and anterior temporal lobe areas that attribute meaning to sounds.

The design of this fMRI experiment (Experiment 4) was predicated on a model of early sound object analysis in which distinct brain mechanisms, instantiated in distinct cortical networks, process spectrotemporal fine structure and auditory object features. Specifically, it was predicted that spectrotemporal fine structure engages the posterior network previously demonstrated in sound source segregation, while a subsequent processing stage, involving the abstraction of sound object features independently of spectral fine structure, engages surrounding areas extending from the posterior temporal plane onto the lateral temporal convexity and along STS.
Such an anatomical scheme would link areas involved in early auditory scene analysis with inferior and anterior temporal lobe areas previously shown to be involved in the identification of natural sounds such as voices.

6.3 Methods

6.3.1 Rationale for the experimental design

The experimental design (Figure 6.2, overleaf) employed sequences of sounds composed either entirely of harmonic sounds ("all-harmonic" conditions) or of alternating harmonic sounds and noise ("alternating" conditions). These two types of conditions were designed to probe two levels of spectral analysis. At the first level of analysis, corresponding to the stimulus sequences consisting entirely of harmonic sounds, spectral envelope or pitch was manipulated (Figure 6.2: all-harmonic conditions). The encoding of these stimulus properties could be based on an analysis of the detailed spectrotemporal structure of the sound. Changing the spectral envelope alters the detailed spectral structure of the sound, while changing pitch (in stimuli such as these, with unresolved harmonics) is associated with changes in the repetition rate and temporal structure of the sound. At the second level of spectral analysis, corresponding to the stimulus sequences consisting of alternating noise and harmonic sounds, spectral envelope was manipulated while the detailed spectrotemporal structure of the sounds was constantly varied (Figure 6.2: alternating conditions). In the alternating conditions, changes in spectral envelope must be analysed independently of changes in spectrotemporal details. This level of spectral analysis should engage additional brain mechanisms to those involved in the analysis of spectrotemporal fine structure: these additional mechanisms are required for the extraction of spectral shape independently of changes in fine structure, and allow the identity of a sound object (for example, a vowel voiced or whispered) to be perceived as changing or constant despite changes in spectrotemporal details.
The second level of spectral analysis allows an auditory object feature (a given spectral shape) to be abstracted from the different spectrotemporal fine structures encoded at the first level.

6.3.2 Stimulus details

Stimuli (see Appendix I) were synthesised digitally in the frequency domain from harmonic series or fixed-amplitude, random-phase noise with equivalent passband and intensity (sampling rate 44.1 kHz, 16 bit resolution). Harmonic sounds were in positive Schroeder phase (Schroeder & Strube, 1986) to reduce peak factor. Stimulus parameters were selected to approximate those that might occur in natural sounds (for example, spoken vowels); however, none of the stimuli closely resembled any class of


Figure 6.2. Experiment 4: Schematic representations of stimuli
Examples of sound sequences derived from the stimuli used in the fMRI experiment. Individual elements of each sequence are represented in the frequency domain (see Figure 6.1). Sequences are composed entirely of harmonic sounds (all-harmonic sequences) or harmonic sounds alternating with noise (alternating sequences). Key: fcsc, f0 constant, spectral shape constant; fvsc, f0 varying, spectral shape constant; fcsv, f0 constant, spectral shape varying.
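The construction of these stimuli (described in the Stimulus details section) can be sketched in a few lines of signal-processing code. This is a minimal illustration only: the sampling rate, element duration and 20 ms gating windows follow the text, but the envelope shape, the 5 kHz passband and the particular positive Schroeder-phase formula are hypothetical stand-ins for the actual synthesis parameters.

```python
import numpy as np

FS = 44100    # sampling rate (Hz)
DUR = 0.5     # element duration: 500 ms
RAMP = 0.02   # 20 ms gating windows

def gate(y, fs=FS, ramp=RAMP):
    """Apply raised-cosine onset/offset ramps (the gating windows)."""
    k = int(fs * ramp)
    w = 0.5 * (1.0 - np.cos(np.pi * np.arange(k) / k))
    y = y.copy()
    y[:k] *= w
    y[-k:] *= w[::-1]
    return y

def envelope(freqs):
    """A hypothetical single-peaked spectral envelope; the four envelope
    shapes actually used in the experiment are not reproduced here."""
    f = np.asarray(freqs, dtype=float)
    return np.exp(-0.5 * ((f - 1000.0) / 400.0) ** 2) + 0.05

def harmonic_element(f0, fmax=5000.0, fs=FS, dur=DUR):
    """Harmonic series under the envelope, in positive Schroeder phase
    (one common form of the Schroeder-phase formula, assumed here)."""
    t = np.arange(int(fs * dur)) / fs
    n_harm = int(fmax // f0)
    y = np.zeros_like(t)
    for n in range(1, n_harm + 1):
        phase = np.pi * n * (n + 1) / n_harm  # spreads phases to reduce peak factor
        y += envelope(n * f0) * np.cos(2.0 * np.pi * n * f0 * t + phase)
    return gate(y / np.max(np.abs(y)))

def noise_element(fmax=5000.0, fs=FS, dur=DUR, seed=0):
    """Fixed-amplitude, random-phase noise with the same spectral envelope,
    synthesised in the frequency domain."""
    n = int(fs * dur)
    freqs = np.fft.rfftfreq(n, 1.0 / fs)
    rng = np.random.default_rng(seed)
    mag = np.where(freqs <= fmax, envelope(freqs), 0.0)
    spectrum = mag * np.exp(2j * np.pi * rng.random(freqs.size))
    spectrum[0] = 0.0  # no DC component
    y = np.fft.irfft(spectrum, n)
    return gate(y / np.max(np.abs(y)))
```

The same envelope() is applied to both element types: this is the manipulation that makes the harmonic and noise versions of a sound perceptually similar despite their entirely different spectrotemporal fine structure.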

natural sounds (Figures 6.1 and 6.2). Spectral envelope was specified in the frequency domain for both noise and harmonic stimuli (Figure 6.2), and the duration of each sound was 500 ms (with 20 ms gating windows). Sounds were combined into sequences comprising either harmonic sounds only (all-harmonic sequences) or alternating harmonic sounds and noise (alternating sequences; see Figure 6.2). In the all-harmonic sequences, fundamental frequency (f0) either remained fixed or was varied (120, 144, 168 or 192 Hz) between successive elements of the sequence. In both the all-harmonic and alternating sequences, the spectral envelope of the individual sounds either remained fixed or was varied (one of four spectral envelope shapes; see Figure 6.2) between successive elements in the sequence. Changes in f0 are perceived as changes in pitch, while changes in spectral envelope (spectral shape) are perceived as changes in the identity of the sound. These manipulations were used to create three types of all-harmonic sequence (both f0 and spectral shape constant; f0 varying and spectral shape constant; f0 constant and spectral shape varying) and two types of alternating sequence (spectral shape constant and spectral shape varying). Each sequence contained either 15 or 16 elements (total sequence duration 7.5 or 8 seconds).

6.3.3 Subjects

Fourteen subjects (six males, eight females; 13 right-handed, one left-handed) aged 24 to 39 participated.
None had any history of a hearing disorder or neurological illness and all had normal structural MRI scans.

6.3.4 fMRI protocol

Six stimulus conditions, each corresponding to a different type of sound sequence and a distinct percept, were presented during scanning (Figure 6.2): 1) all-harmonic sequences with fixed f0 and fixed spectral shape (same sound object with constant pitch); 2) all-harmonic sequences with changing f0 and fixed spectral shape (same sound object with varying pitch); 3) all-harmonic sequences with fixed f0 and changing spectral shape (changing sound object with same pitch); 4) alternating harmonic-noise sequences with fixed f0 and fixed spectral shape (same sound object composed alternately of noise and harmonics with fixed pitch); 5) alternating harmonic-noise sequences with fixed f0 and

changing spectral shape (changing sound object composed alternately of noise and harmonics with fixed pitch); 6) silence. During scanning, subjects were instructed to attend to the sound sequences with their eyes closed. To help maintain alertness, they were required to make a single button press at the end of each sequence using a button box positioned beneath the right hand. There was no active auditory discrimination task. Stimuli were delivered in randomised order using a custom delivery system based on Koss electrostatic headphones at a fixed sound pressure level of 80 dB (see Table 2.1 and Figure 2.2). Following each sound sequence, brain activity was estimated by the BOLD response at 1.5 T (Siemens Sonata, Erlangen) using gradient EPI in a sparse acquisition protocol (inter-scan interval 12.5 s; TE 50 ms) (Hall et al., 1999; see Chapter 2, Section , p. 54). Each brain volume comprised 48 transverse 2 mm slices covering the whole brain, with an inter-slice gap of 1 mm and in-plane resolution of 3 x 3 mm. 192 brain volumes were acquired for each subject (32 volumes for each condition, in two sessions).

6.3.5 Behavioural data

The ability of listeners to perceive the spectral shape changes used in the scanning sequences was assessed in a separate psychophysical experiment. The same 15- or 16-element sequences presented during scanning were presented in a two-interval, two-alternative forced-choice procedure in which three subjects were required to detect the sequences containing changes. Subjects were able to detect harmonic sequences containing pitch change or spectral envelope change with 100% accuracy. Subjects were also able to detect alternating noise-harmonic sequences containing spectral envelope change with 100% accuracy.
These psychophysical findings established that the changes in pitch and spectral shape used during scanning were highly salient, and that spectral shape changes could be perceived independently of constantly changing fine spectrotemporal structure.

6.3.6 Image analysis

Image data were analysed for the entire group and for each individual subject using statistical parametric mapping implemented in SPM99 software

(see Chapter 2, Sections 2.3 and 2.4 and Figure 2.3). Scans were first realigned and spatially normalised (Friston et al., 1995a) to the MNI standard stereotactic space (Evans et al., 1993). Data were spatially smoothed with an isotropic Gaussian kernel of 8 mm FWHM. SPMs were generated by modelling the evoked hemodynamic response for the different stimuli as boxcars convolved with a synthetic hemodynamic response function in the context of the general linear model (Friston et al., 1995b). Population-level inferences concerning BOLD signal changes in the contrasts of interest (in the all-harmonic conditions, changing pitch minus fixed pitch and changing spectral shape minus fixed shape; in the alternating conditions, changing spectral shape minus fixed shape) for the group were based on a random-effects model that estimated the second-level t statistic at each voxel (see Chapter 2, Section 2.5.2, p. 66). Hemispheric laterality effects for the contrasts of interest were assessed using a second-level paired t test comparing each contrast image with its counterpart flipped about the antero-posterior axis. An exclusive masking procedure was used to identify brain areas activated specifically by the f0 contrast and by the spectral shape contrast between the all-harmonic conditions. Exclusive masking was also used to identify brain areas activated specifically by the spectral shape contrast between the alternating harmonic-noise conditions but not by the all-harmonic f0 contrast. This step was performed in order to identify brain areas engaged specifically in the analysis of spectral shape but not spectral fine structure. These masking procedures applied logical operators across all voxels in the SPMs corresponding to each contrast, in order to identify voxels activated by one contrast but not the other.
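The exclusive masking and laterality procedures described above reduce to simple voxelwise logical and mirror operations on thresholded statistical maps. The sketch below uses numpy and is not the actual SPM99 implementation; the function names, array layouts and thresholds are illustrative only.

```python
import numpy as np

def exclusive_mask(spm_target, spm_excluding, thr_target, thr_excluding):
    """Voxels supra-threshold in the target contrast but NOT supra-threshold
    in the excluding contrast (the logic behind the 'shape only' map)."""
    return (spm_target > thr_target) & ~(spm_excluding > thr_excluding)

def small_volume_restrict(supra, anatomical_mask):
    """Restrict supra-threshold voxels to an a priori anatomical small
    volume (e.g. HG, STG, STS or a PT probability map)."""
    return supra & anatomical_mask

def flipped_contrast(con_img):
    """Mirror a 3-D contrast image left-right (x axis stored first, as in
    MNI space), as used in the second-level paired t test for laterality."""
    return con_img[::-1, :, :]
```

For the Figure 6.3D map, spm_target would be the alternating-condition spectral shape contrast and spm_excluding the all-harmonic f0 contrast; the surviving voxels are then restricted to the anatomical small volumes before local maxima are assessed.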
Local maxima in each of the contrasts of interest were assessed using a voxel significance threshold of p < 0.05, after small volume correction taking the prior anatomical hypotheses into account. Anatomical small volumes comprised left and right HG, STG and STS (based on the group mean normalised structural MRI brain volume) and 95% probability maps for left and right human PT (Westbury et al., 1999). Individual subject data were analysed in order to assess the anatomical variability of spectral envelope processing within the group, using the same statistical model, and were assessed for each voxel at a significance threshold of p < 0.05 after small volume correction taking the a priori anatomical hypotheses into

account. The small volumes used in the individual subject analyses were identical to those used in the group analysis.

6.4 Results

6.4.1 Group data

This experiment was designed to examine two levels of auditory spectral analysis using contrasts based on two types of stimuli. In the all-harmonic contrasts, the baseline condition has constant spectrotemporal fine structure, whereas in the contrast between the alternating conditions, the baseline condition has constantly changing spectrotemporal fine structure (harmonic series or noise). Spectral envelope or pitch changes in the all-harmonic sequences can be assessed from changes in spectrotemporal fine structure, whereas spectral envelope changes in the alternating sequences must be analysed independently of the changing spectrotemporal fine structure and demand additional computational resources. Statistical parametric maps of brain activation for the contrasts of interest for the group are shown in Figure 6.3 (overleaf). Local maxima for the group in each of the contrasts of interest at a statistical threshold of p < 0.05 (after small volume correction based on the prior anatomical hypotheses) are presented in Table 6.1 (overleaf). The contrast between all-harmonic conditions with changing and fixed f0 (Figure 6.3A), and the contrast between all-harmonic conditions with changing and fixed spectral envelope (Figure 6.3B), both produced bilateral activation in PAC in medial HG (Rademacher et al., 2001), and in non-primary auditory cortex in lateral HG and antero-lateral PT (Westbury et al., 1999). No temporal lobe regions were activated only in the pitch change contrast or only in the spectral envelope contrast between the all-harmonic conditions. The superior temporal lobe regions engaged in both pitch analysis and spectral envelope analysis in the all-harmonic conditions together constitute a substrate for the analysis of spectrotemporal fine structure.
The contrast between alternating conditions with changing and fixed spectral envelope produced bilateral activation including medial and lateral HG, antero-lateral PT and posterior STG, and extending anteriorly along STS (Figure 6.3C). This activation

Table 6.1. Experiment 4: Local maxima of activation for group

Changing minus constant f0 (all-harmonic): change in fine spectrum associated with f0 change. Regions: lateral Heschl's gyrus (left and right); planum temporale (left and right).

Changing minus constant spectral envelope (all-harmonic): change in fine spectrum associated with spectral shape change. Regions: planum temporale (left and right).

Changing minus constant spectral envelope (harmonic-noise), excluding changing f0 minus constant f0 (all-harmonic): change in spectral shape only (abstraction of spectral shape). Regions: planum temporale (right); superior temporal gyrus (right); superior temporal sulcus (right).

Data are derived from a random-effects analysis of the group (14 subjects). All local maxima are shown for each of the contrasts of interest for the group: changing f0 minus constant f0 (Figure 6.3A); changing spectral shape minus constant spectral shape in the all-harmonic conditions (Figure 6.3B); and changing spectral shape minus constant spectral shape in the alternating harmonic-noise conditions (Figure 6.3C) after exclusion of all voxels activated in the changing f0 contrast (Figure 6.3D). Statistical threshold is p < 0.05 after small volume correction based on the prior anatomical hypotheses.


Figure 6.3. Experiment 4: Statistical parametric maps for group
Statistical parametric maps of group data (thresholded at an uncorrected voxel significance criterion of p < for display purposes) have been rendered on axial sections of the group mean normalised structural brain MRI. Each section has been tilted -0.5 radians to lie parallel to the superior surface of the temporal lobes; in each panel, the upper section lies along the superior temporal plane (STP), while the lower section runs along the dorsal bank of the superior temporal sulcus (STS). A, B, Analysis of f0 change (changing f0 minus fixed f0, "f0 all-harm") (A) and analysis of spectral envelope change (changing spectral shape minus constant spectral shape, "shape all-harm") in the all-harmonic conditions (B) each activates a similar bilateral brain network including Heschl's gyrus (HG) and antero-lateral planum temporale (PT). These areas represent a brain substrate for the analysis of spectrotemporal fine structure. C, Analysis of spectral shape change (changing spectral shape minus constant spectral shape, "shape alternate") in the alternating harmonic-noise conditions activates a more extensive bilateral network extending from antero-lateral PT through posterior STG and along STS. D, The SPM in C has been exclusively masked by the SPM in A to identify areas involved in the analysis of spectral envelope changes but not in the analysis of spectrotemporal fine structure ("shape only"): these areas represent a specific brain substrate for the abstraction of spectral shape, a key step in sound object perception.

overlapped with the network identified in both the all-harmonic contrasts (Figure 6.3A,B), but was more extensive. The contrast between the alternating conditions (Figure 6.3C) was exclusively masked by the contrast between changing and fixed f0 (Figure 6.3A) in order to identify brain areas engaged by spectral envelope changes but not by pitch changes. Areas identified after exclusive masking extended from antero-lateral PT through STG and anteriorly along STS (Figure 6.3D). These temporal lobe areas lie beyond the network engaged in pitch analysis, and constitute a substrate for the analysis of spectral shape independently of spectrotemporal fine structure. The magnitude and extent of activation was greater in the right than the left hemisphere, and maximal in the mid-portion of right STS (Table 6.1); however, this asymmetry did not reach statistical significance (p < 0.05).

6.4.2 Individual data

Individual subject analyses were carried out to determine whether the anatomical distinction between the regions specific for the analysis of spectral envelope and spectrotemporal fine structure in the group was also evident in the individual data. Coordinates of local activation maxima for all individuals are presented in Table 6.2 (overleaf). Small volume correction was based on the a priori anatomical hypotheses (see Methods, Section 6.3.6, p. 121). For the analysis of spectrotemporal fine structure (change in f0 or change in spectral envelope) in the all-harmonic conditions, local maxima occurred in the pre-specified volume in 11 of the 14 subjects at the p < 0.05 voxel level of significance after small volume correction. Local maxima were bilateral in seven subjects; however, the anatomical subregions activated (HG or PT) varied between individuals, and no subject had local maxima in all of the subregions identified for the processing of spectrotemporal fine structure in the group analysis.
As in the group analysis, exclusive masking was used to identify voxels specifically activated by spectral envelope change in the alternating conditions but not by change in spectrotemporal fine structure (change in f0). For the specific analysis of spectral shape change, local maxima occurred in the pre-specified volume in nine of the 14 subjects at the p < 0.05 voxel level of significance after small volume correction. Local maxima were bilateral in only two subjects; the anatomical subregions activated (PT, STG or STS) varied between individuals, and no subject had local maxima in all of the regions identified for the specific processing of spectral shape in the group analysis.

Table 6.2. Experiment 4: Local maxima of activation for individuals

Voxels representing local peaks of activation in each of the individual subjects are shown. Coordinates correspond to voxels activated in each of the contrasts of interest shown for the group data in Figure 6.3 and Table 6.1: changing pitch minus constant pitch in the all-harmonic conditions (changing minus constant f0 (all-harmonic)); changing spectral shape minus constant spectral shape in the all-harmonic conditions (changing minus constant spectral envelope (all-harmonic)); and changing spectral shape minus constant spectral shape in the alternating harmonic-noise conditions after exclusion of all voxels activated in the changing f0 contrast (changing minus constant spectral envelope (alternating) only). Individual subject data are shown for a significance threshold of p < 0.05, corrected for specified local volumes according to the prior anatomical hypotheses. Empty cells indicate that there was no local maximum of activation in the region specified. Key: HG, lateral Heschl's gyrus; L, left; PT, planum temporale; R, right; STG, superior temporal gyrus; STS, superior temporal sulcus.

6.5 Discussion

6.5.1 Human brain mechanisms for computing spectral envelope

This experiment has demonstrated distinct levels of analysis of spectral envelope that map onto distinct cortical regions in the human auditory brain. The group data show that changing the spectral envelope or the pitch of a harmonic sound engages a brain network that includes non-primary auditory cortex in lateral HG and antero-lateral PT. In these harmonic stimuli, both pitch and spectral envelope changes might be analysed on the basis of the detailed spectrotemporal structure of the stimulus, and it is likely that this network in the superior temporal lobe contains mechanisms for the analysis of spectrotemporal fine structure. While these mechanisms have a similar neuroanatomical substrate, they are not necessarily identical. Changes in pitch can, in general, be represented as changes in harmonic spacing in the frequency domain or repetition rate in the time domain. Where harmonics are all unresolved (as in this experiment), the repetition rate (fine temporal structure) is most relevant to the analysis of pitch change. Changes in spectral envelope produce changes in the detailed auditory spectrum. This experiment provides evidence for a further level of spectral analysis, engaged when analysis of the spectral envelope cannot be achieved by analysis of the detailed spectrotemporal structure of the stimuli. In the alternating conditions, the detailed spectrotemporal structure of the stimuli is constantly changing, and the computation of spectral shape requires the abstraction of spectral envelope independently of acoustic details. The group data here show that this more abstract level of spectral analysis occurs in temporal lobe areas beyond those engaged in the analysis of detailed spectrotemporal structure. The additional temporal lobe areas comprise a rightward-asymmetric network extending from the STP lateral to PAC, inferiorly and anteriorly along STS.
These data suggest that the mid-portion of right STS contains a specific mechanism for the extraction of spectral shape. The individual subject data further suggest that the neuronal substrate for this mechanism is distributed over an anatomical region that varies in location and extent between individuals. Taken together with the results of Experiment 3 (Chapter 5), these findings are consistent with a hierarchy of human brain areas engaged in the spectral analysis of complex sounds. This hierarchy comprises at least two distinct levels: spectrotemporal fine

structure is analysed in HG, PT and posterior STG, while spectral envelope is specifically extracted in a distinct, rightward-asymmetric network extending antero-laterally from PT through posterior STG and along STS. The substantial individual variation observed with respect to the organisation of higher-order functional subregions within this hierarchy is consistent with previous observations in other sensory systems. In the human visual system, for example, the location of area V5 and functional activation of this area in response to visual motion show substantially greater individual variation than does primary visual cortex in area V1 (Tootell et al., 1995).

6.5.2 Hierarchical analysis of natural sound objects

Previous functional imaging studies in humans have demonstrated responses to a variety of natural sounds in the STP, STG and STS (Engelien et al., 1995; Belin et al., 2000; Humphries et al., 2001; Maeder et al., 2001; Adams & Janata, 2002; Menon et al., 2002; Thierry et al., 2003; Beauchamp et al., 2004; Binder et al., 2004; Zatorre et al., 2004; Lewis et al., 2004). These complex natural sounds represent different categories of sound objects, as defined above. However, previous studies have not defined generic mechanisms that might enable the identification of different categories of sounds irrespective of their elementary acoustic properties or their semantic associations. The observed brain responses might be correlates of sensory or cognitive auditory object processing. The present experiment has identified two specific, anatomically distinct, early computational stages in the analysis of sound objects by the human brain: the representation of spectrotemporal fine structure in the posterior STP, and an additional stage of auditory object definition in posterior STG and STS. These two levels are of course unlikely to provide a complete description of early auditory object analysis.
By analogy with multi-step models of object processing in primate visual cortices (Ungerleider & Mishkin, 1982; Felleman & Van Essen, 1991; Marcus & Van Essen, 2002; Riesenhuber & Poggio, 2002; Palmeri & Gauthier, 2004), it is probable that additional stages of auditory object processing exist but were not accessed by the present experimental design. For example, an early auditory mechanism analogous to visual edge detection (Nelken, 2002; Husain et al., 2004) might provide input to the spectral shape analyser demonstrated here. Such considerations underline how little is presently understood concerning the neuronal bases of auditory object processing. Nevertheless,

the present data are consistent with a hierarchy of mechanisms for the pre-semantic analysis of complex natural sounds in the human temporal lobe. The use of generic stimuli here has shown that human STS contains a mechanism for a specific computational step (the abstraction of spectral shape independently of acoustic fine structure) that is likely to be relevant to the analysis of a number of different sound object categories. The present findings suggest that STS plays a generic role in human auditory cognition, involving the perceptual categorisation of sounds before the attribution of meaning. The present demonstration of a key role for the human posterior superior temporal lobe in the generic spectral analysis of sounds is supported by functional imaging evidence that this region is engaged in the processing of voices (Belin et al., 2000; Belin et al., 2002b; Belin & Zatorre, 2003; Von Kriegstein et al., 2003), phonetic features (Binder et al., 2000; Davis & Johnsrude, 2003; Binder et al., 2004), complex timbres (Menon et al., 2002) and environmental sounds (Engelien et al., 1995; Thierry et al., 2003; Lewis et al., 2004). The existence of a specific cortical mechanism for processing spectral shape independently of spectrotemporal fine structure is consistent with human psychophysical evidence (Dissard & Darwin, 2000; Smith et al., 2002) that these two levels of information processing can proceed independently, and that (at least for vocal signals) the information conveyed by spectral shape may be more robust than fine structure. Human MEG evidence also supports the existence of separate superior temporal mechanisms for processing the spectral envelope and spectral fine structure of vowels (Diesch & Luce, 2000).
The findings in the present experiment are further reinforced by electrophysiological recordings of long-latency responses to specific phonemic features from human posterior STG and lateral temporal lobe (Creutzfeldt et al., 1989; Howard et al., 2000; Steinschneider et al., 1999). Voices and phonemes represent different categories of auditory objects defined by specific combinations of acoustic features; however, the psychophysical role of phonetic features in voice recognition is well established (Fellowes et al., 1997; Sheffert et al., 2002). The present findings do not exclude the possibility that specific subregions of STS may be functionally specialised for the higher order analysis of specific sound categories, such as human voices (Belin et al., 2000; Belin & Zatorre, 2003). These functional subregions
might process subsequent stages in an anteriorly directed temporal lobe pathway that extracts increasingly abstract sound source properties (Rauschecker, 1998; Rauschecker & Tian, 2000; Scott et al., 2000; Tian et al., 2001; Alain et al., 2001). In functional imaging studies, responses to voices are generally maximal in right mid to anterior STS (Belin et al., 2000; Belin et al., 2002b; Belin & Zatorre, 2003; Von Kriegstein et al., 2003). Human lesion studies have demonstrated defects in voice discrimination following damage to either temporal lobe, a selective defect of voice recognition (phonagnosia) after damage involving the right posterior temporal lobe (Van Lancker et al., 1988), and a selective deficit in timbre perception after damage involving right mid-anterior STG and STS (Kohlmetz et al., 2003). Previous functional imaging studies have suggested a relative asymmetry of hemispheric processing for vocal and phonetic information. Specifically, it has been proposed that vocal information is preferentially processed in right anterior STS, while left anterior STS is relatively more sensitive to linguistic information (Belin et al., 2002b). Such a hemispheric asymmetry is supported by evidence for the selective processing of speech intelligibility in left anterior STS, and selective processing of stimuli with dynamic pitch variation in right anterior STG/STS (Scott et al., 2000). However, this hemispheric asymmetry in human vocal processing is relative rather than absolute (Belin et al., 2000). Evidence in non-human primates suggests that interhemispheric interactions are necessary in order to establish asymmetries for the processing of vocal information in the anterior temporal lobes (Poremba et al., 2004). The present study provides a direct demonstration that a mechanism for spectral shape extraction exists in both the left and the right human temporal lobes.
However, relative hemispheric selectivity for processing particular categories of sound objects (such as voices or speech sounds) might be determined by the connections of the generic STS analyser with higher order cortices.

Spectral shape, timbre, and sound object identity

In Chapter 5, the elementary property of pitch height was discussed in relation to timbre (Section 5.3.2, p. 101). It is pertinent to consider here the relation between perceived timbre and spectral shape changes such as those implemented in the present experiment. Timbre is a multidimensional attribute of sounds that can be analysed in both spectral and temporal domains and at different levels of abstraction (Shamma, 1999; Samson et al.,
2002). The perceived timbre of a sound is integral to its recognition as a distinct sound object, as exemplified by human voices and musical instruments. Pitch height (spectral fine structure) contributes to a particular dimension of timbre that might be used to separate sound sources (for example, the quality of hollowness conferred by attenuation of even harmonics relative to odd harmonics in multi-harmonic tones). The evidence of the present study suggests that the representation of spectrotemporal fine structure may be sufficient for the detection of a change in sound object identity where changes in spectral envelope are superimposed on a constant spectral fine structure. Spectral envelope changes occurred in both the all-harmonic and the alternating conditions; however, the specific temporal lobe mechanism for spectral shape computation was engaged only in the alternating conditions, where spectral fine structure was changing. Spectral shape is a major determinant of perceived timbre that helps to establish sound object identity, even under conditions where spectral fine structure is changing. Changes in spectral fine structure over time are usual in speech and many other categories of natural sounds, yet the timbre of a particular voice is perceived as invariant despite such changes. The spectral analysis mechanism identified here might help to establish either constancy or variation of vocal identity in the presence of changing spectral fine structure. Spectral shape is not the sole determinant of perceived timbre, and it is unlikely that the abstraction of spectral shape is the sole basis for the categorisation of natural sound objects. Temporal characteristics such as attack and spectral flux (Menon et al., 2002; Samson et al., 2002; Kohlmetz et al., 2003) contribute to the characteristic timbres of many musical instruments, and temporal envelope changes are therefore relevant to the identification of sound objects.
The spectral and temporal dimensions of timbre can be analysed independently in multidimensional scaling experiments (Samson et al., 2002). Human psychophysical evidence (Sheffert et al., 2002) also suggests that listeners can be trained to discriminate voices using either spectral or temporal cues. The respective roles of spectral and temporal mechanisms in sound object perception have not been established in the present experiment, since temporal mechanisms were not investigated here.
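The independence of the spectral and temporal dimensions of timbre can be illustrated with two standard acoustic descriptors: the spectral centroid (a property of spectral shape, related to perceived brightness) and attack time (a property of the temporal envelope). The sketch below is purely illustrative and is not derived from the experimental stimuli; the synthetic tones, frame length and 90% threshold are arbitrary choices for demonstration.

```python
import numpy as np

def spectral_centroid(x, sr):
    """Spectral dimension of timbre: amplitude-weighted mean frequency."""
    mag = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), 1 / sr)
    return float((freqs * mag).sum() / (mag.sum() + 1e-12))

def attack_time(x, sr, frac=0.9):
    """Temporal dimension of timbre: time taken for the amplitude
    envelope to reach a given fraction of its peak value."""
    frame = int(0.005 * sr)  # crude envelope: 5 ms running maximum
    env = np.array([np.abs(x[i:i + frame]).max()
                    for i in range(0, len(x), frame)])
    return int(np.argmax(env >= frac * env.max())) * frame / sr

sr = 16000
t = np.arange(int(0.5 * sr)) / sr
slow_rise = np.minimum(t / 0.2, 1.0)    # 200 ms linear attack
fast_rise = np.minimum(t / 0.01, 1.0)   # 10 ms linear attack
dull = np.sin(2 * np.pi * 300 * t)      # low spectral centroid
bright = np.sin(2 * np.pi * 2000 * t)   # high spectral centroid

# The two descriptors vary independently across combinations of
# carrier and onset envelope:
slow_dull = (spectral_centroid(slow_rise * dull, sr),
             attack_time(slow_rise * dull, sr))
fast_bright = (spectral_centroid(fast_rise * bright, sr),
               attack_time(fast_rise * bright, sr))
print(slow_dull, fast_bright)
```

Multidimensional scaling studies use much richer descriptor sets; the point of the sketch is only that the two descriptors are computed from independent aspects of the signal, so either could in principle be analysed by a separate neural mechanism.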
6.5.4 The formation of spectral templates

In both humans and non-human primates, STS contains multimodal cortex that supports the cross-modal integration of auditory source information with object information obtained in other sensory modalities (Barnes & Pandya, 1992; Baylis et al., 1987; Beauchamp et al., 2004). The efficiency of such cross-modal analysis might be improved by a mechanism such as that demonstrated here, which provides an invariant representation of object features derived from a particular sensory modality. Such a representation might serve as a spectral template for more abstract levels of sound object analysis, and for the detection of changes in sound identity. A neuronal template-matching algorithm has been proposed previously (Chapter 4, Section 4.5.3, p. 92) as a model for the process in which human PT computes changing spectrotemporal patterns relative to stored representations. The findings here indicate that the concept of a spectrotemporal template should be refined to incorporate representations at different levels of abstraction. Taken together with the findings of Experiment 3, the results of the present experiment suggest that, at the level of PT, template-matching algorithms operate on representations of spectrotemporal fine structure, rather than auditory object features. Properties such as pitch height represented at the level of the posterior temporal plane would serve to segregate sound sources, but would not support source identification per se. The present experiment shows that the creation of auditory object templates involves a distinct computational level that does not depend on the representation of spectral detail: here, template refers to a degraded representation that abstracts object-level features rather than encoding spectral fine structure. The spectral templates abstracted by this mechanism could allow sound sources to be identified as particular auditory objects.
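The distinction between a fine-structure representation and a degraded spectral template can be made concrete in a toy computation. In the sketch below (an illustration, not a model of the actual stimuli or of PT/STS processing), two harmonic complexes share one spectral envelope but differ in fine structure (all harmonics versus odd harmonics only): a high-resolution spectrum separates them, whereas a coarse band-energy 'template' treats them as similar. The envelope function, band count and FFT length are arbitrary assumptions.

```python
import numpy as np

def harmonic_tone(f0, envelope, sr=16000, dur=0.5, odd_only=False):
    """Harmonic complex whose component amplitudes follow a spectral
    envelope function evaluated at each harmonic frequency."""
    t = np.arange(int(sr * dur)) / sr
    tone = np.zeros_like(t)
    for k in range(1, int(sr / 2 / f0)):
        if odd_only and k % 2 == 0:
            continue  # omit even harmonics: altered fine structure
        tone += envelope(k * f0) * np.sin(2 * np.pi * k * f0 * t)
    return tone

def fine_spectrum(x, n_fft=4096):
    """Fine-structure representation: high-resolution magnitude spectrum."""
    mag = np.abs(np.fft.rfft(x, n_fft))
    return mag / np.linalg.norm(mag)

def coarse_template(x, sr=16000, n_bands=8, n_fft=4096):
    """Degraded 'template': energy in a few broad log-spaced bands,
    discarding harmonic fine structure but keeping spectral shape."""
    power = np.abs(np.fft.rfft(x, n_fft)) ** 2
    freqs = np.fft.rfftfreq(n_fft, 1 / sr)
    edges = np.geomspace(100.0, sr / 2, n_bands + 1)
    bands = np.array([power[(freqs >= lo) & (freqs < hi)].sum()
                      for lo, hi in zip(edges[:-1], edges[1:])])
    return bands / np.linalg.norm(bands)

# One spectral envelope (a broad resonance at 1 kHz), two fine structures.
env = lambda f: np.exp(-((f - 1000.0) / 600.0) ** 2)
all_harm = harmonic_tone(200.0, env)
odd_harm = harmonic_tone(200.0, env, odd_only=True)

fine_sim = float(fine_spectrum(all_harm) @ fine_spectrum(odd_harm))
coarse_sim = float(coarse_template(all_harm) @ coarse_template(odd_harm))
# The coarse template is much less sensitive to the fine-structure change.
print(fine_sim, coarse_sim)
```

The coarse representation is 'degraded' in exactly the sense used above: it sacrifices the spectral detail that would segregate the two sources, while preserving the object-level spectral shape they share.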
This is not to argue that the abstraction of spectral envelope is the only basis for the categorisation and identification of sounds. In addition to the mechanisms for spectral envelope extraction investigated here, the complete analysis of sounds also includes, for example, the analysis of temporal envelope changes. It will be of considerable interest to see whether the abstraction of such features has a similar neural substrate. Cognitive neuropsychological studies (Samson et al., 2002) support this possibility.
The electrophysiological mechanisms that support spectral template extraction may depend on the characteristics of individual spectrotemporal response field functions in PAC as well as emergent properties of neuronal populations in higher order auditory cortices (Shamma, 1999; Nelken, 2002). While these mechanisms have not been established in auditory cortex, neurons with response characteristics that represent elementary spectrotemporal features have been described as early as PAC (Shamma, 1999; Nelken, 2002). Analogous operations in visual object recognition suggest several candidate neural algorithms that could plausibly encode auditory object templates (Riesenhuber & Poggio, 2002; Palmeri & Gauthier, 2004). Such algorithms should support certain fundamental perceptual operations such as invariance, selectivity, representational efficiency and multi-level abstraction that are likely to be required for object recognition in all sensory systems (Riesenhuber & Poggio, 2002). The formation of a degraded representation or template might represent a stage in any of these perceptual operations, and could in principle conform to several different models of object recognition. The computation of a simplified or sparse representation based on salient features (such as edges or corners) could constitute an informationally efficient code for defining an object; however, the selection of such critical features is likely to demand significant neural computational resources (Riesenhuber & Poggio, 2002; Palmeri & Gauthier, 2004). Alternatively, an object template might represent the output of computational algorithms based on a maximum ('Max') operation that derives object features based on a peak correlation between the input signal and a matching filter function.
Such models predict the development of view-invariance in broadly tuned visual object processing networks and can account for various electrophysiological observations in primate visual cortex (Riesenhuber & Poggio, 2002). Stored templates could determine the characteristics of the matching filters for object recognition, and would allow for the operation of top-down influences based on previous sensory experience. Models of this type would therefore share some formal features with generative models (Friston & Price, 2001) such as the template-matching algorithm proposed to underpin early auditory scene analysis in human PT (Chapter 4, Section 4.5.3, p. 92). However, the findings of the present experiment do not distinguish between candidate neural algorithms for the creation of spectral templates. Indeed, it is not straightforward to determine auditory analogues for perceptual operations such as view-invariance in the visual domain.
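One way to make the 'Max' operation concrete is the toy sketch below, which scores an input coarse spectral envelope against stored templates by taking the maximum normalized correlation over small spectral shifts, in the spirit of the max-pooling step of Riesenhuber and Poggio's model. The template shapes, band count and shift range are invented for illustration; no claim is made about actual cortical filters.

```python
import numpy as np

def max_match(spec, template, max_shift=2):
    """'Max' operation: the best normalized correlation between the input
    and small spectral translations of a stored template. Pooling with a
    max confers tolerance to small shifts of the input pattern."""
    scores = []
    for s in range(-max_shift, max_shift + 1):
        shifted = np.roll(template, s)
        scores.append(float(spec @ shifted) /
                      (np.linalg.norm(spec) * np.linalg.norm(shifted)))
    return max(scores)

def classify(spec, templates):
    """Assign the input to the stored template giving the highest score."""
    scores = {name: max_match(spec, t) for name, t in templates.items()}
    return max(scores, key=scores.get), scores

# Two invented coarse spectral envelopes over 12 frequency bands.
peaked = np.array([0, 1, 3, 5, 3, 1, 0.5, 2, 1, 0.5, 0, 0], float)
flat = np.ones(12)
templates = {"peaked": peaked, "flat": flat}

# A spectrally shifted, slightly noisy version of the peaked envelope is
# still assigned to the peaked template, despite the translation.
rng = np.random.default_rng(0)
probe = np.roll(peaked, 1) + 0.1 * rng.random(12)
label, scores = classify(probe, templates)
print(label, scores)
```

The max over shifts is what buys the (limited) invariance: a plain correlation at zero shift would penalise the translated probe, whereas pooling over candidate transformations recovers the match, analogous to position tolerance in the visual models cited above.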
6.5.5 A pathway for auditory object analysis?

The use of generic sound objects in this experiment allowed the characterisation of a processing hierarchy in which increasingly abstract features of sound objects are encoded at higher levels of the system. The abstraction of spectral shape in mid STS is likely to constitute an intermediate stage of categorisation in the processing of sound identity. This stage forms an anatomical and computational link between STP areas lateral to PAC that process elementary source properties (for example, pitch height: Experiment 3, Chapter 5), and temporal areas sited more inferiorly and anteriorly that are involved in the cross-modal and semantic processing of natural sounds such as voices (Belin et al., 2000; Belin et al., 2002b; Belin & Zatorre, 2003; Von Kriegstein et al., 2003), environmental noises (Engelien et al., 1995; Humphries et al., 2001; Maeder et al., 2001; Adams & Janata, 2002; Thierry et al., 2003; Beauchamp et al., 2004; Lewis et al., 2004) and complex timbres (Menon et al., 2002). The spectral mechanism identified in this experiment supplies a missing link in a putative auditory object processing pathway running inferiorly and anteriorly in the human temporal lobe. Hierarchical pathways of this type may represent a universal organising principle of perceptual analysis and object identification in different sensory systems (Ungerleider & Mishkin, 1982; Rauschecker, 1998; Mesulam, 1998; Marcus & Van Essen, 2002; Palmeri & Gauthier, 2004; Husain et al., 2004). Models based on such hierarchical pathways are predicated on the existence of neural networks that extract composite features related to object identification and categorisation, with a capacity for cross-modal integration of object information at the higher levels of the processing hierarchy (Rauschecker, 1998; Rauschecker & Tian, 2000; Lewis et al., 2004).
The formation of cohesive multimodal representations of objects may occur as a precursor to semantic processing, in which those object representations acquire meaning. Alternatively, within-modality semantic analysis may occur prior to integration with information derived from other sensory channels. The mechanisms and neuroanatomical correlates of object feature extraction in primate visual cortex are relatively well understood, and are consistent with a visual object processing stream running anteriorly in the inferior temporal lobe (Ungerleider & Mishkin, 1982; Marcus & Van Essen, 2002; Palmeri & Gauthier, 2004). In contrast, the anatomical and computational bases for feature extraction in the cortical auditory system remain largely unknown (Husain et al., 2004; Zatorre et al., 2004). Theoretical
predictions concerning the behaviour of neural networks that might support object-level processing in auditory cortex (Husain et al., 2004) are based largely on data for the visual system (Riesenhuber & Poggio, 2002; Palmeri & Gauthier, 2004). Emerging evidence suggests that auditory object processing mechanisms may have a broadly similar anatomical and functional organisation in different primate species. In non-human primates, electrophysiological evidence supports the existence of an information-processing pathway for call sounds (Rauschecker, 1998; Wang, 2000), but does not allow clear interpretation of the computational tasks performed at each stage of the pathway. In the macaque, neurons in the posterior temporal plane (area CL), posterior belt and posterior parabelt regions demonstrate selectivity to species-specific call sounds (Rauschecker & Tian, 2000; Tian et al., 2001). This selectivity appears to be based on ensemble coding of vocal information rather than individual neuronal call-detectors; however, the anatomical substrates and mechanisms for the extraction of increasingly complex call information have not been defined. Macaque calls, for example, differ in acoustic properties such as bandwidth as well as in their semantic and behavioural significance. Previous demonstrations of call selectivity in the macaque posterior superior temporal lobe do not allow these levels of processing to be disambiguated (Rauschecker & Tian, 2000; Tian et al., 2001). Polymodal responses recorded in macaque STS (Baylis et al., 1987; Barnes & Pandya, 1992) and visuo-auditory interactions involving posterior STS in human functional imaging studies (Howard et al., 1996b; Calvert et al., 1997; Downar et al., 2000; Price et al., 2003; Beauchamp et al., 2004) suggest that this region participates in the cross-modal transfer of auditory object information in both species.
Cross-modal associations formed in STS could be utilised in cognitive operations such as mental imagery, semantic processing, working memory and decision-making, mediated by ventral temporal and prefrontal areas (Beauchamp et al., 2004; Husain et al., 2004; Binder et al., 2004). In the macaque, rostral and caudal architectonic sectors along the length of STS are linked by a sequence of intrinsic reciprocal cortico-cortical connections (Seltzer & Pandya, 1989). Similar connections in human STS might provide a substrate for the rostro-caudal transfer of auditory object information. The macaque left temporal pole responds preferentially to species-specific vocalisations over other classes of complex sounds, including scrambled vocalisations (Poremba et al., 2004), while the human anterior temporal lobe exhibits increasing selectivity for the analysis of vocal and speech sound attributes (Scott et al., 2000; Belin et al., 2002b; Belin & Zatorre, 2003).
Such observations suggest that specificity for more abstract and ethologically relevant attributes of natural sound objects emerges at higher levels of an anteriorly directed temporal lobe processing pathway in the primate brain. It has previously been argued (Experiment 3, Chapter 5) that the network that processes auditory object ('What') information in the human brain is distributed along the antero-posterior axis of the superior temporal lobe. The present data on the analysis of spectrotemporal fine structure, the findings of Experiment 3 concerning the processing of pitch height (Chapter 5), and extensive previous data relating to spectrotemporal properties such as AM and FM (Hall et al., 2002; Hart et al., 2003b) together implicate the posterior STP in the representation of elementary features that might serve to distinguish sound objects. Posterior STS is activated by environmental sounds relative to broadband ripple sounds (Beauchamp et al., 2004), while right anterior STS is sensitive to auditory object features (Zatorre et al., 2004). These observations predict the existence of intermediate stages of object-level feature extraction before the attribution of meaning. The present experiment confirms this prediction by defining one mechanism that might operate at such an intermediate stage of auditory object identification before recognition occurs. Semantic categorisation, multimodal integration and decision tasks involving auditory objects engage widely distributed bilateral anterior superior temporal, inferior temporal and inferior frontal networks (Adams & Janata, 2002; Thierry et al., 2003; Binder et al., 2004) that might make use of object tokens generated by the temporal lobe mechanism defined here. Taken together, these functional imaging studies are broadly consistent with a ventrally directed 'What' pathway for processing sound object features in the human auditory brain.
However, neither previous studies nor the present experiment establish the precise sequence of activation or cortico-cortical connectivities within the distributed network of areas engaged in auditory object analysis. The direction (or indeed, the extent) of auditory object information transfer between HG, STP, and STS cannot be resolved using the present data (see for example Figure 6.3); however, a single unidirectional stream would seem unlikely a priori. Indeed, the findings of Experiment 3 (see Chapter 5) suggest that 'What' information relating to sound object identity and source-independent 'What' information conveyed by sound sources are analysed by distinct processing mechanisms located posterior and anterior to HG. Examples of such a divergence of
auditory 'What' information could occur in the analysis of music for instrument or melody, or in the analysis of speech for speaker identity or verbal content. Certain other observations would be consistent with such an anatomical and functional partition for the processing of auditory 'What' information in the human temporal lobe. For example, functional imaging evidence suggests that the early processing of phonemes may engage auditory areas anterior to those implicated in environmental sound identification (Thierry et al., 2003; Binder et al., 2004). Based on the available evidence, however, the basis for any such antero-posterior separation of object-level feature processing in human auditory cortex must be considered speculative.
Chapter 7. CONCLUSIONS

Summary

The work presented in this thesis has addressed generic mechanisms of spectrotemporal pattern analysis in human auditory cortex. Spectrotemporal patterns convey information about the identity and the location of sound sources and the evolution of those sources in time and space: the brain mechanisms that extract this information support auditory scene analysis. Here, these mechanisms have been shown to engage distinct auditory cortical areas beyond primary auditory cortex in the human brain. The present experimental evidence implicates anatomically specific cortical networks in the analysis of different types of spectrotemporal patterns. Cortical areas posterior to primary auditory cortex segregate sound sources based on source-specific information such as spatial location and pitch height, while areas anterior to primary auditory cortex track source-independent information streams. This work further suggests that the analysis of auditory object properties is hierarchically organised. Posterior superior temporal areas represent fine spectrotemporal structure, while superior temporal sulcus is engaged in the abstraction of object features that might support sound identification. Within this organisational scheme, the planum temporale behaves as a computational hub that disambiguates different types of spectrotemporal information and gates that information to higher-order cortical areas. The mechanisms of auditory pattern analysis proposed here support the existence of functional homologies between the auditory cortex of humans and non-human primates. This concluding chapter draws together the main findings of the thesis, and evaluates these findings in relation to previous work on auditory cortical pattern analysis. The key conclusions of the thesis are presented as a series of assertions that await further validation and qualification in future experiments. Possible directions for such future work are outlined.
7.1 Human auditory cortex contains generic mechanisms for the analysis of patterned sound

The fMRI experiments described in this thesis address cortical mechanisms for processing patterns in sound, before those patterns become associated with meaning. This level of auditory information processing has been relatively little studied. Animal electrophysiology has focussed on the encoding of the acoustic signal in the ascending auditory pathways to the level of PAC (Nelken et al., 1999; Nelken, 2002), while much previous functional imaging work in humans has sought either to corroborate the fundamental organisational principles suggested by animal electrophysiology (Bilecen et al., 1998; Talavage et al., 2000) or to establish the neuroanatomical correlates of such distinctively human capacities as speech or music (Zatorre et al., 1992, 1994, 1996; Scott & Johnsrude, 2003). The experiments described here were designed to establish the human brain mechanisms that solve certain fundamental problems of spectrotemporal pattern analysis, namely: the representation of the spatial location of sound sources and the representation of non-spatial information associated with those sources; the disambiguation of dynamic spectrotemporal patterns associated with sound movement from other complex spectrotemporal patterns; the representation of source-dependent and source-independent pitch information; and the identification of sound objects. These problems relate to generic aspects of auditory scene analysis (Bregman, 1990), and must be solved for all classes of natural sounds. The present findings demonstrate that auditory cortical mechanisms to support such generic aspects of spectrotemporal analysis can be identified in the human brain. The precise functional role of auditory cortex in humans and other species has been difficult to establish (Talwar et al., 2001).
Considerable processing and transformation of acoustic information occurs in the ascending auditory pathways (Miller et al., 2001), while the effects of auditory cortical lesions in non-human primates are frequently transient (Heffner & Heffner, 1990a). The present work suggests that auditory cortical mechanisms play fundamental roles in auditory scene analysis. This level of auditory information processing is critical for subsequent recognition and cross-modal integration of sounds and for navigation in auditory space. These mechanisms have specific anatomical substrates involving cortical areas beyond PAC, and establish anatomical and
functional links between the ascending auditory pathways that encode the acoustic signal and higher cortical areas engaged in the cognitive processing of sounds. The present experiments have not established a specific role for PAC in spectrotemporal pattern analysis. Taken together, the findings suggest that PAC is comparably engaged in processing many types of sound pattern, and that functional differentiation for the extraction of particular sound properties emerges at the level of non-primary auditory cortex. This interpretation would be consistent with anatomical and electrophysiological evidence for a structurally and functionally differentiated hierarchy of auditory cortical areas across mammalian species (Read et al., 2002; Semple & Scott, 2003). However, failure to demonstrate differential activation of medial HG using functional imaging techniques should be interpreted cautiously. The intrinsic variability of PAC location between individuals means that the anatomical correspondence between PAC and macroscopic landmarks in medial HG is imprecise (Morosan et al., 2001; Rademacher et al., 1993, 2001). More fundamentally, functional brain imaging techniques such as fMRI measure activity in anatomically contiguous neuronal ensembles: these techniques might not detect functional differentiation at the level of distributed subpopulations of neurons within a cortical region such as PAC. Such subpopulations might communicate with higher order cortices specifically engaged in the analysis of particular sound properties. Indeed, electrophysiological evidence in other mammalian species suggests that such functional differentiation does exist within PAC for the processing of spatial sound properties (Ahissar et al., 1992; Toronchuk et al., 1992).
7.2 Distinct cortical networks analyse spatial and non-spatial spectrotemporal patterns

Perhaps the most fundamental problem of auditory scene analysis is the disambiguation of spectrotemporal patterns intrinsically associated with sound sources from patterns due to the locations of those sources in space. The cortical mechanisms that solve this problem are central to the current 'What'/'Where' controversy in auditory neuroscience. This issue holds general implications for the existence of organising principles across different sensory systems and anatomical homologies across species. Models of primate auditory cortical organisation, motivated by analogies with visual
neuroscience, have posited the existence of a dorsal stream for processing auditory spatial properties and a ventral stream for processing object properties. Anatomical (Galaburda & Sanides, 1980; Rivier & Clarke, 1997; Galuske et al., 1999; Recanzone, 2000b; Recanzone et al., 2000b; Kaas & Hackett, 2000; Rauschecker & Tian, 2000; Tardif & Clarke, 2001) and electrophysiological (Tian et al., 2001; Alain et al., 2001; Anourova et al., 2001) data in both humans and non-human primates, and lesion (Clarke et al., 2000) and functional imaging (Alain et al., 2001; Maeder et al., 2001; Hart et al., 2004) data in humans, have been adduced in support of this scheme. However, the existence of anatomically specific 'What' and 'Where' processing mechanisms remains contentious (Middlebrooks, 2002; Cohen et al., 2004): no account has satisfactorily reconciled the evidence, on the one hand, for a duality of processing streams and, on the other hand, for their mutual interdependence (Middlebrooks, 2002; Zatorre et al., 2002b). Here, virtual acoustic space techniques were exploited in order to examine cortical mechanisms that process static sound source location (Experiment 1) and sound source motion (Experiment 2) in the intact human brain. These mechanisms were directly compared with mechanisms for processing non-spatial spectrotemporal patterns. Together these experiments have demonstrated distinct human auditory cortical mechanisms beyond PAC (medial HG) that are simultaneously and specifically engaged in processing spatial and non-spatial properties of sound sequences. Specific mechanisms for processing auditory spatial properties have been identified in a bilateral posterior superior temporal network extending from medial PT into PTO and IPL. These mechanisms are distinct from those processing patterns based on non-spatial sound properties such as pitch sequences, which engage a bilateral antero-lateral network involving lateral HG, anterior PT, PP and STG.
Experiments 1 and 2 demonstrate that the posterior cortical network processes spatial properties of sound sources rather than spectrotemporal complexity per se. These findings are consistent with a posteriorly directed pathway comprising PT, PTO and IPL, responsible for the perceptual processing of sound source location and motion in space. The demonstrated involvement of IPL (PTO) in the processing of sound source motion here supports previous human functional imaging studies that have consistently implicated this region in auditory spatial analysis (Table 3.1). In addition, however, both Experiments 1 and 2 demonstrate functional differentiation for the processing of spatial
and non-spatial sound source properties as early as human PT. Within PT, spectrotemporal patterns associated with spatial location are analysed postero-medially, while pitch pattern is analysed antero-laterally (Experiment 1); and dynamic spectrotemporal patterns associated with sound motion are analysed postero-medially, while comparably complex non-spatial patterns are analysed antero-laterally (Experiment 2). The existence of distinct functional subregions within a large cortical region such as human PT is plausible on anatomical (Wallace et al., 2002b) and electrophysiological (Liégeois-Chauvel et al., 1991, 1994; Howard et al., 2000; Yvert et al., 2001) grounds, and consistent with previous human functional imaging evidence implicating PT in the analysis of sounds with complex spatial (Baumgart et al., 1999; Hart et al., 2004; Table 4.3) and non-spatial (Zatorre et al., 1992; Binder et al., 1996; Griffiths et al., 1998a; Griffiths et al., 1999a; Mummery et al., 1999; Binder et al., 2000; Giraud et al., 2000; Thivard et al., 2000; Table 4.3) spectrotemporal properties. Furthermore, such a functional subdivision suggests homologies with the macaque posterior STP, in which relative selectivity for spatial and non-spatial sound source properties has been described in medial (CM) and lateral (CL) caudal belt fields, respectively (Tian et al., 2001). The present evidence suggests that human PT may play a crucial role in gating auditory spatial and non-spatial information between functionally differentiated networks involving distinct higher-order cortical areas. There is some electrophysiological evidence indicating that static sound source location and sound source motion may be processed by anatomically distinct human brain mechanisms (Ducommun et al., 2002).
The present experiments provide evidence for a common posterior cortical substrate, but do not exclude the involvement of additional brain regions that might be functionally differentiated for the specific perceptual processing of sound source motion (analogous to the visual MT/V5 complex). Experiment 2 was not designed to discriminate between alternative processing mechanisms that might represent sound source motion (for example, mechanisms based on the analysis of velocity or snapshots of serial locations: Middlebrooks & Green, 1991); however, the direct comparison between moving and stationary externalised sound sources in this experiment suggests that the processing of sound motion demands computational resources additional to those involved in processing static sound sources.

The present evidence for the existence of distinct anterior and posterior networks for processing auditory 'What' and 'Where' information in human auditory cortex should be interpreted cautiously. Animal electrophysiology (Rauschecker & Tian, 2000; Tian et al., 2001) has shown that any such separation of processing is likely to be relative rather than absolute. Psychoacoustic (Brown et al., 1980), lesion (Zatorre & Penhune, 2001), electrophysiological (Graziano et al., 1999; Tian et al., 2001; Cohen et al., 2004) and functional imaging (Zatorre et al., 2002b) studies in humans and other mammalian species have demonstrated the potential for interactions between spatial and non-spatial auditory information. Factors that might influence such interactions (for example, particular intrinsic spectrotemporal object properties or behavioural context) were not examined in the present experiments. The extent of engagement of anterior and posterior networks is task-dependent and modulated by attention. In particular, the variable activation of frontal and superior parietal areas in previous human functional imaging studies (see Table 3.1) suggests that these areas constitute a supramodal attentional network that may be engaged by both spatial and non-spatial auditory stimulus attributes, depending on task demands. The absence of activation in this network in Experiments 1 and 2 here is likely to reflect the absence of any active discrimination task.

A related qualification concerns the extent to which the antero-lateral and posterior cortical networks demonstrated here can be considered true processing streams analogous to those proposed in the macaque.
It is not possible to determine the spatio-temporal flow of auditory information within each network using functional imaging techniques in isolation; however, a simple unidirectional flow of information anteriorly and posteriorly from PAC would appear unlikely on anatomical (Pandya, 1995; Hackett et al., 1998a,b) and electrophysiological (Kitzes & Hollrigel, 1996; Kaas et al., 1999; Kaas & Hackett, 2000; Howard et al., 2000) grounds. The present findings do not exclude the possibility, raised by macaque work (Rauschecker & Tian, 2000), that there are multiple parallel inputs to the cortical networks from PAC and thalamus.

The stimulus properties processed in antero-lateral and posterior auditory cortical networks are unlikely to be completely defined in terms of a simple 'What'/'Where' dichotomy. Experiments 1 and 2 have shown that processing in postero-medial PT is specific for spatial versus dynamic non-spatial spectrotemporal patterns. However, spatial analysis cannot be considered the exclusive role of this region: for example, it is also activated by low-frequency FM sweeps (Schönwiesner et al., 2002) and during speech
production (Wise et al., 2001). As in the macaque, the network subserving 'What' information processing has here been shown to be distributed along the antero-posterior temporal axis. This suggests a priori that different components of the network are likely to be functionally differentiated for processing different kinds of 'What' information. Indeed, in humans, the antero-lateral network has been implicated in the analysis ('What') of many different types of spectrotemporal pattern (see Table 3.1), including simple spectral and temporal patterns (Griffiths et al., 1998a, 1999b; Binder et al., 2000; Thivard et al., 2000; Zatorre & Belin, 2001; Hall et al., 2002; Patterson et al., 2002), musical melodies (Zatorre et al., 1994, 1996), vocal sounds (Belin et al., 2000), and speech (Zatorre et al., 1992; Scott et al., 2000; Vouloumanos et al., 2001; Wise et al., 2001).

7.3 Distinct cortical networks analyse source-dependent and source-independent pitch properties

Auditory 'What' information can be broadly categorised as information about particular sound sources and information patterns that do not depend on any particular source. These source-dependent and source-independent types of 'What' information are exemplified in the height and chroma dimensions of the musical pitch helix. Pitch height corresponds to the octave on the keyboard and provides a basis for distinguishing different voices or instruments irrespective of the melody they produce. Conversely, a particular pattern of pitch chroma values within the octave (a melody) can be identified independently of the specific voice or instrument producing it. Despite longstanding interest in pitch perception (Helmholtz, 1875) and an extensive body of data derived from both animal electrophysiology and human psychophysics, brain mechanisms that might separate the height and chroma dimensions of pitch have been studied infrequently (Patterson, 1990; Patterson et al., 1993).
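The height/chroma decomposition of the pitch helix can be made concrete numerically: log-frequency splits into an integer octave number (height) and a fractional within-octave position (chroma). The following minimal sketch illustrates this; the 440 Hz reference and the function name are assumptions for illustration only, not the stimulus construction used in Experiment 3.

```python
import math

REFERENCE_HZ = 440.0  # illustrative reference frequency (an assumption)

def pitch_dimensions(freq_hz):
    """Decompose a frequency into pitch height (octave number relative to
    the reference) and pitch chroma (position within the octave, in [0, 1))."""
    octaves_above_ref = math.log2(freq_hz / REFERENCE_HZ)
    height = math.floor(octaves_above_ref)   # source-dependent: which octave
    chroma = octaves_above_ref - height      # source-independent: where in the octave
    return height, chroma

# Notes an octave apart share chroma but differ in height by one:
h1, c1 = pitch_dimensions(660.0)
h2, c2 = pitch_dimensions(1320.0)
assert h2 == h1 + 1 and abs(c1 - c2) < 1e-9
```

On this decomposition a melody is a trajectory of chroma values, while transposing it by whole octaves (changing the 'voice' register) alters height alone.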
In particular, previous human functional imaging studies have not manipulated the two dimensions independently. Here (Experiment 3), the human brain mechanisms that represent pitch height and pitch chroma have been investigated using stimuli in which pitch height could be manipulated independently of chroma. Pitch height can be ordered from low to high along a continuous dimension when chroma is kept constant, demonstrating that height and
chroma represent distinct perceptual pitch dimensions. These dimensions have distinct cortical representations: pitch height is specifically represented posterior to HG within PT, while pitch chroma is specifically represented anterior to HG in PP. The findings support a functional differentiation within human PT for the analysis of pitch height and pitch chroma information. The analysis of pitch height specifically activates posterior PT, whereas both pitch chroma and pitch height information are analysed in antero-lateral PT. The specific brain representations of the two pitch dimensions occur at a logical point in the recently proposed cortical pitch processing hierarchy (Griffiths et al., 1998a; Patterson et al., 2002): medial HG (PAC) is similarly active in processing noise or pitch, lateral HG processes pitch (both height and chroma) preferentially over broadband noise, and the pitch dimensions of height and chroma (corresponding to different classes of relations between pitches) are processed by specific higher-order auditory areas beyond HG.

The analysis of pitch variation in sound sequences extending over seconds is required to process melodies in music and prosody in speech. These patterns have been shown in previous studies to engage bilateral cortical areas extending from antero-lateral PT and lateral HG into PP, STG and STS (Zatorre et al., 1994; Griffiths et al., 1998a; Scott et al., 2000; Patterson et al., 2002; Mitchell et al., 2003; Scott & Johnsrude, 2003), congruent with the mechanism for chroma analysis identified here. This mechanism may therefore play a generic role in tracking coherent information streams that can be analysed independently of the specific sound source. While specific mechanisms for pitch height analysis have not been addressed previously, pitch height constitutes a basis for the segregation of sound sources.
It is therefore plausible to propose that the posterior cortical mechanism identified in Experiment 3 is involved in sound source segregation, an early stage of auditory scene analysis.

Experiments 1, 2 and 3 together suggest that generic mechanisms for early auditory scene analysis are instantiated in posterior auditory cortical areas in the human brain. These mechanisms analyse both spatial (static location, motion) and non-spatial (pitch height) attributes that can be used to segregate sound sources. All experiments implicate PT in the disambiguation of source-dependent and source-independent information. Within PT, areas processing source-dependent information lie posterior to those processing source-independent information. The present experiments further suggest that different classes of source-dependent information engage specific sub-regions of posterior PT. Auditory spatial analysis (Experiments 1 and 2) specifically engages postero-medial PT, while
pitch height analysis (Experiment 3) engages areas sited more laterally within posterior PT (cf. Figures 3.2, 4.1, 5.3, 5.4; Tables 3.2, 4.2, 5.1, 5.2). Any such anatomical scheme must, however, be considered tentative, since pitch height and auditory spatial analysis have not been compared directly. While neuroanatomical correlates of pitch height and pitch chroma processing have not been established in non-human primates, the scheme proposed here would be consistent with electrophysiological evidence in the macaque (Tian et al., 2001) implicating caudal belt fields in the analysis of both spatial and call sound information. Limited evidence (Wright et al., 2000; Brosch et al., 2004) further suggests that it may be possible to identify mechanisms for processing source-independent pitch information (analogous to the pitch chroma mechanism identified here in the human brain) in non-human primates.

The present evidence allows more specific conclusions to be drawn regarding the fractionation of the auditory cortical substrate for processing 'What' information, which is distributed along the antero-posterior axis of the superior temporal lobe in both humans and non-human primates (Rauschecker, 1998; Rauschecker & Tian, 2000; Binder et al., 2000; Belin et al., 2000; Patterson et al., 2002; Scott & Johnsrude, 2003). The data support a general scheme in which source-dependent 'What' information is analysed by posteriorly sited mechanisms, while source-independent 'What' information is analysed by antero-lateral mechanisms. The absence of an active discrimination task here implies that these mechanisms are engaged in the obligatory perceptual analysis of pitch chroma and pitch height information. The present data do not establish a mechanism for the integration of different classes of 'What' information to produce a unified behavioural response, nor do they indicate whether processing is modulated by behaviourally relevant stimuli.
While pitch height represents a basis for the segregation of sound sources, the identification of natural sounds (whether human voices, musical instruments or macaque vocalisations) depends on additional mechanisms for the extraction of object-level features.

7.4 Hierarchical cortical mechanisms analyse auditory object properties

Hierarchical processing pathways may represent a universal organising principle for the perceptual analysis and identification of objects in different sensory systems (Ungerleider & Mishkin, 1982; Mesulam, 1998; Palmeri & Gauthier, 2004). Such hierarchical pathways are likely to be based on neuronal networks that extract composite features related to object identification and categorisation, with the formation of increasingly abstract object representations that undergo cross-modal integration and semantic processing at higher levels of the hierarchy (Mesulam, 1998; Riesenhuber & Poggio, 2002; Palmeri & Gauthier, 2004). Substantial anatomical and electrophysiological evidence (Felleman & Van Essen, 1991; Marcus & Van Essen, 2002; Riesenhuber & Poggio, 2002; Palmeri & Gauthier, 2004) supports the hierarchically organised processing of object information in primate visual cortex.

The definition of an auditory object is not straightforward: here, a general operational definition has been preferred, according to which an auditory object is a combination of spectrotemporal features that establishes a perceptual boundary between a particular sound and the acoustic background (see Chapter 6, Section 6.1, p. 114). According to such a definition, an auditory object might be either a sound source or an acoustic event, such as a vowel or a call sound. In the primate auditory system, electrophysiological evidence supports the existence of an information-processing pathway for call sounds in non-human primates (Rauschecker & Tian, 2000; Wang, 2000; Tian et al., 2001) and for phonetic features in the human lateral temporal lobe (Creutzfeldt et al., 1989; Steinschneider et al., 1999). The existence of such a pathway is further supported by lesion studies of voice (Van Lancker et al., 1988) and timbre (Samson et al., 2002; Kohlmetz et al., 2003) recognition in humans.
However, the brain mechanisms responsible for the abstraction of object-level features in auditory cortex have not been characterised. Such a mechanism has been defined here (Experiment 4) using generic auditory objects in which spectral shape (spectral envelope) was manipulated independently of spectrotemporal fine structure. The stimuli shared spectral features with natural sounds but were not perceived as corresponding to any familiar sound. These stimuli were designed to engage obligatory perceptual mechanisms of auditory object analysis before the attribution of meaning. Experiment 4 provides evidence for at least two distinct levels
in the spectral analysis of sound objects. The first level of analysis involves the representation of spectrotemporal fine structure, and engages HG, PT and posterior STG. The second level of analysis involves the computation of spectral shape independently of spectral fine structure, and engages a distinct, rightward-asymmetric network extending antero-laterally from PT through posterior STG and along STS. This temporal lobe mechanism would allow the abstraction of spectral templates as a critical stage of auditory object analysis before cross-modal and semantic processing. The mechanism links areas in the posterior STP engaged in sound source segregation (see Chapter 5) with more anterior and inferior areas in the temporal lobe engaged in processing different categories of natural sounds, including voices (Belin et al., 2000, 2002b; Belin & Zatorre, 2003; Von Kriegstein et al., 2003), environmental noises (Engelien et al., 1995; Humphries et al., 2001; Maeder et al., 2001; Adams & Janata, 2002; Thierry et al., 2003; Beauchamp et al., 2004; Zatorre et al., 2004; Lewis et al., 2004) and complex timbres (Menon et al., 2002).

The mechanism identified here provides a missing link in a putative anteriorly directed auditory object processing pathway in the human brain. The findings support a hierarchical framework that unites human functional imaging data on early auditory scene analysis in the posterior temporal plane (Chapters 3, 4 and 5) and the cognitive processing of natural sounds in STS and beyond (Thierry et al., 2003; Binder et al., 2004; Beauchamp et al., 2004). A mechanism for the computation of object templates might underpin the sensitivity to pre-semantic sound object features observed previously in human anterior STS (Zatorre et al., 2004).
Observations in both macaques (Baylis et al., 1987; Barnes & Pandya, 1992) and humans (Beauchamp et al., 2004) implicate STS in cross-modal operations that might make use of object templates such as those proposed here. Such templates could participate in semantic categorisation and decision processing in distributed anterior temporal and inferior frontal lobe networks (Adams & Janata, 2002; Thierry et al., 2003; Binder et al., 2004). It is tempting to conclude that the posterior temporal plane mechanisms for the representation of spectrotemporal fine structure demonstrated in Experiments 3 and 4 and the temporal lobe mechanism for the abstraction of object-level properties together constitute a true, ventrally directed 'What' pathway for the initial segregation and subsequent identification of auditory objects. A hierarchical anteriorly directed pathway
for the analysis of increasingly abstract properties of natural sounds in the human temporal lobe is consistent with functional imaging data for the processing of speech sounds (Scott et al., 2000) and voices (Belin et al., 2002b; Belin & Zatorre, 2003). Human functional imaging evidence further supports hemispheric selectivity for the processing of voices and phonetic features at the level of anterior STS (Belin et al., 2000; Belin et al., 2002b; Belin & Zatorre, 2003; Von Kriegstein et al., 2003). Such hemispheric selectivity might be determined by the connections of the generic STS template analyser with higher-order cortices. However, neither previous functional imaging studies nor Experiment 4 here establish the precise sequence of activation or cortico-cortical connectivities within the distributed network of areas engaged in auditory object analysis.

The findings of Experiment 3 suggest that the differentiation of source-dependent and source-independent 'What' information may be a basic organising principle of the distributed auditory 'What' network, and that this dichotomy may be supported by processing mechanisms sited relatively posterior and anterior to PAC. Experiment 3 demonstrates such an antero-posterior separation at the level of early auditory scene analysis. An analogous separation may be preserved in the early analysis of auditory object-level features (for example, vocal identity versus the verbal content of the speech stream); however, this remains speculative. The present findings do not establish whether anatomical and functional differentiation for source-dependent and source-independent information exists at successive stages in the processing of natural sounds. By analogy with visual object processing (Palmeri & Gauthier, 2004), the analysis of sound objects prior to recognition is likely to involve many subprocesses within or in addition to the two broad levels proposed here.
Mechanisms that represent elementary features such as spectral edges or corners (analogous to those in early visual cortices) would provide input to the spectral shape analyser. The elucidation of such mechanisms might enable candidate neural algorithms for spectral shape extraction (such as sparse coding or Max-like operations: Riesenhuber & Poggio, 2002) to be distinguished. Operational analogies between visual and auditory object processing must, of course, be interpreted cautiously, since the computational problems in each modality are quite different (see Chapter 1, Section 1.1, p. 3). Moreover, the formation of a spectral template is unlikely to represent the sole basis for sound object perception. Temporal envelope changes, for example, are also likely to be highly relevant to the identification of sound
objects, but were not investigated here. The interaction between temporal and spectral features is also likely to be important in the perception of complex natural timbres such as musical instruments (Samson et al., 2002). The spectral template mechanism described here would, however, detect changes in sound object identity where spectral fine structure is changing over time, as is the case for speech and many other natural sounds.

7.5 The planum temporale is a computational hub for the analysis of spectrotemporal patterns

The experiments described in this thesis demonstrate a common involvement of human PT in the disambiguation of a variety of spectrotemporal patterns: pitch from spatial location (Experiment 1), sound motion from dynamic non-spatial spectrotemporal variation (Experiment 2), pitch chroma from pitch height (Experiment 3), and object-level properties (spectral shape) from spectrotemporal fine structure (Experiment 4). In each case, a broad postero-medial/antero-lateral functional subdivision of PT can be defined based on a segregation of source-dependent (spatial and non-spatial) properties and source-independent properties, respectively. Spatial ('Where') information about sound sources is processed in a well-defined region of the posterior STP (see Figures 3.2 and 4.1), whereas the areas that process non-spatial ('What') information are distributed along the antero-posterior axis of the superior temporal lobe (see Figures 3.2, 4.1, 5.3 and 6.3). Auditory 'What' information is further segregated into source-dependent and source-independent information, processed by distinct cortical mechanisms located relatively postero-laterally and antero-laterally within PT (see Figure 5.3). Furthermore, the functional subdivisions of PT participate in distinct, distributed antero-lateral and postero-medial cortical networks (see Figures 3.2, 4.1 and 6.3).
Taken together, the findings suggest that human PT contains generic mechanisms for the segregation of sound patterns and behaves as a computational hub that gates these patterns to distinct higher-order areas mediating the perception of sound source location, source identity and source-independent information streams. The involvement of human PT in the integration of cross-modal information (Howard et al., 1996b; Calvert et al., 1997; Downar et al., 2000; Price et al., 2003) provides further evidence that this region is sited at the hub of a widely distributed network of cortico-cortical connections.

While it is likely a priori that an extensive cortical region such as human PT will contain multiple anatomically and functionally differentiated subregions (Galaburda & Sanides, 1980; Leinonen et al., 1980; Kaas & Hackett, 2000; Wallace et al., 2002b), the functional architecture of the human STP has not been established (Zatorre et al., 1998; Marshall, 2000; Recanzone, 2002). Human PT has traditionally been regarded as a language processor, yet much previous functional imaging evidence has implicated PT in the processing of diverse types of sounds (Figure 4.2 and Table 4.3). Taken together, the present experiments provide a framework for these data and suggest a testable model of human PT as a computational engine for the segregation of spectrotemporal patterns.

PT is engaged by sounds that have a complex spectrotemporal structure (Table 4.3). Such sounds, comprising a number of component frequencies that change over time, are common in nature, and present the brain with a generic computational problem: the extraction of different classes of auditory information embedded in the auditory scene. This problem is exemplified by the analysis of a single speaker or a musical instrument heard in free field. The solution to the problem demands the segregation of spectrotemporal patterns intrinsic to the sound object (identity of voice or instrument) from the spectrotemporal effect of its location in space, and from the source-independent information stream (speech or melody) that it conveys. Although mechanisms for the accurate representation of incoming acoustic spectrotemporal signals exist in the ascending auditory pathways up to and including PAC (deCharms et al., 1998; Nelken et al., 1999), it is unlikely that this system is sufficient to complete auditory scene analysis.
The present work has suggested that human PT does possess the necessary computational resources to disambiguate different types of spectrotemporal pattern as they are derived from PAC and subcortical structures, and to resolve these patterns into different types of auditory information that are used by distinct higher-order cortical mechanisms (Figure 4.2). In general, of course, the problem solved by PT will be much more complex than outlined here, since most natural auditory scenes are composed of multiple sound sources overlapping in time and space (Bregman, 1990). In terms of information theory, this problem could be conceptualised as the transfer of maximum information between multiple auditory sources and their neuronal representations. Information about sound sources forms a 'convolutive mixture': each sound source is convolved with filter functions imposed by the external ears (Wightman & Kistler, 1998) and at processing stages in the ascending auditory pathways.
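The convolutive mixture idea can be illustrated with a minimal sketch: a source signal is convolved with a location-dependent filter, and a scene with several sources sums the filtered signals. The signal and filter values below are invented for illustration; a real HRTF is far more complex.

```python
def convolve(source, filt):
    """Discrete convolution: the received signal is the source passed through
    the ear/pathway transfer function (the 'convolutive mixture' for one source)."""
    out = [0.0] * (len(source) + len(filt) - 1)
    for i, s in enumerate(source):
        for j, f in enumerate(filt):
            out[i + j] += s * f
    return out

source_a = [1.0, 0.0, -1.0, 0.5]   # hypothetical source waveform
filter_a = [0.6, 0.3, 0.1]         # hypothetical location-dependent (HRTF-like) filter
received = convolve(source_a, filter_a)

# With two concurrent sources the mixture is the sum of the filtered signals;
# the brain must recover sources and filters from the mixture alone.
```

Scene analysis is then the inverse problem: estimating the sources and filters given only `received`, which is why stored knowledge of plausible filters and sources is proposed to constrain the computation.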

Information about filter and source characteristics acquired through experience of the auditory world (Hofman et al., 1998) might be stored locally in PT, or in higher-order cortical areas. These stored characteristics might constitute templates against which incoming patterns could be matched: examples could include the HRTF, phonemic features or spectral shapes (see Experiment 4). The deconvolution algorithm might then be driven by minimisation of the error signal between the incoming pattern and the template. After deconvolution in PT, auditory information about spatial and object properties would be used to update stored templates. In performing its computations, PT would therefore both access learned representations in higher-order cortical areas and gate auditory information to those higher areas. Such reciprocal top-down and bottom-up processing is likely to be an essential feature of PT computation, since (in terms of information theory) it would ensure that the deconvolution algorithm is not 'blind' (Stone, 2002). Accordingly, this could be classified as a generative model (Friston & Price, 2001).

The present work does not provide grounds for specifying the neural algorithms that might support PT computation; indeed, various neural network configurations might achieve the same computational result. However, these algorithms might share certain formal features with independent component analysis, as modelled in the analysis of convolutive mixtures by artificial neural networks (Bell & Sejnowski, 1995; Attias & Schreiner, 1998; Stone, 2002). This family of algorithms was developed under constraints similar to those confronting the auditory brain. The present fMRI data support previous electrophysiological observations in animals (Middlebrooks et al., 1994) in suggesting that computational mechanisms for auditory scene analysis are instantiated at the level of neuronal ensembles rather than single neurons.
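The error-minimisation step proposed for template matching can be caricatured in a few lines: an incoming spectral pattern is compared against stored templates, the best match is selected by least squared error, and the residual error remains available as a signal for updating the winning template. The template names and values below are purely hypothetical.

```python
def match_template(pattern, templates):
    """Return the stored template with the smallest squared error against
    the incoming pattern, plus the residual error (a possible update signal)."""
    def sq_error(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    best = min(templates, key=lambda name: sq_error(pattern, templates[name]))
    return best, sq_error(pattern, templates[best])

# Hypothetical stored templates: coarse four-bin spectral shapes.
stored = {
    "vowel_like": [0.9, 0.7, 0.2, 0.1],
    "noise_like": [0.5, 0.5, 0.5, 0.5],
}
label, residual = match_template([0.8, 0.6, 0.3, 0.1], stored)
# 'residual' could drive top-down adjustment of the matched template.
```

The same select-then-update loop, iterated, is what makes the proposed scheme generative rather than 'blind': stored knowledge shapes the interpretation of the input, and the input in turn refines the stored knowledge.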
More speculatively, a template-based mechanism that is used to determine the characteristics of a neural filter function might also support auditory object identification (Riesenhuber & Poggio, 2002; see Chapter 6, Section 6.5.4, p. 130). It might therefore be hypothesised that the computational mechanisms operating at two levels of auditory object analysis (sound source segregation in PT and object identification in STS) share certain formal features. The functional analogy could be extended to invoke a higher-order computational hub for auditory and cross-modal object-level properties in STS (Beauchamp et al., 2004), operating in series with the auditory scene analyser in PT.
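The Max-like operation of Riesenhuber and Poggio (2002) invoked above can be sketched as max-pooling over banks of feature detectors, which buys tolerance to small shifts of a spectral feature. The detector responses and group size here are invented for illustration only.

```python
def max_pool(responses, group_size):
    """Max-like pooling: each output is the maximum over a group of
    feature-detector responses, so a feature that shifts between
    neighbouring detectors leaves the pooled output unchanged."""
    return [max(responses[i:i + group_size])
            for i in range(0, len(responses), group_size)]

# Two spectra whose edge features excite different detectors within each
# group of two nonetheless yield the same pooled representation:
pooled_a = max_pool([0.1, 0.9, 0.0, 0.2, 0.0, 0.8], 2)
pooled_b = max_pool([0.9, 0.1, 0.2, 0.0, 0.8, 0.0], 2)
assert pooled_a == pooled_b == [0.9, 0.2, 0.8]
```

Stacking such pooling stages over progressively larger groups is one way a hierarchy could abstract a spectral shape from variable fine structure, though nothing in the present data selects this algorithm over alternatives such as sparse coding.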

7.6 Functional homologies can be established in the auditory cortex of humans and other primates

This work supports the existence of anatomical and functional homologies in the auditory cortex of humans and non-human primates (specifically, the macaque). At the level of the posterior STP, functional selectivity for spatial and non-spatial sound properties can be defined in postero-medial and antero-lateral human PT and in medial and lateral macaque caudal belt (Rauschecker, 1998; Rauschecker & Tian, 2000; Tian et al., 2001), respectively. In both humans and macaques, the posterior superior temporal subregions are associated with distinct fronto-parietal and antero-lateral temporal cortical networks. This evidence supports a 'What'/'Where' functional dichotomy in primate auditory cortical organisation (Rauschecker, 1998; Kaas & Hackett, 2000), in which distinct ventral and dorsal pathways arising in the posterior temporal plane process object and spatial properties of complex sounds.

Within the auditory 'What' processing network of both humans and macaques, a functional hierarchy is likely to mediate the transfer of information from the posterior temporal plane to areas sited more laterally and inferiorly in the temporal lobe. Specific mechanisms for abstracting spectral shape (an object-level property) have here been demonstrated in the human posterior superior temporal lobe, consistent with the increasingly selective responses to sound objects (vocalisations) observed in passing from core to belt and parabelt in the macaque (Rauschecker et al., 1995; Kaas & Hackett, 2000; Rauschecker & Tian, 2000; Tian et al., 2001). The existence of an object-level analyser in human STS is also consistent with the demonstrated role of this region in the cross-modal integration of object information in both humans (Beauchamp et al., 2004) and macaques (Baylis et al., 1987).
At present, the inferences that can be drawn concerning the nature and the extent of homology in auditory cortical organisation between primate species remain limited. These limitations reflect both technical and physiological factors. Technically, cortical regions such as the macaque medial belt and the human STP remain difficult to study electrophysiologically. It is therefore problematic to establish fine-grained homologies within cortical regions (for example, between macaque CM and CL, human Te2 and Te3 [Morosan et al., 2001], and the postero-medial and antero-lateral PT functional subregions identified here). A broader technical issue relates to the difficulty of correlating results obtained in different species using different modalities (macaque
electrophysiology and human functional imaging) with very different temporal and spatial resolutions. Physiologically, difficulties arise in determining which auditory processing tasks should be regarded as functionally equivalent between species. For example, primate call sounds, like human vocal sounds, can be analysed at different levels of perceptual and semantic complexity, and it is generally difficult to specify the neurophysiological correlates of a particular level of processing. Some classes of auditory pattern processing (such as the analysis of source-independent information streams) have been little studied in non-human primates. The modulation of auditory cortical mechanisms by behaviour has been examined relatively infrequently in non-human primates and remains to be examined systematically in humans. It is widely assumed that the macaque is an appropriate (perhaps the most appropriate) animal model of human auditory cortical organisation (see Chapter 1, Section 1.2.1, p. 4). Nevertheless, studies in a range of mammalian species are required to substantiate the existence of general organising principles of auditory cortex function that hold across species. Emerging evidence offers tentative support for such homologies among mammalian species (Stecker et al., 2003).

7.7 Directions for further work

The findings of the experiments described in this thesis suggest several directions for further work. The existence of generic mechanisms for the analysis of certain types of spectrotemporal pattern (spatial location, sound motion, pitch height and chroma, spectral shape) in human auditory cortex suggests that similar mechanisms may be identifiable in non-human primates and other mammalian species. Such generic mechanisms might allow auditory cortical homologies to be established across species. A complete description of such homologies would require the identification of mechanisms underpinning other generic aspects of auditory scene analysis.
For example, the mechanisms involved in temporal sound pattern analysis (such as anisochrony and temporal envelope) have not been addressed in the present work and have been relatively neglected in previous human functional imaging studies. The elucidation of cross-species homologies is likely to depend on combined-modality studies to establish the electrophysiological correlates of the metabolic changes detected on functional imaging: initial studies in primate visual
cortex (Logothetis et al., 2001; Logothetis, 2002, 2003) have established the value of such a combined approach. Studies combining the temporal resolution of electrophysiology and the spatial resolution of fMRI have the potential to resolve the issue of functional connectivities and sequential activation within auditory processing streams (Alain et al., 2001; Liebenthal et al., 2003b).

The generic mechanisms identified here are likely to support obligatory perceptual processes involved in early auditory scene analysis and the identification of auditory objects. This is an appropriate starting point from which to define elementary principles of auditory cortical operation. However, this level of analysis could be extended logically to the identification of subsequent cross-modal and semantic stages in the object processing pathway, the elucidation of top-down modulatory influences on early auditory cortical areas, and the exploration of integrative mechanisms that link different processing pathways to produce a unified behavioural response. Aside from the intrinsic interest of such processes, it will be necessary to examine auditory pattern processing at various levels of abstraction and over different temporal scales in order to identify common neural algorithms that may support these different levels of processing (the template matching algorithm invoked here for both early scene analysis in PT and object identification in STS might be one such candidate).

The reduced auditory scenes used in the present work have been based on single sound sources presented consecutively. Natural auditory scenes, in which multiple sources are generally present simultaneously, may entail computational problems that are qualitatively as well as quantitatively different from those addressed here.
Such scenes are difficult to create in functional imaging environments; however, limited evidence (Zatorre et al., 2002b) suggests that the mechanisms proposed here may provide a useful framework for future work, including the use of virtual acoustic space techniques.

This work supports a general model of human auditory cortex operation in which certain regions (such as PT and STS) behave as computational hubs that extract different spectrotemporal characteristics from the incoming signal by accessing stored information about those characteristics. Implicit in such a model is the capacity for the reciprocal exchange of information between the computational hub and higher-order cortices in which past experience of the acoustic world might be stored. A core feature common to such generative models (Friston & Price, 2001) is plasticity: the model proposed here predicts a capacity for modification of the computational algorithm based on experience. This prediction could be tested experimentally both in human functional imaging studies of auditory learning and directly, using electrophysiological techniques, in animal homologues.

In the realm of auditory object processing, perhaps the single most pressing issue concerns the validity of visual analogies for auditory processing mechanisms. Early attempts to address this issue based on modelling constrained by experiments (Husain et al., 2004) suggest a strategy whereby empirical predictions concerning the organisation of auditory object processing (such as those underpinning Experiment 4 here) could be systematically tested. However, much further work is needed to establish the nature and extent of analogies between different sensory modalities, using different experimental techniques and in different species. Operationally equivalent computational stages will need to be determined by experiment, rather than assumed on the basis that auditory objects are analysed by neural mechanisms similar to those in the visual domain.

References

222 1. Adams RB, Janata P. (2002). Comparison of neural circuits underlying auditory and visual object categorization. NeuroImage 16, Ades HW, Felder R. (1942). The acoustic area of the monkey (macaca mulatta). J Neurophysiol 5, Ahissar M, Ahissar E, Bergman H, Vaadia E. (1992). Encoding of sound source location and movement: activity of single neurons and interactions between adjacent neurons in the monkey auditory cortex. J Neurophysiol 67, Alain C, Arnott SR, Hevenor S, Graham S, Grady CL. (2001). 'What' and 'where' in the human auditory system. Proc Natl Acad Sci USA 98, American Standards Association. (1960). American Standards Association, New York. 6. Ames A. (2000). CNS energy metabolism as related to function. Brain Res Rev 34, Anourova I, Nikouline VV, Ilmoniemi RJ, Hotta J, Aronen HJ, Carlson S. (2001). Evidence for dissociation of spatial and nonspatial auditory information processing. NeuroImage 14, Ashburner J, Friston KJ. (2000). Voxel-based morphometry the methods. NeuroImage 11, Attias H, Schreiner CE. (1998). Blind source separation and deconvolution: the dynamic component analysis algorithm. Neural Comput 10, Atzori M, Lei S, Evans IP, et al. (2001). Differential synaptic processing separates stationary from transient inputs to the auditory cortex. Nature Neurosci 4, Ayotte J, Peretz I, Rousseau I., Bard C, Bojanowski M. (2000). Patterns of music agnosia associated with middle cerebral artery infarcts. Brain 123, Azuma M, Suzuki H. (1984). Properties and distribution of auditory neurons in the dorsolateral prefrontal cortex of the alert monkey. Brain Res 298, Bandettini PA, Ungerleider LG. (2001). Editorial. From neuron to BOLD: new connections. Nature Neurosci 4, Bandettini PA, Wong EC. (1998). Echo-planar magnetic resonance imaging of human brain activation. In: Schmitt F et al. (eds), Echo-Planar Imaging: 155

223 Theory, Technique and Application, 2 nd Ed., Springer Verlag: New York, pp Bandettini PA, Jesmanowicz A, Van Kylen J, Birn RM, Hyde JS. (1998). Functional MRI of brain activation induced by scanner acoustic noise. Magn Res Med 39, Barnes CL, Pandya DN. (1992). Efferent cortical connections of multimodal cortex of the superior temporal sulcus in the rhesus monkey. J Comp Neurol 318, Baumgart F, Kaulisch T, Tempelmann C, et al. (1998). Electrodynamic headphones and woofers for application in magnetic resonance imaging scanners. Med Phys 25, Baumgart F, Gaschler-Markefski B, Woldorff MG, Heinze H.-J, Scheich H. (1999). A movement-sensitive area in auditory cortex. Nature 400, Baylis GC, Rolls ET, Leonard CM. (1987). Functional subdivisions of the temporal lobe neocortex. J Neurosci 7, Beauchamp MS, Lee KE, Argall BD, Martin A. (2004). Integration of auditory and visual information about objects in superior temporal sulcus. Neuron 41, Beck E. (1928). Der myeloarchitektonische Bau des in der Sylvischen Furche gelegenen Teiles des menschlichen Schläfenlappens. J Psychol Neurol 36, Belin P, Zatorre RJ. (2000). What, where and how in auditory cortex. Nature Neurosci 3, Belin P, Zatorre RJ. (2003). Adaptation to speaker s voice in right anterior temporal lobe. NeuroReport 14, Belin P, McAdams S, Smith B, et al. (1998). The functional anatomy of sound intensity discrimination. J Neurosci 18, Belin P, Zatorre RJ, Hoge R, Evans AC, Pike B. (1999). Event-related fmri of the auditory cortex. NeuroImage 10, Belin P, Zatorre RJ, Lafaille P, Ahad P, Pike B. (2000). Voice-selective areas in human auditory cortex. Nature 403, Belin P, McAdams S, Thivard L, et al. (2002a). The neuroanatomical substrate of sound duration discrimination. Neuropsychologia 40,

224 28. Belin P, Zatorre RJ, Ahad P. (2002b). Human temporal lobe responses to vocal sounds. Cogn Brain Res 13, Bell AJ, Sejnowski TJ. (1995). An information-maximization approach to blind separation and blind deconvolution. Neural Comput 7, Benson DA, Hienz RD, Goldstein MH. (1981). Single-unit activity in the auditory cortex of monkeys actively localizing sound sources: spatial tuning and behavioral dependency. Brain Res 219, Bilecen D, Scheffler K, Schmid N, Tschopp K, Seelig J. (1998). Tonotopic organization of the human auditory cortex as detected by BOLD-fMRI. Hear Res 126, Bilecen D, Seifritz E, Scheffler K, Henning J, Schulte AC. (2002). Amplitopicity of the human auditory cortex: an fmri study. NeuroImage 17, Binder JR, Rao SM, Hammeke TA, et al. (1994). Functional magnetic resonance imaging of human auditory cortex. Ann Neurol 35, Binder JR, Frost J.A, Hammeke TA, Rao SM, Cox, RW. (1996). Function of the left planum temporale in auditory and linguistic processing. Brain 119, Binder JR, Frost JA, Hammeke TA, Cox RW, Rao SM, Prieto T. (1997). Human brain language areas identified by functional magnetic resonance imaging. J Neurosci 17, Binder JR, Frost JA, Hammeke TA, et al. (2000). Human temporal lobe activation by speech and nonspeech sounds. Cereb Cortex 10, Binder JR, Liebenthal E, Possing ET, Medler DA, Ward BD. (2004). Neural correlates of sensory and decision processes in auditory object identification. Nature Neurosci 7, Bischoff-Grethe A, Proper SM, Mao H, Daniels KA, Berns GS. (2000). Conscious and unconscious processing of nonverbal predictability in Wernicke s area. J Neurosci 20, Bookstein FL. (1995). How to produce a landmark point: the statistical geometry of incompletely registered images. SPIE Proc 2573, Brechmann A, Baumgart F, Scheich H. (2002). Sound-level-dependent representation of frequency modulations in human auditory cortex: a low-noise fmri study. J Neurophysiol 87,

225 41. Bregman AS. (1990). Auditory Scene Analysis, MIT Press, Cambridge. 42. Bremmer F, Schlack A, Shah NJ, et al. (2001). Polymodal motion processing in posterior parietal and premotor cortex: a human fmri study strongly implies equivalencies between humans and monkeys. Neuron 29, Brodmann K. (1909). Vergleichende Lokalisationslehre der Grosshirnrinde in Ihren Prinzipien dargestellt auf Grund des Zellenbaues, transl. Garey LJ, Barth: Leipzig. 44. Brosch M, Selezneva E, Bucks C, Scheich H. (2004). Macaque monkeys discriminate pitch relationships. Cognition 91, Brown CH, Schessler T, Moody D, Stebbins W. (1980). Vertical and horizontal sound localization in primates. J Acoust Soc Amer 72, Brugge JF, Merzenich MM. (1973). Responses of neurons in auditory cortex of the macaque monkey to monaural and binaural stimulation. J Neurophysiol 36, Buckner RL, Bandettini PA, O Craven KM, et al. (1996). Detection of cortical activation during averaged single trials of a cognitive task using functional magnetic resonance imaging. Proc Natl Acad Sci USA 93, Burton H, Jones EG. (1976). The posterior thalamic region and its cortical projection in New World and Old World monkeys. J Comp Neurol 168, Bushara KO, Weeks RA, Ishii K, et al. (1999). Modality-specific frontal and parietal areas for auditory and visual spatial localization in humans. Nature Neurosci 2, Buxton RB, Frank LR. (1997). A model for the coupling between cerebral blood flow and oxygen metabolism during neural stimulation. J Cereb Blood Flow Metab 17, Buxton RB, Wong EC, Frank LR. (1998). Dynamics of blood flow and oxygenation changes during brain activation: the balloon model. Magn Res Med 39, Calvert GA, Bullmore ET, Brammer MJ, et al. (1997). Activation of auditory cortex during silent lipreading. Science 276, Campbell AW. (1905). Histological Studies on the Localization of Cerebral Function, University Press: Cambridge. 158

226 54. Cansino S, Williamson SJ, Karron D. (1994). Tonotopic organization of human auditory association cortex. Brain Res 663, Cavada C, Goldman-Rakic PS. (1989). Posterior parietal cortex in rhesus monkey: I. Parcellation of areas based on distinctive limbic and sensory corticocortical connections. J Comp Neurol 287, Celesia GG. (1976). Organization of auditory cortical areas in man. Brain 99, Chambers J, Akeroyd MA, Summerfield Q, Palmer AR. (2001). Active control of the volume acquisition noise in functional magnetic resonance imaging: method and psychoacoustical evaluation. J Acoust Soc Amer 110, Chiry O, Tardif E, Magistretti PJ, Clarke S. (2003). Patterns of calcium-binding proteins support parallel and hierarchical organization of human auditory areas. Eur J Neurosci 17, Clarke S, Bellmann A, Meuli RA, Assal G, Steck AJ. (2000). Auditory agnosia and auditory spatial deficits following left hemisphere lesions: evidence for distinct processing pathways. Neuropsychologia 38, Cohen MS. (1996). Rapid MRI and functional applications. In: Toga AW, Mazziotta JC (eds), Brain Mapping: The Methods, Academic Press: San Diego, pp Cohen MS. (1999). Echo-planar imaging and functional MRI. In: Moonen CTW et al. (eds), Functional MRI, Springer Verlag: New York, pp Cohen YE, Wessinger CM. (1999). Who goes there? Neuron 24, Cohen YE, Russ BE, Gifford GW, Kiringoda R, MacLean KA. (2004). Selectivity for spatial and nonspatial attributes of auditory stimuli in the ventrolateral prefrontal cortex. J Neurosci 24, Cowey A, Dewson JH. (1972). Effects of unilateral ablation of superior temporal cortex on auditory sequence discrimination in Macaca mulatta. Neuropsychologia 10, Creutzfeldt O, Ojemann G, Lettich E. (1989). Neuronal activity in the human lateral temporal lobe. I. Responses to speech. Exp Brain Res 77, Crinion J, Lambon Ralph MA, Warburton EA, Howard D, Wise RJS. (2003). Temporal lobe regions engaged during normal speech comprehension. Brain 126,

227 67. Cusack R, Carlyon RP, Robertson IH. (2001). Neglect between but not within auditory objects. J Cogn Neurosci 12, Davis MH, Johnsrude IS. (2003). Hierarchical processing in spoken language comprehension. J Neurosci 23, De Charms RC, Blake DT, Merzenich MM. (1998). Optimizing sound features for cortical neurons. Science 280, Deichmann R, Good CD, Josephs O, Ashburner J, Turner R. (2000). Optimization of 3-D MP-RAGE sequences for structural brain imaging. NeuroImage 12, Demany L, Armand P. (1984). The perceptual reality of tone chroma in early infancy. J Acoust Soc Amer 76, De Renzi E, Gentilini M, Pattacini F. (1984). Auditory extinction following hemisphere damage. Neuropsychologia 22, Deutsch D. (1999). The processing of pitch combinations. In: Deutsch D (ed), The Psychology of Music, 2 nd Ed, Academic Press: San Diego, pp Diesch E, Luce T. (2000). Topographic and temporal indices of vowel spectral envelope extraction in the human auditory cortex. J Cogn Neurosci 12, Di Salle F, Formisano E, Seifritz E, et al. (2001). Functional fields in human auditory cortex revealed by time-resolved fmri without interference of EPI noise. NeuroImage 13, Disbrow E, Litinas E, Recanzone G, Padberg J, Krubitzer L. (2003). Cortical connections of the second somatosensory area and the parietal ventral area in macaque monkeys. J Comp Neurol 462, Dissard P, Darwin CJ. (2000). Extracting spectral envelopes: formant frequency matching between sounds on different and modulated fundamental frequencies. J Acoust Soc Amer 107, Downar J, Crawley AP, Mikulis DJ, Davis KD. (2000). The effect of task relevance on the cortical response to changes in visual and auditory stimuli: an event-related fmri study. NeuroImage 14, Ducommun CY, Murray MM, Thut G, et al. (2002). Segregated processing of auditory motion and auditory location: an ERP mapping study. NeuroImage 16,

228 80. Eden GF, Joseph JE, Brown CP, Zeffiro TA. (1999). Utilizing hemodynamic delay and dispersion to detect fmri signal change without auditory interference: the behavior interleaved gradients technique. Magn Reson Med 41, Edmister WB, Talavage TM, Ledden PJ, Weisskoff RM. (1999). Improved auditory cortex imaging using clustered volume acquisitions. Hum Brain Mapp 7, Ehrlé N, Samson S, Baulac M. (2001). Processing of rapid auditory information in epileptic patients with left temporal lobe damage. Neuropsychologia 39, Engelien A, Silbersweig D, Stern E, et al. (1995). The functional anatomy of recovery from auditory agnosia. A PET study of sound categorization in a neurological patient and normal controls. Brain 118, Eustache F, Lechevalier B, Viader F, Lambert J. (1990). Identification and discrimination disorders in auditory perception: a report on two cases. Neuropsychologia 28, Evans AC, Collins DL, Mills SR, Brown ED, Kelly RL, Peters TM. (1993). 3D statistical neuroanatomical models from 305 MRI volumes. Proc. IEEE Nucl. Sci. Symp. Med. Imag. Conf. 3, Felleman DJ, Van Essen DC. (1991). Distributed hierarchical processing in the primate cerebral cortex. Cereb Cortex 1, Fellowes JM, Remez RE, Rubin PE. (1997). Perceiving the sex and identity of a talker without natural vocal timbre. Percept Psychophys 59, Ferrier D. (1876). The Functions of the Brain, Smith Elder & Co: London. 89. Flechsig P. (1920). Anatomie des menschlichen Gehirns und Rückenmarks. Verlag von Georg Thieme: Leipzig. 90. Formisano E, Kim DS, Di Salle F, Van de Moortele PF, Ugurbil K, Goebel R. (2003). Mirror-symmetric tonotopic maps in human primary auditory cortex. Neuron 40, Fox PT. (1995). Editorial. Spatial normalization origins: objectives, applications and alternatives. Hum Brain Mapp 3, Fox PT, Raichle ME. (1986). Focal physiological uncoupling of cerebral blood flow and oxidative metabolism during somatosensory stimulation in human subjects. Proc Natl Acad Sci USA 83,

229 93. Friston KJ. (1997). Imaging cognitive anatomy. Trends Cogn Sci 1, Friston KJ. (2004). Experimental design and statistical parametric mapping. In: Frackowiak RSJ et al. (eds), Human Brain Function, 2 nd Ed., Academic Press: London, pp Friston KJ, Price CJ. (2001). Dynamic representations and generative models of brain function. Brain Res Bull 54, Friston KJ, Jezzard P, Turner R. (1994). Analysis of functional MRI time-series. Hum Brain Mapp 1, Friston KJ, Ashburner J, Frith CD, Poline JB, Heather JD, Frackowiak RSJ. (1995a). Spatial registration and normalisation of images. Hum Brain Mapp 2, Friston KJ, Holmes AP, Worsley KJ, Poline JP, Frith CD, Frackowiak RSJ. (1995b). Statistical parametric maps in functional imaging: a general linear approach. Hum Brain Mapp 2, Friston KJ, Price CJ, Fletcher P, Moore C, Frackowiak RSJ, Dolan RJ. (1996). The trouble with cognitive subtraction. NeuroImage 4, Friston KJ, Holmes AP, Worsley KJ. (1999a). How many subjects constitute a study? NeuroImage 10, Friston KJ, Holmes AP, Price CJ, Buchel C, Worsley KJ. (1999b). Multisubject fmri studies and conjunction analyses. NeuroImage 10, Friston KJ, Zarahn E, Josephs O, Henson RNA, Dale AM. (1999c). Stochastic designs in event-related fmri. NeuroImage 10, Friston KJ, Josephs O, Zarahn E, Holmes AP, Rouquette S, Poline JB. (2000a). To smooth or not to smooth? Bias and efficiency in fmri time series. NeuroImage 12, Friston KJ, Mechelli A, Turner R, Price CJ. (2000b). Nonlinear responses in fmri: the balloon model, Volterra kernels, and other hemodynamics. NeuroImage 12, Gaab N, Gaser C, Zaehle T, Jancke L, Schlaug G. (2003). Functional anatomy of pitch memory an fmri study with sparse temporal sampling. NeuroImage 19,

230 106. Gaffan D, Harrison S. (1991). Auditory-visual associations, hemispheric specialization and temporal-frontal interaction in the rhesus monkey. Brain 114, Galaburda AM, Pandya DN. (1983). The intrinsic architectonic and connectional organization of the superior temporal region of the rhesus monkey. J Comp Neurol 221, Galaburda AM, Sanides F. (1980). Cytoarchitectonic organization of the human auditory cortex. J Comp Neurol 190, Galuske RAW, Schuhmann A, Schlote W, Bratzke H, Singer W. (1999). Interareal connections in the human auditory cortex. NeuroImage 9, S Galuske RAW, Schlote W, Bratzke H, Singer W. (2000). Interhemispheric assymetries of the modular structure in human temporal cortex. Science 289, Genovese CR, Lazar NA, Nichols T. (2002). Thresholding of statistical maps in functional neuroimaging using the false discovery rate. NeuroImage 15, Ghazanfar AA, Hauser MD. (2001). The auditory behaviour of primates: a neuroethological perspective. Curr Opin Neurobiol 11, Ghazanfar AA, Neuhoff JG, Logothetis NK. (2002). Auditory looming perception in rhesus monkeys. Proc Natl Acad Sci USA 99, Giraud AL, Price CJ. (2001). The constraints functional neuroimaging places on classical models of auditory word processing. J Cogn Neurosci 13, Giraud AL, Lorenzi C, Ashburner J, et al. (2000). Representation of the temporal envelope of sounds in the human brain. J Neurophysiol 84, Goncalves MS, Hall DA, Johnsrude IS, Haggard MP. (2001). Can meaningful effective connectivities be obtained between auditory cortical regions? NeuroImage 14, Grady CL, Van Meter JW, Maisog JM, Pietrini P, Krasuski J, Rauschecker JP. (1997). Attention-related modulation of activity in primary and secondary auditory cortex. NeuroReport 8, Grantham DW. (1986). Detection and discrimination of simulated motion auditory targets in the horizontal plane. J Acoust Soc Amer 79, Graziano MSA, Reiss L.A. Gross CG. (1999). A neuronal representation of the location of nearby sounds. Nature 397,

231 120. Grey JM. (1977). Multidimensional perceptual scaling of musical timbres. J Acoust Soc Amer 61, Griffiths TD, Green GGR. (1999). Cortical activation during perception of a rotating wide-field acoustic stimulus. NeuroImage 10, Griffiths T D, Bench CJ, Frackowiak RSJ. (1994). Human cortical areas selectively activated by apparent sound movement. Curr Biol 4, Griffiths TD, Rees A, Witton C, Shakir RA, Henning GB, Green GGR. (1996). Evidence for a sound movement area in the human cerebral cortex. Nature 383, Griffiths TD, Rees A, Witton C, Cross PM, Shakir RA, Green GGR. (1997) Spatial and temporal auditory processing deficits following right hemisphere infarction: a psychophysical study. Brain 120, Griffiths TD, Büchel C, Frackowiak RSJ, Patterson RD. (1998a). Analysis of temporal structure in sound by the human brain. Nature Neurosci 1, Griffiths TD, Rees G, Rees A, et al. (1998b). Right parietal cortex is involved in the perception of sound movement in humans. Nature Neurosci 1, Griffiths TD, Johnsrude I., Dean JL, Green GGR. (1999a). A common neural substrate for the analysis of pitch and duration pattern in segmented sound? NeuroReport 18, Griffiths TD, Rees A, Green GGR. (1999b). Disorders of human complex sound processing. Neurocase 5, Griffiths TD, Green GGR, Rees A, Rees G. (2000). Human brain areas involved in the analysis of auditory movement. Hum Brain Mapp 9, Griffiths TD, Uppenkamp S, Johnsrude I, Josephs O, Patterson RD. (2001). Encoding of the temporal regularity of sound in the human brainstem. Nature Neurosci 4, Grootoonk S, Hutton C, Ashburner J, et al. (2000). Characterization and correction of interpolation effects in the realignment of fmri time series. NeuroImage 11, Guimaraes AR, Melcher JR, Talavage TM, et al. (1998). Imaging subcortical auditory activity in humans. Hum Brain Mapp 6,

232 133. Hackett TA, Preuss TM, Kaas JH. (2001). Architectonic identification of the core region in auditory cortex of macaques, chimpanzees, and humans. J Comp Neurol 441, Hackett TA, Stepniewska I, Kaas JH. (1998a). Subdivisions of auditory cortex and ipsilateral cortical connections of the parabelt auditory cortex in macaque monkeys. J Comp Neurol 394, Hackett TA, Stepniewska I, Kaas JH. (1998b). Thalamocortical connections of the parabelt auditory cortex in macaque monkeys. J Comp Neurol 400, Hackett TA, Stepniewska I, Kaas JH. (1999a). Callosal connections of the parabelt auditory cortex in macaque monkeys. Eur J Neurosci 11, Hackett TA, Stepniewska I, Kaas JH. (1999b). Prefrontal connections of the auditory parabelt cortex in macaque monkeys. Brain Res 817, Haeske-Dewick H, Canavan AGM, Homberg V. (1996). Sound localization in egocentric space following hemispheric lesions. Neuropsychologia 34, Hahn E. (1950). Spin echoes. Phys Rev 80, Hajnal JV, Saeed N, Soar EJ, Oatridge A, Young IR, Bydder GM. (1995). A registration and interpolation procedure for subvoxel matching of serially acquired MR images. J Comput Assist Tomogr 19, Hall DA. (2003). Auditory pathways: are what and where appropriate? Curr Biol 13, Hall DA, Haggard MP, Akeroyd MA. (1999). "Sparse" temporal sampling in auditory fmri. Hum Brain Mapp 7, Hall DA, Haggard MP, Akeroyd MA, et al. (2000). Modulation and task effects in auditory processing measured using fmri. Hum Brain Mapp 10, Hall DA, Johnsrude IS, Haggard MP, Palmer AR, Akeroyd MA, Summefield AQ. (2002). Spectral and temporal processing in human auditory cortex. Cereb Cortex 12, Halpern AR, Zatorre RJ. (1999). When that tune runs through your head: a PET investigation of auditory imagery for familiar melodies. Cereb Cortex 9, Harrington IA, Heffner HE. (2002). A behavioral investigation of separate processing streams within macaque auditory cortex. Proc ARO

233 147. Harrington IA, Heffner HE. (2004). Intensity discrimination following cortical lesions in macaques: detection of increments, decrements and amplitude modulation. Proc ARO Harrington IA, Heffner RS, Heffner HE. (2001). An investigation of sensory deficits underlying the aphasia-like behavior of macaques with auditory cortex lesions. NeuroReport 12, Hart HC, Palmer AR, Hall DA. (2002). Heschl s gyrus is more sensitive to tone level than non-primary auditory cortex. Hear Res 171, Hart HC, Hall DA, Palmer AR. (2003a). The sound-level-dependent growth in the extent of fmri activation in Heschl s gyrus is different for low- and highfrequency tones. Hear Res 179, Hart HC, Palmer AR, Hall DA. (2003b). Amplitude and frequency-modulated stimuli activate common regions of human auditory cortex. Cereb Cortex 13, Hart HC, Palmer AR, Hall DA. (2004). Different areas of human non-primary auditory cortex are activated by sounds with spatial and non-spatial properties. Hum Brain Mapp 21, Hashimoto R, Homae FF, Nakajima K, Miyashita Y, Sakai KL (2000). Functional differentiation in the human auditory and language areas revealed by a dichotic listening task. NeuroImage 12, Hausler R, Colburn S, Marr E. (1983). Sound localisation in subjects with impaired hearing. Acta Otolaryngol Supp 400, Heffner HE, Heffner RS. (1984). Temporal lobe lesions and perception of species-specific vocalizations by macaques. Science 226, Heffner HE, Heffner RS. (1986a). Hearing loss in Japanese macaques following bilateral auditory cortex lesions. J Neurophysiol 55, Heffner HE, Heffner RS. (1986b). Effect of unilateral and bilateral auditory cortex lesions on the discrimination of vocalizations by Japanese macaques. J Neurophysiol 56, Heffner HE, Heffner RS. (1989a). Unilateral auditory cortex ablation in macaques results in a contralateral hearing loss. J Neurophysiol 62, Heffner HE, Heffner RS. (1989b). 
Cortical deafness cannot account for the inability of Japanese macaques to discriminate species-specific vocalizations. Brain Lang 36,

234 160. Heffner HE, Heffner RS. (1989c). Effect of restricted cortical lesions on absolute thresholds and aphasia-like deficits in Japanese macaques. Behav Neurosci 103, Heffner HE, Heffner RS. (1990a). Effect of bilateral auditory cortex lesions on absolute thresholds in Japanese macaques. J Neurophysiol 64, Heffner HE., Heffner RS. (1990b). Effect of bilateral auditory cortex lesions on sound localization in Japanese macaques. J Neurophysiol 64, Helmholtz HLF. (1875). On the Sensations of Tone as a Physiological Basis for the Theory of Music, transl. Ellis AJ, Longman s: London Henning GB. (1973). Detectability of interaural delay in high-frequency complex waveforms. J Acoust Soc Amer 55, Hikosaka K, Iwai E, Saito H, Tanaka K. (1988). Polysensory properties of neurons in the anterior bank of the caudal superior temporal sulcus of the macaque monkey. J Neurophysiol 60, Hofman PM, Riswick JGA, Van Opstal AJ. (1998). Relearning sound localisation with new ears. Nature Neurosci 1, Howard MA, Volkov IO, Abbas PJ, Damasio H, Ollendieck MC, Granner MA. (1996a). A chronic microelectrode investigation of the tonotopic organization of human auditory cortex. Brain Res 724, Howard MA, Volkov IO, Mirsky R. (2000). Auditory cortex on the human posterior superior temporal gyrus. J Comp Neurol 416, Howard RJ, Brammer M, Wright I., Woodruff PW, Bullmore ET, Zeki S. (1996b). A direct demonstration of functional specialization within motionrelated visual and auditory cortex of the human brain. Curr Biol 6, Humphries C, Willard K, Buchsbaum B, Hickok G. (2001). Role of anterior temporal cortex in auditory sentence comprehension: an fmri study. NeuroReport 12, Hunter MD, Griffiths TD, Farrow TFD, et al. (2003). A neural basis for the perception of voices in external auditory space. Brain 126, Husain FT, Tagamets MA, Fromm SJ, Braun AR, Horwitz B. (2004). Relating neuronal dynamics for auditory object processing to neuroimaging activity: a computational modelling and fmri study. 
NeuroImage 21,

235 173. Hutton C, Bork A, Josephs O, Deichmann R, Ashburner J, Turner R. (2002). Image distortion correction in fmri: a quantitative evaluation. NeuroImage 16, Irino T, Patterson RD. (2002). Segregating information about the size and shape of the vocal tract using a time-domain auditory model: the stabilised wavelet- Mellin transform. Speech Comm 36, Jackson LL, Heffner RS, Heffner HE. (1999). Free-field audiogram of the Japanese macaque (Macaca fuscata). J Acoust Soc Amer 106, Jacquemot C, Pallier C, LeBihan D, Dehaene S, Dupoux E. (2003). Phonological grammar shapes the auditory cortex: a functional magnetic resonance imaging study. J Neurosci 23, Janata P, Birk JL, Van Horn J, Leman M, Tillmann B, Bharucha JJ. (2002). The cortical topography of tonal structures underlying western music. Science 298, Jäncke L, Shah NJ, Posse S, Grosse-Ryuken M, Muller-Gartner HW. (1998). Intensity coding of auditory stimuli: an fmri study. Neuropsychologia 36, Jäncke L, Buchanan T, Lutz L, Specht K, Mirzazade S, Shah NJ. (1999). The time course of the BOLD response in human auditory cortex to acoustic stimuli of different duration. Cogn Brain Res 8, Jäncke L, Wüstenberg T, Scheich H, Heinze H.-J. (2002). Phonetic perception and the temporal cortex. NeuroImage 15, Jiang A, Kennedy DN, Baker JR, et al. (1995). Motion detection and correction in functional MR imaging. Hum Brain Mapp 3, Johnsrude I, Penhune VB, Zatorre RJ. (2000). Functional specificity in the right human auditory cortex for perceiving pitch direction. Brain 123, Jones EG, Powell TPS. (1970). An anatomical study of converging sensory pathways within the cerebral cortex of the monkey. Brain 93, Jones EG, Dell anna ME, Molinari M, Rausell E, Hashikawa T. (1995). Subdivisions of macaque monkey auditory cortex revealed by calcium-binding protein immunoreactivity. J Comp Neurol 362, Joshi SC, Miller MI, Christensen GE, Banerjee A, Coogan TA, Grenander U. (1995). 
Hierarchical brain mapping via a generalized Direchlet solution for mapping brain manifolds. Vision Geometry IV, Proc. SPIE Conference on 168

236 Optical Science, Engineering and Instrumentation. San Diego, CA 2573, Kaas JH, Hackett TA. (1999). Editorial. What and where processing in auditory cortex. Nature Neurosci 2, Kaas JH, Hackett TA. (2000). Subdivisions of auditory cortex and processing streams in primates. Proc Natl Acad Sci USA 97, Kaas JH, Hackett TA, Tramo MJ. (1999). Auditory processing in primate cerebral cortex. Curr Op Neurobiol 9, Kitzes LM, Hollrigel GS. (1996). Response properties of units in the posterior auditory field deprived of input from the ipsilateral primary auditory cortex. Hear Res 100, Kohlmetz C, Muller SV, Nager W, Munte TF, Altenmuller E. (2003). Selective loss of timbre perception for keyboard and percussion instruments following a right temporal lesion. Neurocase 9, Kosaki H, Hashikawa T, He J, Jones EG. (1997). Tonotopic organization of auditory cortical fields delineated by parvalbumin immunoreactivity in macaque monkeys. J Comp Neurol 386, Krumbholz K, Patterson RD, Seither-Preisler A, Lammertmann C, Lütkenhöner B. (2003). Neuromagnetic evidence for a pitch processing center in Heschl s gyrus. Cereb Cortex 13, Krumhansl CL. (1990). Cognitive Foundations of Musical Pitch, OUP: New York, pp Krumhansl CL, Iverson P. (1992). Perceptual interactions between musical pitch and timbre. J Exp Psychol 18, Kubovy M, Van Valkenburg D. (2000). Auditory and visual objects. Cognition 80, Lai S, Hopkins AL, Haacke EM, et al. (1993). Identification of vascular structures as a major source of signal contrast in high resolution 2D and 3D functional activation imaging of the motor cortex at 1.5 T: preliminary results. Magn Res Med 30, Langner G, Sams M, Heil P, Schulze H. (1997). Frequency and periodicity are represented in orthogonal maps in the human auditory cortex: evidence from magnetoencephalography. J Comp Physiol 181,

237 198. Lauter J, Herscovitch P, Formby C, Raichle M. (1985). Tonotopic organization in human auditory cortex revealed by PET. Hear Res 20, Leinonen L, Hyvärinen J, Sovijärvi ARA. (1980). Functional properties of neurons in the temporo-parietal association cortex of awake monkey. Exp Brain Res 39, Leonard CM, Puranik C, Kuldau JM, Lombardino LJ. (1998). Normal variation in the frequency and location of human auditory cortex landmarks. Cereb Cortex 8, Lewis JW, Van Essen DC. (2000). Corticocortical connections of visual, sensorimotor, and multimodal processing areas in the parietal lobe of the macaque monkey. J Comp Neurol 428, Lewis JW, Beauchamp MS, DeYoe EA. (2000). A comparison of visual and auditory motion processing in human cerebral cortex. Cereb Cortex 10, Lewis JW, Wightman FL, Brefczynski JA, Phinney RE, Binder JR, DeYoe EA. (2004). Human brain regions involved in recognizing environmental sounds. Cereb Cortex 14, Liebenthal E, Binder JR, Piorkowski RL, Remez RE. (2003a). Short-term reorganization of auditory cortex induced by phonetic expectation. J Cogn Neurosci 15, Liebenthal E, Ellingson ML, Spanaki MV, Prieto TE, Ropella KM, Binder JR. (2003b). Simultaneous ERP and fmri of the auditory cortex in a passive oddball paradigm. NeuroImage 19, Liégeois-Chauvel C, Musolino A, Chauvel P. (1991). Localization of the primary auditory area in man. Brain 114, Liégeois-Chauvel C, Musolino A, Badier JM, Marquis P, Chauvel P. (1994). Evoked potentials recorded from the auditory cortex in man: evaluation and topography of the middle latency components. EEG Clin Neurophysiol 92, Liégeois-Chauvel C, Peretz I, Babai M, et al. (1998). Contribution of different cortical areas in the temporal lobes to music processing. Brain 121, Liégeois-Chauvel C, Giraud K, Badier JM, Marquis P, Chauvel P. (2001). Intracerebral evoked potentials in pitch perception reveal a functional asymmetry of the human auditory cortex. Ann NY Acad Sci 930,

Linden JF, Schreiner CE. (2003). Columnar transformations in auditory cortex? A comparison to visual and somatosensory cortices. Cereb Cortex 13,
Lipschutz B, Kolinsky R, Damhaut P, Wikler D, Goldman S. (2002). Attention-dependent changes of activation and connectivity in dichotic listening. NeuroImage 17,
Lockwood AH, Salvi RJ, Coad ML, et al. (1999). The functional anatomy of the normal human auditory system: responses to 0.5 and 4.0 kHz tones at varied intensities. Cereb Cortex 9,
Logothetis NK. (2002). The neural basis of the blood-oxygen-level-dependent functional magnetic resonance imaging signal. Phil Trans R Soc Lond B 357,
Logothetis NK. (2003). The underpinnings of the BOLD functional magnetic resonance imaging signal. J Neurosci 23,
Logothetis NK, Pauls J, Augath M, Trinath T, Oeltermann A. (2001). Neurophysiological investigation of the basis of the fMRI signal. Nature 412,
Lorenzi C, Wable J, Moroni C, Derobert C, Frachet B, Belin C. (2000). Auditory temporal envelope processing in a patient with left-hemisphere damage. Neurocase 6,
Lütkenhöner B, Krumbholz K, Lammertmann C, Seither-Preisler A, Steinstrater O, Patterson RD. (2003). Localization of primary auditory cortex in humans by magnetoencephalography. NeuroImage 18,
Maeder PP, Meuli RA, Adriani M, et al. (2001). Distinct pathways involved in sound recognition and localization: a human fMRI study. NeuroImage 14,
Maess B, Koelsch S, Gunter TC, Friederici AD. (2001). Musical syntax is processed in Broca's area: an MEG study. Nature Neurosci 4,
Malone BJ, Scott BH, Semple MN. (2002). Context-dependent adaptive coding of interaural phase disparity in the auditory cortex of awake macaques. J Neurosci 22,
Malonek D, Grinvald A. (1996). Interactions between electrical activity and cortical microcirculation revealed by imaging spectroscopy: implications for functional brain mapping. Science 272,

Mansfield P. (1977). Multi-planar image formation using NMR spin echoes. J Phys 10, L
Marcus DS, Van Essen DC. (2002). Scene segmentation and attention in primate cortical areas V1 and V2. J Neurophysiol 88,
Marshall JC. (2000). Planum of the apes: a case study. Brain Lang 71,
Mazziotta JC, Toga AW, Evans A, Fox P, Lancaster J. (1995). A probabilistic atlas of the human brain: theory and rationale for its development. NeuroImage 2,
Mazzoni P, Bracewell RM, Barash S, Andersen RA. (1996). Spatially tuned auditory responses in area LIP of macaques performing delayed memory saccades to acoustic targets. J Neurophysiol 75,
McAdams S, Cunible JC. (1992). Perception of timbral analogies. Philos Trans Roy Soc Lond B Biol Sci 336,
McGuire PK, Silbersweig DA, Frith CD. (1996). Functional neuroanatomy of verbal self-monitoring. Brain 119,
Mendez MF, Geehan GR. (1988). Cortical auditory disorders: clinical and psychoacoustic features. J Neurol Neurosurg Psychiatry 51,
Menon V, Levitin DJ, Smith BK, et al. (2002). Neural correlates of timbre change in harmonic sounds. NeuroImage 17,
Merzenich MM, Brugge JF. (1973). Representation of the cochlear partition on the superior temporal plane of the macaque monkey. Brain Res 50,
Mesulam MM. (1998). From sensation to cognition. Brain 121,
Mesulam MM, Mufson EJ. (1985). The insula of Reil in man and monkey: architectonics, connectivity, and function. In: Peters A, Jones EG (eds), Cerebral Cortex, Vol 4, Association and Auditory Cortices, Plenum Press: New York, pp
Middlebrooks JC. (2002). Auditory space processing: here, there or everywhere? Nature Neurosci 5,
Middlebrooks JC, Green DM. (1991). Sound localization by human listeners. Ann Rev Psychol 42,
Middlebrooks JC, Clock AE, Xu L, Green DM. (1994). A panoramic code for sound location by cortical neurons. Science 264,

Miller LM, Escabi MA, Read HL, Schreiner CE. (2001). Functional convergence of response properties in the auditory thalamocortical system. Neuron 32,
Mitchell RLC, Elliott R, Barry M, Cruttenden A, Woodruff PWR. (2003). The neural response to emotional prosody as revealed by functional magnetic resonance imaging. Neuropsychologia 41,
Moelker A, Pattynama PMT. (2003). Acoustic noise concerns in functional magnetic resonance imaging. Hum Brain Mapp 20,
Morel A, Garraghty PE, Kaas JH. (1993). Tonotopic organization, architectonic fields and connections of auditory cortex in macaque monkeys. J Comp Neurol 335,
Morosan P, Rademacher J, Schleicher A, Amunts K, Schormann T, Zilles K. (2001). Human primary auditory cortex: cytoarchitectonic subdivisions and mapping into a spatial reference system. NeuroImage 13,
Mummery CJ, Ashburner J, Scott SK, Wise RJS. (1999). Functional neuroimaging of speech perception in six normal and two aphasic subjects. J Acoust Soc Amer 106,
Mustovic H, Scheffler K, Di Salle F, et al. (2003). Temporal integration of sequential auditory events: silent period in sound pattern activates human planum temporale. NeuroImage 20,
Näätänen R, Lehtokoski A, Lennes M, et al. (1997). Language-specific phoneme representations revealed by electric and magnetic brain responses. Nature 385,
Näätänen R, Tervaniemi M, Sussman E, Paavilainen P, Winkler I. (2001). 'Primitive intelligence' in the auditory cortex. Trends Neurosci 24,
Nature Neuroscience Editorial. (2001). Analyzing functional imaging studies. Nature Neurosci 4,
Nelken I. (2002). Feature detection by the auditory cortex. In: Oertel D, Fay RR, Popper AN (eds), Integrative Functions in the Mammalian Auditory Pathway, Springer: New York, pp
Nelken I, Rotman Y, Bar Yosef O. (1999). Responses of auditory-cortex neurons to structural features of natural sounds. Nature 397,

Nobre AC, Sebestyen GN, Gitelman DR, Mesulam MM, Frackowiak RSJ, Frith CD. (1997). Functional localisation of the system for visuospatial attention using positron emission tomography. Brain 120,
Ogawa S, Lee TM, Kay AR, Tank DW. (1990). Brain magnetic resonance imaging with contrast dependent on blood oxygenation. Proc Natl Acad Sci USA 87,
Ogawa S, Tank DW, Menon R, et al. (1992). Intrinsic signal changes accompanying sensory stimulation: functional brain mapping with magnetic resonance imaging. Proc Natl Acad Sci USA 89,
Ojemann JG, Akbudak E, Snyder AZ, McKinstry RC, Raichle ME, Conturo TE. (1997). Anatomic localization and quantitative analysis of gradient refocused echo-planar fMRI susceptibility artifacts. NeuroImage 6,
Opitz B, Rinne T, Mecklinger A, Yves von Cramon D, Schroger E. (2002). Differential contribution of frontal and temporal cortices to auditory change detection: fMRI and ERP results. NeuroImage 15,
Palmer AR, Bullock DC, Chambers JD. (1998). A high-output, high-quality sound system for use in auditory fMRI. NeuroImage 7, S
Palmeri TJ, Gauthier I. (2004). Visual object understanding. Nature Rev Neurosci 5,
Pandya DN. (1995). Anatomy of the auditory cortex. Rev Neurol 151,
Pandya DN, Hallett M, Mukherjee SK. (1969). Intra- and interhemispheric connections of the neocortical auditory system in the rhesus monkey. Brain Res 14,
Pandya DN, Yeterian EH. (1985). Architecture and connections of cortical association areas. In: Peters A, Jones EG (eds), Cerebral Cortex, Vol 4, Association and Auditory Cortices, Plenum Press: New York, pp
Pantev C, Bertrand O, Eulitz C, et al. (1995). Specific tonotopic organizations of different areas of the human auditory cortex revealed by simultaneous magnetic and electric recordings. EEG Clin Neurophysiol 94,
Pantev C, Oostenveld R, Engelien A, Ross B, Roberts LE, Hoke M. (1998). Increased auditory cortical representation in musicians. Nature 392,
Papez JW. (1929). Comparative Neurology. Thomas Y. Crowell: New York.

Patel AD, Peretz I, Tramo M, Labreque R. (1998). Processing prosodic and musical patterns: a neuropsychological investigation. Brain Lang 61,
Patterson RD. (1990). The tone height of multiharmonic sounds. Mus Percept 8,
Patterson RD, Milroy R, Allerhand M. (1993). What is the octave of a harmonically rich note? Contemp Mus Rev 9,
Patterson RD, Uppenkamp S, Johnsrude IS, Griffiths TD. (2002). The processing of temporal pitch and melody information in auditory cortex. Neuron 36,
Pauling L, Coryell C. (1936). The magnetic properties and structure of hemoglobin. Proc Natl Acad Sci USA 22,
Paus T, Zatorre RJ, Hofle N, et al. (1997). Time-related changes in neural systems underlying attention and arousal during the performance of an auditory vigilance task. J Cogn Neurosci 9,
Pavani F, Ladavas E, Driver J. (2002a). Selective deficit of auditory localisation in patients with visuospatial neglect. Neuropsychologia 40,
Pavani F, Macaluso E, Warren JD, Driver J, Griffiths TD. (2002b). A common cortical substrate activated by horizontal and vertical sound movement in the human brain. Curr Biol 12,
Pell MD. (1998). Recognition of prosody following unilateral brain lesion: influence of functional and structural attributes of prosodic contours. Neuropsychologia 36,
Penhune VB, Zatorre RJ, MacDonald JD, Evans AC. (1996). Interhemispheric anatomical differences in human primary auditory cortex: probabilistic mapping and volume measurement from magnetic resonance scans. Cereb Cortex 6,
Penhune VB, Zatorre RJ, Evans AC. (1998). Cerebellar contributions to motor timing: a PET study of auditory and visual rhythm reproduction. J Cogn Neurosci 10,
Penhune VB, Zatorre RJ, Feindel WH. (1999). The role of auditory cortex in retention of rhythmic patterns as studied in patients with temporal lobe removals including Heschl's gyrus. Neuropsychologia 37,
Peretz I. (1990). Processing of local and global musical information by unilateral brain-damaged patients. Brain 113,

Peretz I, Kolinsky R, Tramo M, et al. (1994). Functional dissociations following bilateral lesions of auditory cortex. Brain 117,
Peretz I, Blood AJ, Penhune V, Zatorre R. (2001). Cortical deafness to dissonance. Brain 124,
Perrott DR, Marlborough K. (1989). Minimum audible movement angle: Marking the end points of the path traveled by a moving sound source. J Acoust Soc Amer 85,
Perry DW, Zatorre RJ, Petrides M, Alivisatos B, Meyer E, Evans AC. (1999). Localization of cerebral activity during simple singing. NeuroReport 10,
Petersson KM, Nichols TE, Poline JB, Holmes AP. (1999). Statistical limitations in functional neuroimaging. I. Non-inferential methods and statistical models. Phil Trans R Soc Lond B 354,
Petrides M, Pandya DN. (1988). Association fiber pathways to the frontal cortex from the superior temporal region in the rhesus monkey. J Comp Neurol 273,
Platel H, Price C, Baron JC, et al. (1997). The structural components of music perception: a functional anatomical study. Brain 120,
Poline JB, Worsley KJ, Holmes AP, Frackowiak RSJ, Friston KJ. (1995). Estimating smoothness in statistical parametric maps: variability of p values. J Comp Assist Tomogr 19,
Polster MR, Rose SB. (1998). Disorders of auditory processing: evidence for modularity in audition. Cortex 34,
Poremba A, Saunders RC, Crane AM, Cook M, Sokoloff L, Mishkin M. (2003). Functional mapping of the primate auditory system. Science 299,
Poremba A, Malloy M, Saunders RC, Carson RE, Herscovitch P, Mishkin M. (2004). Species-specific calls evoke asymmetric activity in the monkey's temporal poles. Nature 427,
Price CJ, Friston KJ. (1997). Cognitive conjunction: a new approach to brain activation experiments. NeuroImage 5,
Price CJ, Veltman DJ, Ashburner J, Josephs O, Friston KJ. (1999). The critical relationship between the timing of stimulus presentation and data acquisition in blocked designs with fMRI. NeuroImage 10,

Price CJ, Winterburn D, Giraud AL, Moore CJ, Noppeney U. (2003). Cortical localisation of the visual and auditory word form areas: a reconsideration of the evidence. Brain Lang 86,
Rademacher J, Caviness V, Steinmetz H, Galaburda A. (1993). Topographical variation of the human primary cortices: implications for neuroimaging, brain mapping and neurobiology. Cereb Cortex 3,
Rademacher J, Morosan P, Schormann T, et al. (2001). Probabilistic mapping and volume measurement of human primary auditory cortex. NeuroImage 13,
Raichle ME, Grubb RLJ, Eichling JO, Pogossian MM. (1976). Measurement of brain oxygen utilization with radioactive oxygen-15: experimental verification. J Appl Physiol 40,
Rao SM, Mayer AR, Harrington DL. (2001). The evolution of brain activation during temporal processing. Nature Neurosci 4,
Rauschecker JP. (1998). Cortical processing of complex sounds. Curr Opin Neurobiol 8,
Rauschecker JP, Tian B. (2000). Mechanisms and streams for processing of "what" and "where" in auditory cortex. Proc Natl Acad Sci USA 97,
Rauschecker JP, Tian B, Hauser M. (1995). Processing of complex sounds in the macaque nonprimary auditory cortex. Science 268,
Rauschecker JP, Tian B, Pons T, Mishkin M. (1997). Serial and parallel processing in rhesus monkey auditory cortex. J Comp Neurol 382,
Ravicz ME, Melcher JR. (2001). Isolating the auditory system from acoustic noise during functional magnetic resonance imaging: examination of noise conduction through the ear canal, head, and body. J Acoust Soc Amer 109,
Ravicz ME, Melcher JR, Kiang NY. (2000). Acoustic noise during functional magnetic resonance imaging. J Acoust Soc Amer 108,
Read HL, Winer JA, Schreiner CE. (2002). Functional architecture of auditory cortex. Curr Op Neurobiol 12,
Recanzone GH. (2000a). Response profiles of auditory cortical neurons to tones and noise in behaving macaque monkeys. Hear Res 150,

Recanzone GH. (2000b). Spatial processing in the auditory cortex of the macaque monkey. Proc Natl Acad Sci USA 97,
Recanzone GH. (2002). Where was that? Human auditory spatial processing. Trends Cogn Sci 6,
Recanzone GH, Guard DC, Phan ML. (2000a). Frequency and intensity response properties of single neurons in the auditory cortex of the behaving macaque monkey. J Neurophysiol 83,
Recanzone GH, Guard DC, Phan ML, Su TIK. (2000b). Correlation between the activity of single auditory cortical neurons and sound-localization behavior in the macaque monkey. J Neurophysiol 83,
Rees G, Howseman A, Josephs O, et al. (1997). Characterizing the relationship between BOLD contrast and regional cerebral blood flow measurements by varying the stimulus presentation rate. NeuroImage 6,
Riesenhuber M, Poggio T. (2002). Neural mechanisms of object recognition. Curr Opin Neurobiol 12,
Rivier F, Clarke S. (1997). Cytochrome oxidase, acetylcholinesterase and NADPH-diaphorase staining in human supratemporal and insular cortex: evidence for multiple auditory areas. NeuroImage 6,
Roland PE, Zilles K. (1994). Brain atlases: a new research tool. Trends Neurosci 17,
Romani GL, Williamson SJ, Kaufman L. (1982). Tonotopic organization of the human auditory cortex. Science 216,
Romanski LM, Goldman-Rakic PS. (2002). An auditory domain in primate prefrontal cortex. Nature Neurosci 5,
Romanski LM, Bates JF, Goldman-Rakic PS. (1999a). Auditory belt and parabelt projections to the prefrontal cortex in the rhesus monkey. J Comp Neurol 403,
Romanski LM, Tian B, Fritz J, Mishkin M, Goldman-Rakic PS, Rauschecker JP. (1999b). Dual streams of auditory afferents target multiple domains in the primate prefrontal cortex. Nature Neurosci 2,
Romanski LM, Tian B, Fritz JB, Mishkin M, Goldman-Rakic PS, Rauschecker JP. (2000). Reply to "'What', 'where' and 'how' in auditory cortex". Nature Neurosci 3,

Rosen S. (1992). Temporal information in speech: acoustic, auditory and linguistic aspects. Phil Trans R Soc Lond B 336,
Ross B, Picton TW, Pantev C. (2002). Temporal integration in the human auditory cortex as represented by the development of the steady-state magnetic field. Hear Res 165,
Roy CS, Sherrington CS. (1890). On the regulation of the blood-supply of the brain. J Physiol 11,
Sakai K, Hikosaka O, Miyauchi S, et al. (1999). Neural representation of a rhythm depends on its interval ratio. J Neurosci 19,
Samson S, Zatorre RJ. (1994). Contribution of the right temporal lobe to musical timbre discrimination. Neuropsychologia 32,
Samson S, Zatorre RJ, Ramsay JO. (2002). Deficits of musical timbre perception after unilateral temporal-lobe lesion revealed with multidimensional scaling. Brain 125,
Sander K, Brechmann A, Scheich H. (2003). Audition of laughing and crying leads to right amygdala activation in a low-noise fMRI setting. Brain Res Protocols 11,
Saygin AP, Dick F, Wilson SW, Dronkers NF, Bates E. (2003). Neural resources for processing language and environmental sounds: evidence from aphasia. Brain 126,
Scheich H, Baumgart F, Gaschler-Markefski B, et al. (1998). Functional magnetic resonance imaging of a human auditory cortex area involved in foreground-background decomposition. Eur J Neurosci 10,
Schlaug G, Jäncke L, Huang Y, Steinmetz H. (1995). In vivo evidence of structural brain asymmetry in musicians. Science 267,
Schneider P, Scherg M, Dosch HG, Specht HJ, Gutschalk A, Rupp A. (2002). Morphology of Heschl's gyrus reflects enhanced activation in the auditory cortex of musicians. Nature Neurosci 5,
Schnider A, Benson DF, Alexander DN, Schnider-Klaus A. (1994). Non-verbal environmental sound recognition after unilateral hemispheric stroke. Brain 117,
Schönwiesner M, Von Cramon YD, Rübsamen R. (2002). Is it tonotopy after all? NeuroImage 17,

Schroeder MR, Strube HW. (1986). Flat-spectrum speech. J Acoust Soc Amer 79,
Scott SK, Johnsrude IS. (2003). The neuroanatomical and functional organization of speech perception. Trends Neurosci 26,
Scott SK, Blank CC, Rosen S, Wise RJS. (2000). Identification of a pathway for intelligible speech in the left temporal lobe. Brain 123,
Seifritz E, Esposito F, Hennel F, et al. (2002a). Spatiotemporal pattern of neural processing in the human auditory cortex. Science 297,
Seifritz E, Neuhoff JG, Bilecen D, et al. (2002b). Neural processing of auditory looming in the human brain. Curr Biol 12,
Seldon H. (1981). Structure of human auditory cortex. II. Axon distributions and morphological correlates of speech perception. Brain Res 229,
Seltzer B, Pandya DN. (1989). Intrinsic connections and architectonics of the superior temporal sulcus in the rhesus monkey. J Comp Neurol 290,
Semple MN, Scott BH. (2003). Cortical mechanisms in hearing. Curr Op Neurobiol 13,
Shah NJ, Steinhoff S, Mirzazade S, et al. (2000). The effect of sequence repeat time on auditory cortex stimulation during phonetic discrimination. NeuroImage 12,
Shamma S. (1999). Physiological basis of timbre perception. In: Gazzaniga MS (ed), Cognitive Neuroscience, MIT Press: Cambridge, pp
Shapleske J, Rossell SL, Woodruff PWR, David AS. (1999). The planum temporale: a systematic, quantitative review of its structural, functional and clinical significance. Brain Res Rev 29,
Sheffert SM, Pisoni DB, Fellowes JM, Remez RE. (2002). Learning to recognize talkers from natural, sinewave, and reversed speech samples. J Exp Psychol Hum Percept Perform 28,
Shepard RN. (1982). Structural representations of musical pitch. In: Deutsch D (ed), The Psychology of Music, Academic: New York, pp
Simons JS, Lambon Ralph MA. (1999). The auditory agnosias. Neurocase 5,

Smith ZM, Delgutte B, Oxenham AJ. (2002). Chimaeric sounds reveal dichotomies in auditory perception. Nature 416,
Sokoloff L. (1977). Relation between physiological function and energy metabolism in the central nervous system. J Neurochem 29,
Stecker GC, Mickey BJ, Macpherson EA, Middlebrooks JC. (2003). Spatial sensitivity in field PAF of cat auditory cortex. J Neurophysiol 89,
Steinschneider M, Reser DH, Fishman YI, Schroeder CE, Arezzo JC. (1998). Click train encoding in primary auditory cortex of the awake monkey: evidence for two mechanisms subserving pitch perception. J Acoust Soc Amer 104,
Steinschneider M, Volkov IO, Noh MD, Garrell PC, Howard MA. (1999). Temporal encoding of the voice onset time phonetic parameter by field potentials recorded directly from human auditory cortex. J Neurophysiol 82,
Stone JV. (2002). Independent component analysis: an introduction. Trends Cogn Sci 6,
Strainer JC, Ulmer JL, Yetkin FZ, Haughton VM, Millen SJ. (1997). Functional magnetic resonance imaging of the primary auditory cortex: an analysis of pure tone activation and tone discrimination. Amer J Neuroradiol 18,
Suga N, O'Neill WE, Tanabe T. (1978). Cortical neurons sensitive to combinations of information-bearing elements of biosonar signals in the mustache bat. Science 200,
Takahashi N, Kawamura M, Shinotou H, Hirayama K, Kaga K, Shindo M. (1992). Pure word deafness due to left hemisphere damage. Cortex 28,
Talairach J, Tournoux P. (1988). A Stereotactic Coplanar Atlas of the Human Brain, Thieme: Stuttgart.
Talavage TM, Edmister WB, Ledden PJ, Weisskoff RM. (1999). Quantitative assessment of auditory cortex responses induced by imager acoustic noise. Hum Brain Mapp 7,
Talavage TM, Ledden PJ, Benson RR, Rosen BR, Melcher JR. (2000). Frequency-dependent responses exhibited by multiple regions in human auditory cortex. Hear Res 150,

Talwar SK, Musial PG, Gerstein GL. (2001). Role of mammalian auditory cortex in the perception of elementary sound properties. J Neurophysiol 85,
Tardif E, Clarke S. (2001). Intrinsic connectivity of human auditory areas: a tracing study with DiI. Eur J Neurosci 13,
Tervaniemi M, Medvedev SV, Alho K, et al. (2000). Lateralized automatic auditory processing of phonetic versus musical information: a PET study. Hum Brain Mapp 10,
Thierry G, Giraud AL, Price CJ. (2003). Hemispheric dissociation in access to the human semantic system. Neuron 38,
Thivard L, Belin P, Zilbovicius M, Poline JB, Samson Y. (2000). A cortical region sensitive to auditory spectral motion. NeuroReport 11,
Tian B, Rauschecker JP. (1995). FM-selectivity of neurons in the lateral areas of rhesus monkey auditory cortex. Soc Neurosci Abstr 21,
Tian B, Reser D, Durham A, Kustov A, Rauschecker JP. (2001). Functional specialization in rhesus monkey auditory cortex. Science 292,
Toga A, Ambach K, Quinn B, Hutchin M, Burton J. (1994). Postmortem anatomy from cryosectioned whole human brain. J Neurosci Methods 54,
Tootell RB, Reppas JB, Kwong KK, et al. (1995). Functional analysis of human cortical area MT and related visual cortical areas using magnetic resonance imaging. J Neurosci 15,
Toronchuk JM, Stumpf E, Cynader MS. (1992). Auditory cortex neurons sensitive to correlates of auditory motion: underlying mechanisms. Exp Brain Res 88,
Tramo MJ, Bharucha JJ, Musiek FE. (1990). Music perception and cognition following bilateral lesions of auditory cortex. J Cogn Neurosci 2,
Tramo MJ, Shah GD, Braida LD. (2002). Functional role of auditory cortex in frequency processing and pitch perception. J Neurophysiol 87,
Turner R, Howseman A, Rees GE, Josephs O, Friston KJ. (1998). Functional magnetic resonance imaging of the human brain: data acquisition and analysis. Exp Brain Res 123,

Tzourio N, El Massioui F, Crivello F, Joliot M, Renault B, Mazoyer B. (1997). Functional anatomy of human auditory attention studied with PET. NeuroImage 5,
Ungerleider LG, Mishkin M. (1982). Two cortical visual systems. In: Ingle DJ, Goodale MA, Mansfield RJW (eds), Analysis of Visual Behavior, MIT Press: Cambridge, pp
Vaadia E, Benson DA, Hienz RD, Goldstein MH. (1986). Unit study of monkey frontal cortex: active localization of auditory and of visual stimuli. J Neurophysiol 56,
Van Lancker DR, Cummings JL, Kreiman J, Dobkin BH. (1988). Phonagnosia: a dissociation between familiar and unfamiliar voices. Cortex 24,
Von Bonin G, Bailey P. (1947). The Neocortex of Macaca mulatta. University of Illinois Press: Urbana.
Von Economo C, Horn L. (1930). Über Windungsrelief, Masse und Rindarchitektonik der Supratemporalflache, ihre individuellen und ihre Seitenunterschiede. Z Gesamte Neurol Psychiatr 130,
Von Kriegstein K, Eger E, Kleinschmidt A, Giraud AL. (2003). Modulation of neural responses to speech by directing attention to voices or verbal content. Cogn Brain Res 17,
Vouloumanos A, Kiehl KA, Werker JF, Liddle PF. (2001). Detection of sounds in the auditory stream: event-related fMRI evidence for differential activation to speech and nonspeech. J Cogn Neurosci 13,
Walker AE. (1937). The projection of the medial geniculate body to the cerebral cortex in the macaque monkey. J Anat (London) 17,
Wallace MN, Rutkowski RG, Palmer AR. (2002a). Interconnections of auditory areas in the guinea pig neocortex. Exp Brain Res 143,
Wallace MN, Johnston PW, Palmer AR. (2002b). Histochemical identification of cortical areas in the auditory region of the human brain. Exp Brain Res 143,
Walzl EM, Woolsey CN. (1943). Cortical auditory areas of the monkey as determined by electrical stimulation of nerve fibres in the osseous spiral lamina and by click stimulation. Fed Proc 2,
Wang X. (2000). On cortical coding of vocal communication sounds in primates. Proc Natl Acad Sci USA 97,

Weeks RA, Aziz-Sultan A, Bushara KO, et al. (1999). A PET study of human auditory spatial processing. Neurosci Lett 262,
Wessinger CM, Buoncore MH, Kussmaul CL, Mangun GR. (1997). Tonotopy in human auditory cortex examined with functional magnetic resonance imaging. Hum Brain Mapp 5,
Wessinger CM, Van Meter J, Tian B, Van Lare J, Pekar J, Rauschecker JP. (2001). Hierarchical organization of the human auditory cortex revealed by functional magnetic resonance imaging. J Cogn Neurosci 13,
Westbury CF, Zatorre RJ, Evans AC. (1999). Quantifying variability in the planum temporale: a probability map. Cereb Cortex 9,
Whitfield IC. (1985). The role of auditory cortex in behavior. In: Peters A, Jones EG (eds), Cerebral Cortex, Vol 4, Association and Auditory Cortices, Plenum Press: New York, pp
Wichmann FA, Hill NJ. (2001a). The psychometric function: I. Fitting, sampling, and goodness of fit. Percept Psychophys 63,
Wichmann FA, Hill NJ. (2001b). The psychometric function: II. Bootstrap-based confidence intervals and sampling. Percept Psychophys 63,
Wightman FL, Kistler DJ. (1989). Headphone simulation of free-field listening. I: Stimulus synthesis. J Acoust Soc Amer 85,
Wightman FL, Kistler DJ. (1998). Of Vulcan ears, human ears and earprints. Nature Neurosci 1,
Wise RJS, Scott SK, Blank SC, Mummery CJ, Murphy K, Warburton EA. (2001). Separate neural subsystems within Wernicke's area. Brain 124,
Woods RP, Cherry SR, Mazziotta JC. (1992). Rapid automated algorithm for aligning and reslicing PET images. J Comput Assist Tomogr 16,
Worsley KJ, Friston KJ. (1995). Analysis of fMRI time-series revisited - again. NeuroImage 2,
Worsley KJ, Marrett S, Neelin P, Vandal AC, Friston KJ, Evans AC. (1996). A unified statistical approach for determining significant signals in images of cerebral activation. Hum Brain Mapp 4,
Wright AA, Rivera JJ, Hulse SH, Shyan M, Neiworth JJ. (2000). Music perception and octave generalization in rhesus monkeys. J Exp Psychol Gen 129,

Yang Y, Engelien A, Engelien W, Xu S, Stern E, Silbersweig DA. (2000). A silent event-related functional MRI technique for brain activation studies without interference of scanner acoustic noise. Magn Reson Med 43,
Yeterian EH, Pandya DN. (1989). Thalamic connections of the cortex of the superior temporal sulcus in the rhesus monkey. J Comp Neurol 282,
Yost WA, Patterson RD, Sheft S. (1996). A time domain description for the pitch strength of iterated rippled noise. J Acoust Soc Amer 99,
Yukie M. (2002). Connections between the amygdala and auditory cortical areas in the macaque monkey. Neurosci Res 42,
Yvert B, Crouzeix A, Bertrand O, Seither-Preisler A, Pantev C. (2001). Multiple supratemporal sources of magnetic and electric auditory evoked middle latency components in humans. Cereb Cortex 11,
Zahn R, Huber W, Drews E, et al. (2000). Hemispheric lateralization at different levels of human auditory word processing: a functional magnetic resonance imaging study. Neurosci Lett 287,
Zakarauskas P, Cynader MS. (1991). Aural intensity for a moving source. Hear Res 52,
Zatorre RJ. (1985). Discrimination and recognition of tonal melodies after unilateral cerebral excisions. Neuropsychologia 23,
Zatorre RJ. (1988). Pitch perception of complex tones and human temporal lobe function. J Acoust Soc Amer 84,
Zatorre RJ. (2003). Sound analysis in auditory cortex. Trends Neurosci 26,
Zatorre RJ, Belin P. (2001). Spectral and temporal processing in human auditory cortex. Cereb Cortex 11,
Zatorre RJ, Halpern AR. (1993). Effect of unilateral temporal-lobe excision on perception and imagery of songs. Neuropsychologia 31,
Zatorre RJ, Penhune VB. (2001). Spatial localization after excision of human auditory cortex. J Neurosci 21,
Zatorre RJ, Evans AC, Meyer E, Gjedde A. (1992). Lateralization of phonetic and pitch discrimination in speech processing. Science 256,

Zatorre RJ, Evans AC, Meyer E. (1994). Neural mechanisms underlying melodic perception and memory for pitch. J Neurosci 14,
Zatorre RJ, Halpern AR, Perry DW, Meyer E, Evans AC. (1996). Hearing in the mind's ear: a PET investigation of musical imagery and perception. J Cogn Neurosci 8,
Zatorre RJ, Perry DW, Beckett CA, Westbury CF, Evans AC. (1998). Functional anatomy of musical processing in listeners with absolute pitch and relative pitch. Proc Natl Acad Sci USA 95,
Zatorre RJ, Mondor TA, Evans AC. (1999). Auditory attention to space and frequency activates similar cerebral systems. NeuroImage 10,
Zatorre RJ, Belin P, Penhune VB. (2002a). Structure and function of auditory cortex: music and speech. Trends Cogn Sci 6,
Zatorre RJ, Bouffard M, Ahad P, Belin P. (2002b). Where is where in the human auditory cortex? Nature Neurosci 5,
Zatorre RJ, Bouffard M, Belin P. (2004). Sensitivity to auditory object features in human temporal neocortex. J Neurosci 24,
Zilles K, Palomero-Gallagher N, Grefkes C, et al. (2002). Architectonics of the human cerebral cortex and transmitter receptor fingerprints: reconciling functional neuroanatomy and neurochemistry. Eur Neuropsychopharmacol 12,
Zimmer U, Lewald J, Karnath HO. (2003). Disturbed sound lateralization in patients with spatial neglect. J Cogn Neurosci 15,

Appendices

Appendix I. Examples of Matlab scripts

% SCRIPT TO MAKE STIMULUS SEQUENCES FOR EXPERIMENT 1
% Script to deliver iterated ripple noise (IRN) pitch sequence at
% different spatial locations OR a given spatial sequence with different
% pitch values
% Broad band sounds with pitch created as spatio-temporal .wav files
% "starting_pos(1:q) comb(1:i) perm(1:k).wav"
% Matrices M(1:i) are saved to disc as "stim_matrices.mat" to check
% combinations/permutations (identical for each start position)
% For M matrices first n columns give pitch, next n give duration,
% next n give temporal gap (ignore nth), and next n give spatial jump
% (ignore nth)
% This script calls generic head-related transfer functions (HRTFs)
% provided by F. Wightman and D. Kistler of the University of Wisconsin
% (Wightman & Kistler, 1989, 1998)
% TO USE: 1: COPY THIS SCRIPT TO NEW DIRECTORY
%         2: THE NEW DIRECTORY SHOULD BE A SUBDIRECTORY OF THE FOLDER
%            CONTAINING HRTF SCRIPTS AND WIND.m GATING FUNCTION
%         3: LOAD HRTF

% input sequence parameters %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
display('default values are shown in { } - hit <Return> to accept')
n = input('enter number of elements in sequence (n) {6} :');
if isempty(n), n = 6; end
srate = input('enter sample rate (in Hz) {44100} :');
if isempty(srate), srate = 44100; end
amp = input('enter RMS level {0.016} :');
if isempty(amp), amp = 0.016; end
wdms = input('enter gating window in milliseconds {20} :');
if isempty(wdms), wdms = 20; end
% starting positions on which to base permutations of spatial jumps
azimuth_start = input(['enter start azimuths in degrees as row vector in'...
    ' square brackets (no default) ']);
no_comb = input(['enter number of combinations - same at each start'...
    ' azimuth (no default) ']);
no_perm = input(['enter number of permutations for each combination'...
    ' - same at each start azimuth (no default) ']);
data1(1:n-1) = input('enter pitch values (Hz) as row vector of size n-1');
data1(n) = data1(1);
smode2 = input('do you want duration to be (1) fixed, (2) variable? ');
if smode2 == 1
    data2(1:n) = input('enter duration in ms (use >300ms for good IRN pitch)');
end

257 smode3=input('do you want time gap between sounds to be (1) fixed,'... '(2) variable? '); if smode3 == 1 data3(1:n-1) = input('enter gap in ms (e.g. 50ms)'); data3(n) = 0; end smode4=input('do you want spatial gap to be (1) fixed, (2) variable?'); if smode4 == 1 data4(1:n-1) = input('enter gap in degrees (no default)'); data4(n) = data4(1); end % make sequences %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % enter values for pitch and spatial locations %%%%%%%%%%%%%%%%%%%%%%%%%% for i = 1: no_comb if smode2 == 2 data2 = input('enter duration matrix (ms) as row vector in square'... ' brackets of size (n-1)'); data2 = [ data2 data2(1) ]; end if smode3 == 2 data3 = input('enter time gap matrix (ms) as row vector in square'... ' brackets of size (n-1)'); data3 = [ data3 0 ]; end if smode4 == 2 data4 = input('enter spatial gap (degree) matrix as row vector'... ' in square brackets of size (n-1)'); data4 = [ data4 0 ]; end % make matrices of parameter values %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Pitch_Matrix = []; Dur_Matrix = []; Gap_Matrix = []; Space_Matrix = []; for k = 1:no_perm for q = 1:length(azimuth_start); if q == 1 x = 1:(n-2); y = randperm(n-2); z = randperm(n-1); zz = randperm(n-1); end

258 pitch = [ data1(1) data1(x+1) data1(1) ]; dur = [ data2(1) data2(y+1) data2(1) ]; gap = [ data3(z) 0 ]; space = [ data4(zz) 0 ]; % calculate absolute azimuthal positions for the HRTFs azimuth_absolute(1) = azimuth_start(q); for v = 2:n azimuth_absolute(v) = azimuth_start(q) + sum(space(1:v-1)); end % save permutation matrices for each iteration if q == 1 Pitch_Matrix = [ Pitch_Matrix; pitch]; Dur_Matrix = [ Dur_Matrix ; dur ]; Gap_Matrix = [ Gap_Matrix; gap ]; Space_Matrix = [ Space_Matrix ; space ]; eval (['M' num2str(i) '= [ Pitch_Matrix Dur_Matrix Gap_Matrix '... ' Space_Matrix ];']); end % make signal corresponding to these values %%%%%%%%%%%%%%%%%%%%%%%%%%%% sig =[]; for p = 1:n t = [0:1/srate:dur(p)/1000]; %time vector % code to make IRN with 16 iterations, delay determined by pitch % and gain of 1 %for each sound (after Yost et al., 1996) %%%%%%%%%%%%%%%%%%%%%%%%%%%%%% del = 1000/pitch(p); g = 1; no = 16; d = (length(t)/srate); rsamp = round(length(t) + del*no*srate/1000); % number of samples for % input noise % slightly longer than t to avoid circular iterations delsamp = round(del/1000*srate); noise = randn(1,rsamp); % create input noise for w = 1:1:no % iterate delay and add n times dnoise(delsamp+1:rsamp) = noise(1:rsamp-delsamp); dnoise(1:delsamp) = noise(rsamp-delsamp+1:rsamp); noise = noise + g*dnoise; clear dnoise end s = noise(1:length(t)); s = amp * s/std(s); tmpwav = wind(srate,wdms,s); % End of IRN code % take first bit of result as IRN % fix RMS level noise % apply gating function

259 % HRTF code to convolve the IRN made as 2 column matrix wav %%%%%%%%%%%%% % Use the column p of azimuth_absolute for parameter ap = azimuth_absolute(p); if(ap > 180), ap = ap - 360; end if(ap < -170), ap = ap + 360; end pos = ((180-ap)/10*14+(80-0)/10+1)+1; %Choose correct filter for pos'n yleft = filter(hrtfl(pos,:),[1],tmpwav); yright =filter(hrtfr(pos,:),[1],tmpwav); yleft = amp * yleft; yright = amp * yright; wav=[yleft',yright']; % End of HRTF code % concatenate successive elements in sequence pause = zeros(1, floor(srate * gap(p)/1000)); pause = [ pause', pause' ]; sig = [ sig; wav; pause ]; end % write wavefile corresponding to signal %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% wavwrite(sig, srate, [ 'start' num2str(q) 'c' num2str(i) 'p' num2str(k) ]); end end end save stim_matrices M* azimuth_start % save matrices
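The IRN core of the script above is a delay-and-add recursion: a noise buffer is circularly shifted by the pitch-determined delay and added back onto itself, 16 times, with gain 1 (after Yost et al., 1996). As a language-independent illustration (not part of the thesis toolchain; the function name `make_irn` and its defaults are invented here), the same recursion can be sketched in pure Python:

```python
import random

def make_irn(delay_ms, n_iter=16, dur_s=0.3, srate=44100, gain=1.0):
    """Delay-and-add iterated ripple noise: circularly shift the noise by
    the delay and add it back, n_iter times. The result has a pitch near
    1000/delay_ms Hz."""
    delsamp = round(delay_ms / 1000 * srate)   # delay in samples
    npts = round(dur_s * srate)                # output length in samples
    # generate extra samples so the circular shift never wraps into the
    # portion we keep (mirrors rsamp in the MATLAB script)
    rsamp = npts + delsamp * n_iter
    noise = [random.gauss(0.0, 1.0) for _ in range(rsamp)]
    for _ in range(n_iter):
        # circular shift right by delsamp samples, then add
        dnoise = noise[-delsamp:] + noise[:-delsamp]
        noise = [a + gain * b for a, b in zip(noise, dnoise)]
    return noise[:npts]

sig = make_irn(delay_ms=10.0)  # ~100 Hz pitch
print(len(sig))                # 13230 samples = 0.3 s at 44.1 kHz
```

In the MATLAB script the same result is then RMS-normalised, gated with WIND.m, and spatialised through the HRTF filters.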

% SCRIPT TO MAKE STIMULUS SEQUENCES FOR EXPERIMENT 3
% Script makes pitch chroma/height 'tunes' for factorial 2x2 design plus
% noise control
% Name key: tune1 = CFHF; tune2 = CVHV; tune3 = CVHF; tune4 = CFHV
% Randomises pitch chroma on chromatic scale in range specified
% Randomises pitch height over one octave in roughly equal steps
% Pitch height manipulations modified after R. Patterson, University of
% Cambridge (Patterson, 1990; Patterson et al., 1993);
% T. Griffiths / S. Uppenkamp, Feb 2002
% Calls function freqfilter to filter (noise) signal in frequency domain
% with zero phase delay (sample rate, high_pass, low_pass, input vector)
% TO USE: COPY THIS SCRIPT TO NEW DIRECTORY

clear all; close all;

% input stimulus parameters %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
srate = 44100;  % sampling rate (Hz)
runs  = input('How many examples of each stimulus? ');
dur   = input('Enter duration of each note in milliseconds: ');
notes = input('Enter number of notes: ');
chroma_min = input('Enter repetition rate (chroma) min (Hz): ');
chroma_max = input('Enter repetition rate (chroma) max (Hz): ');
upper_frequency = input('Enter upper frequency (highest harmonic) in Hz: ');

% calculate time and frequency base
dur = dur/1000;
t = 0:1/srate:dur-1/srate;  % time vector
% number of samples should be even
if mod(length(t),2) ~= 0, t = 0:1/srate:dur; end
npts = length(t);
fbin = srate/npts;  % width of frequency bin

% generate 20 ms on/offset ramps for individual notes in sequence %%%%%%%%
window_dur = round(srate/1000) * 40;
window = hanning(window_dur);
gate1 = window(1:window_dur/2);
gate3 = window(window_dur/2+1:window_dur);
gate2 = ones(1, npts - window_dur);
gate = [gate1' gate2 gate3'];  % generate envelope

% calculate number of semitones between min and max %%%%%%%%%%%%%%%%%%%%%%
semitones = floor(12*log2(chroma_max/chroma_min)) + 1;

% Pitch height manipulation %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Build randomised amplitudes for 1st to 4th harmonics
% On higher harmonics, these values have been chosen to achieve a smooth
% transition in tone height covering roughly one octave.
% Start with attenuation values in dB for 1st to 4th harmonics; use
% repeatedly for higher harmonics
% NB attenuation of 2nd but not 4th harmonic in lower half matrix enters
% next octave
atten_var = [ ];
% convert from dB attenuation
ampl_var = 10.^(atten_var/(-20.));

% make stimulus sequences %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
for r = 1:runs

    % create sequences for CVHV and CFHF %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % assign fixed and variable amplitudes (harmonics 1-4), order randomised
    for i = 1:notes
        ampl_rand(i,:) = ampl_var(mod(round(rand*10000),12)+1,:);
        ampl_fix(i,:) = [ ];
    end
    % assign fixed and variable chroma, order randomised, chosen from
    % semitones within range given above
    chroma_fix = chroma_min * 2^(floor(rand*semitones)/12) * ones(1,notes);
    n_of_harm_fix = round(upper_frequency./chroma_fix);
    chroma_rand = chroma_min * 2.^(floor(rand(1,notes)*semitones)/12);
    n_of_harm_rand = round(upper_frequency./chroma_rand);
    tune1 = [ampl_fix'; chroma_fix];
    tune2 = [ampl_rand'; chroma_rand/sqrt(2)];
        % shift chroma by half an octave for rand amplitudes to be in
        % roughly the same pitch range

    % create sequences for CFHV and CVHF %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % assign fixed and variable amplitudes (harmonics 1-4), order randomised
    for i = 1:notes
        ampl_rand(i,:) = ampl_var(mod(round(rand*10000),12)+1,:);
        ampl_fix(i,:) = [ ];
    end
    % assign fixed and variable chroma, order randomised, chosen from
    % semitones within range given
    chroma_fix = chroma_min * 2^(floor(rand*semitones)/12) * ones(1,notes);
    n_of_harm_fix = round(upper_frequency./chroma_fix);
    chroma_rand = chroma_min * 2.^(floor(rand(1,notes)*semitones)/12);
    n_of_harm_rand = round(upper_frequency./chroma_rand);
    tune3 = [ampl_fix'; chroma_rand];
    tune4 = [ampl_rand'; chroma_fix/sqrt(2)];
        % shift chroma by half an octave for rand amplitudes to be in
        % roughly the same pitch range

    % make signals based on these values %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    for comb = 1:4
        eval(['tune = tune' num2str(comb) ';']);
        wav = [];
        noise = [];
        for pp = 1:notes
            % make a flat energy spectrum
            mag = zeros(1, npts/2-1);
            % magnitudes of odd/even harmonics
            mag(1*floor(tune(5,pp)/fbin):4*floor(tune(5,pp)/fbin):upper_frequency/fbin) = tune(1,pp);
            mag(2*floor(tune(5,pp)/fbin):4*floor(tune(5,pp)/fbin):upper_frequency/fbin) = tune(2,pp);
            mag(3*floor(tune(5,pp)/fbin):4*floor(tune(5,pp)/fbin):upper_frequency/fbin) = tune(3,pp);
            mag(4*floor(tune(5,pp)/fbin):4*floor(tune(5,pp)/fbin):upper_frequency/fbin) = tune(4,pp);
            % set phase to zero
            phase = zeros(1, npts/2-1);
            % write signal
            allphase = [0 phase 0 -1.*fliplr(phase)];
            allmag = [0 mag 0 fliplr(mag)];
            rect = allmag .* exp(1i .* allphase);
            sig = real(ifft(rect));
            % set RMS value to 0.05 and gate
            sig = sig / (20*std(sig));
            sig = sig .* gate;

            % concatenate successive elements in sequence
            wav = [wav sig];

            % make noise control (calls function freqfilter)
            noise_note = randn(1, length(sig));
            noise_note = freqfilter(srate, 1, upper_frequency, noise_note);
            noise_note = noise_note/(20*std(noise_note));
            noise_note = noise_note .* gate;
            noise = [noise noise_note];
        end
        % write wavefiles corresponding to signals
        wavwrite(wav, srate, ['tune' num2str(comb) 'example' num2str(r)]);
        wavwrite(noise, srate, ['noise' 'example' num2str(r)]);
    end
end
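The chroma randomisation above draws each note from the chromatic scale between `chroma_min` and `chroma_max`: there are `floor(12*log2(chroma_max/chroma_min)) + 1` available semitone steps, and step k has repetition rate `chroma_min * 2^(k/12)`. That arithmetic can be checked in isolation; the following Python sketch is illustrative only (`chroma_values` is not a function from the thesis):

```python
import math

def chroma_values(chroma_min, chroma_max):
    """All chromatic (semitone) repetition rates available between
    chroma_min and chroma_max, inclusive, as sampled by the random
    chroma draw in the MATLAB script."""
    semitones = math.floor(12 * math.log2(chroma_max / chroma_min)) + 1
    return [chroma_min * 2 ** (k / 12) for k in range(semitones)]

vals = chroma_values(100.0, 200.0)
print(len(vals))   # 13 steps span exactly one octave, endpoints included
print(vals[-1])    # 200.0
```

The half-octave shift applied to `tune2` and `tune4` (division by `sqrt(2)`) keeps the variable-height notes centred on the same pitch range as the fixed-height notes.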

% SCRIPT TO MAKE ALTERNATING NOISE:HARMONIC STIMULUS SEQUENCES FOR EXPERIMENT 4
% Script creates alternating noise : harmonic sequences with / without
% spectral shape variation
% Applies user-specified spectral shape (frequency domain)
% Pitch of harmonic elements is always fixed in this version
% TO USE: COPY THIS SCRIPT TO NEW DIRECTORY ALSO CONTAINING WIND.m GATING
% FUNCTION

clear all; close all;

% input stimulus parameters %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
srate = 44100;  % sampling rate
n = input('Enter number of elements in sequence (n):');
% characteristics of each element in sequence
dur = input('Enter duration of each element (seconds)');
no  = input('Enter number of harmonics in each harmonic complex');
freq1 = input(['Enter FIVE frequencies for base spectral envelope as' ...
    ' vector [f1 f2 f3 f4 f5]']);
envs1 = input(['Enter relative base attenuation (dB) at each frequency' ...
    ' point [a1 a2 a3 a4 a5] in reference spectral envelope']);
% temporal envelope
time = input(['Enter timepoints (s) for time envelope as vector' ...
    ' [t1 t2 t3 t4 ... tn]']);
envt = input('Enter envelope (0-1) at each timepoint [s1 s2 s3 s4 ... sn]');
% specify number of sequence permutations
no_perm = input(['Enter number of iterations for each set of fundamentals' ...
    ' and envelope types']);

data1 = []; data2 = []; data1m = []; data2m = [];

% make matrix of fundamental frequencies (f0) in successive trials %%%%%%%
data_1 = input(['Enter vector of fundamental frequencies (Hz) in' ...
    ' successive trials (always fixed within a trial)']);
for i = 1:length(data_1)
    data1m = [data1m; data_1];
end
data1m = data1m';
blocks1 = n/length(data_1);  % length of pitch vector

% make matrix of number-coded spectral shapes %%%%%%%%%%%%%%%%%%%%%%%%%%%%
smode1 = input('Do you want spectral shape to be (1) fixed (2) variable?');
if smode1 == 1  % case: fixed spectral shape
    data_2 = input(['Enter vector of spectral shapes in successive trials:' ...
        ' 1, reference trapezoid; 2, M1; 3, vary upslope;' ...
        ' 4, vary downslope']);
    for i = 1:length(data_2)
        data2m = [data2m; data_2];
    end
    data2m = data2m';
    for q = 1:length(data1m)
        for p = 1:length(data2m)
            data1 = [data1; data1m(q,:)];
            data2 = [data2; data2m(p,:)];
        end
    end
    blocks2 = n/length(data_2);  % length of shape vector
end
if smode1 == 2  % case: variable spectral shape
    data2 = input(['Enter spectral shape matrix as row vector in square' ...
        ' brackets [should equal length of f0 vector]: 1, reference' ...
        ' trapezoid; 2, M1; 3, vary upslope; 4, vary downslope ']);
    for q = 1:length(data1m)
        for p = 1:length(data2)
            data1 = [data1; data1m(q,:)];
        end
    end
    blocks2 = n/length(data2);  % length of shape vector
end

% ensure lengths of pitch and shape vectors equalised for final stimuli
if blocks1 > blocks2
    blocks = blocks1;
elseif blocks1 < blocks2
    blocks = blocks2;
else
    blocks = blocks2;
end

% assign spectral shape parameters %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% M1 shape
if any(data2 == 2)
    step1 = input('Enter attenuation step for middle of "M" [dB] ');
end
% upslope shape
if any(data2 == 3)
    step3 = input('Enter intensity step of upslope spectral envelope [dB]');
end
% downslope shape
if any(data2 == 4)
    step4 = input('Enter intensity step of downslope spectral envelope [dB]');
end

% assign order of spectral shapes for each sequence %%%%%%%%%%%%%%%%%%%%%%
Pitch_Matrix = []; Shape_Matrix = [];
for k = 1:no_perm
    pitch = []; shape = [];
    for b = 1:blocks
        sig_block = []; noise_block = [];
        y = randperm(length(data2));  % randomise spectral shape order
                                      % across conditions
        pitch_block = data1(k,:);
        if smode1 == 1
            shape_block = data2(k,:);
        else
            shape_block = data2(y);
        end
        if b > 1  % ensure same shape not repeated on successive
                  % elements in sequence
            if shape((b-1)*(n/blocks)) == shape_block(1)
                shape_block(1) = shape_block(2);
                shape_block(2) = shape((b-1)*(n/blocks));
            end
        end
        pitch = [pitch pitch_block];
        shape = [shape shape_block];
    end
    Pitch_Matrix = [Pitch_Matrix; pitch];  % build final pitch/shape matrices
    Shape_Matrix = [Shape_Matrix; shape];
end

% indicate alternate noise elements in pitch matrix (coded as '0')
Pitch_Matrix(:, 2:2:length(Pitch_Matrix(1,:))) = 0;

% display final matrices
Pitch_Matrix
Shape_Matrix

% make sequences %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
for k = 1:no_perm
    sig4 = [];
    for p = 1:n

        % make harmonic elements %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
        if rem(p,2) ~= 0
            base = Pitch_Matrix(k,p);
            envs = envs1;
            freq = freq1;
            % make harmonic objects for each type of spectral shape
            if Shape_Matrix(k,p) == 1, envs = envs; end                % reference trapezoid
            if Shape_Matrix(k,p) == 2, envs(3) = envs(3) + step1; end  % M shape
            if Shape_Matrix(k,p) == 3, envs(1) = envs(1) - step3; end  % alter up-slope
            if Shape_Matrix(k,p) == 4, envs(5) = envs(5) - step4; end  % alter down-slope

            t = 0:1/srate:dur-1/srate;  % time vector
            % number of samples should be even
            if mod(length(t),2) ~= 0, t = 0:1/srate:dur; end
            npts = length(t);
            fbin = srate/npts;  % width of frequency bin

            % make a flat energy spectrum
            mag1 = zeros(1, npts/2-1);
            % flat harmonic series
            mag1(floor(base/fbin):floor(base/fbin):no*floor(base/fbin)) = 1;
            % convert the frequency vector to bin numbers
            freq = floor(freq/fbin);
            % make spectrum
            spectrum = zeros(1, npts/2-1);
            x = 1:npts/2-1;
            spectrum(freq(1):freq(length(freq))) = ...
                interp1(freq, envs, x(freq(1):freq(length(freq))));
            % assume non-specified spectrum has zero energy
            % convert attenuation vector into intensity values
            spectrum(freq(1):freq(length(freq))) = ...
                10.^-(spectrum(freq(1):freq(length(freq)))/20);
            % apply envelope to harmonic spectrum
            mag1 = mag1 .* spectrum;

            % phase harmonics: set phase to avoid high peak factor
            % (after Schroeder & Strube, 1986)
            phase1 = zeros(1, npts/2-1);
            for tt = 1:no
                phase1(tt * floor(base/fbin)) = pi * floor((tt^2)/(2*no));
            end

            % write signal
            allphase1 = [0 phase1 0 -1.*fliplr(phase1)];
            allmag1 = [0 mag1 0 fliplr(mag1)];
            rect1 = allmag1 .* exp(j .* allphase1);
            sig1 = real(ifft(rect1));

            % time envelope
            time = floor(time*srate);
            time = time/srate;
            envelope = interp1(time, envt, t);
            sig1 = sig1 .* envelope;

            % set RMS value to 0.1
            sig1 = sig1/(10*std(sig1));
            sig1s = sig1';
            sig3 = sig1s;
        end

        % make noise elements %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
        if rem(p,2) == 0
            envs = envs1;
            freq = freq1;
            % make noise elements for each type of spectral shape
            if Shape_Matrix(k,p) == 1, envs = envs; end                % reference trapezoid
            if Shape_Matrix(k,p) == 2, envs(3) = envs(3) + step1; end  % M-shape
            if Shape_Matrix(k,p) == 3, envs(1) = envs(1) - step3; end  % alter up-slope
            if Shape_Matrix(k,p) == 4, envs(5) = envs(5) - step4; end  % alter down-slope

            t = 0:1/srate:dur-1/srate;  % time vector
            % number of samples should be even
            if mod(length(t),2) ~= 0, t = 0:1/srate:dur; end
            npts = length(t);
            fbin = srate/npts;  % width of frequency bin

            % make a flat energy spectrum
            mag2 = zeros(1, npts/2-1);
            % flat equivalent noise
            mag2(1:no*floor(base/fbin)) = 1;
            % convert the frequency vector to bin numbers
            freq = floor(freq/fbin);
            % make spectrum
            spectrum = zeros(1, npts/2-1);
            x = 1:npts/2-1;
            spectrum(freq(1):freq(length(freq))) = ...
                interp1(freq, envs, x(freq(1):freq(length(freq))));
            % assume non-specified spectrum has zero energy
            % convert attenuation vector into intensity values
            spectrum(freq(1):freq(length(freq))) = ...
                10.^-(spectrum(freq(1):freq(length(freq)))/20);
            % apply envelope to noise spectrum
            mag2 = mag2 .* spectrum;

            % set random phase
            phase2 = 2*pi*rand(1, npts/2-1);
            % write signal
            allphase2 = [0 phase2 0 -1.*fliplr(phase2)];
            allmag2 = [0 mag2 0 fliplr(mag2)];
            rect2 = allmag2 .* exp(j .* allphase2);
            sig2 = real(ifft(rect2));
            % time envelope
            time = floor(time*srate);
            time = time/srate;
            envelope = interp1(time, envt, t);
            sig2 = sig2 .* envelope;
            % set RMS value to 0.1
            sig2 = sig2/(10*std(sig2));
            sig2s = sig2';
            sig3 = sig2s;
        end

        % concatenate successive elements in sequence
        sig4 = [sig4; sig3];
    end

    % write wavefile corresponding to signal
    wavwrite(sig4, srate, ['shape_alt_seq' num2str(k)]);
end
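The spectral-shaping step in the script above interpolates a five-point attenuation envelope (in dB) linearly across frequency bins and converts each value to linear gain with 10^(-a/20); bins outside the envelope keep zero energy. The following Python sketch reproduces that logic in isolation (illustrative only; `envelope_gain` and its arguments are invented for this example, not part of the thesis code):

```python
def envelope_gain(freqs_hz, atten_db, fbin, n_bins):
    """Piecewise-linear spectral envelope: interpolate breakpoint
    attenuations (dB) across frequency bins, then convert each to a
    linear gain 10^(-a/20). Unspecified bins keep zero energy."""
    bins = [int(f // fbin) for f in freqs_hz]   # breakpoint frequencies -> bin numbers
    gain = [0.0] * n_bins
    for (b0, a0), (b1, a1) in zip(zip(bins, atten_db),
                                  zip(bins[1:], atten_db[1:])):
        for b in range(b0, b1 + 1):
            # linear interpolation of attenuation within this segment
            a = a0 if b1 == b0 else a0 + (a1 - a0) * (b - b0) / (b1 - b0)
            gain[b] = 10 ** (-a / 20)
    return gain

# reference trapezoid: 20 dB down at the edges, flat (0 dB) in the middle
g = envelope_gain([500, 1000, 2000, 3000, 4000], [20, 0, 0, 0, 20],
                  fbin=10.0, n_bins=500)
print(g[50])   # 0.1 - a 20 dB attenuation at the 500 Hz edge
print(g[100])  # 1.0 - no attenuation on the flat top at 1 kHz
```

The "M", up-slope, and down-slope shapes in the script are then just perturbations of single breakpoints of this envelope (e.g. raising `envs(3)` by `step1` notches the middle of the trapezoid).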

% SCRIPT TO RUN fMRI STIMULUS DELIVERY AND DATA ACQUISITION FOR EXPERIMENT 4
% Run under COGENT2000, with path set to toolbox and stimuli in same
% directory

function shape_scan(filename)

addpath toolbox;  % path to Cogent subroutines

% Initialise COGENT subroutines %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
clc                               % clear command window
config_sound(2, 16, 44100, 110);  % initialise COGENT sound library
                                  % (nchannels, nbits, frequency, nbuffers)
config_display                    % initialise COGENT display library
config_log(filename)
config_keyboard
config_serial                     % configure serial port for scanner pulses
start_cogent

% Use n = 112 trials plus dummy scans per run
% 15-element sequences * 48 and 16-element sequences * 48 each run
% 16 harmonic PFTF, 16 harmonic PVTF, 16 harmonic PFTV, 16 harmonic PVTV,
% 16 noise-harmonic PFTF, 16 noise-harmonic PFTV, 16 silence in each run
% Key to trial sequence - within a condition, trials are numbered strictly
% according to numbering in original file key [see timbre_stimuli.doc]
% harmonic PFTF        files 1-16
% harmonic PVTF        files 17-32
% harmonic PFTV        files 33-48
% harmonic PVTV        files 49-64
% noise-harmonic PFTF  files 65-80
% noise-harmonic PFTV  files 81-96
% 'silence'            file 97 with 16 repeats

% Prepare trial sequence %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% 112 stimuli per run
SeqTrials = [1:96 97*ones(1,16)];
% randomise trial order across run
SeqTrials = SeqTrials(randperm(112));

% Load sounds into buffers %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
for q = 1:96
    loadsound(['S' num2str(q) '.wav'], q);
end

% Clear projector screen and present fixation cross %%%%%%%%%%%%%%%%%%%%%%
clearpict(1)                  % clear screen
preparestring('+', 1, 0, 0);  % load cross into buffer 1, positioned at
                              % centre of screen
drawpict(1)                   % display buffer

% code scanner slices for stimulus triggering %%%%%%%%%%%%%%%%%%%%%%%%%%%%
% wait 96 slices = 2 whole-brain acquisitions = 2 'dummy' scans,
% then trigger stimulus delivery every 48 slices = 1 whole-brain acquisition
slices = 96:48:7000;

% Run experiment %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
for i = 1:length(SeqTrials)
    % Wait for scanner (or 2 seconds in case filename = 'test' - allows
    % offline piloting)
    if strcmp(filename, 'test')
        pause(2)
    else
        [s,t] = waitslice(1, slices(i))  % wait for trigger slice from scanner
        logslice                         % log scanner slice received
    end
    % Pause for duration of 'silence' trial
    if SeqTrials(i) == 97
        pause(8.18)  % duration of gap (seconds) between brain acquisitions
                     % (for total inter-scan interval = 12.5 seconds)
    else
        % Play sounds according to randomised trial order
        playsound(SeqTrials(i));
        waitsound(SeqTrials(i));  % wait until buffer has stopped playing
    end
    % save trial order for run
    eval(['save ' filename ' SeqTrials']);
    % read and log subject key presses for offline analysis
    % (here used to help maintain attention only)
    readkeys;
    logkeys;
end

% Close at end of run %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
stop_cogent
clear all
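The slice-triggering arithmetic in the delivery script (`slices = 96:48:7000`: two 48-slice dummy volumes, then one trigger per whole-brain acquisition) can be checked in isolation. A small Python sketch, illustrative only (`trigger_slices` is not part of the thesis code):

```python
def trigger_slices(n_trials, slices_per_volume=48, dummy_volumes=2):
    """Slice counts at which successive trials are triggered: wait out the
    dummy volumes, then trigger once per whole-brain acquisition."""
    first = slices_per_volume * dummy_volumes  # first trigger after dummies
    return [first + slices_per_volume * i for i in range(n_trials)]

s = trigger_slices(112)
print(s[0], s[1], s[-1])  # 96 144 5424
```

The last trigger (slice 5424) lies well within the `96:48:7000` range generated by the script, so 112 trials always find a trigger slice.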

Appendix II. Division of labour for experimental work

Experiment 1

The author was involved in designing the fMRI paradigm and stimuli, acquired and analysed all experimental data, and drafted the published paper (see Appendix III). Tim Griffiths (Clinical Auditory Laboratory, University of Newcastle-upon-Tyne and Wellcome Department of Imaging Neuroscience, Institute of Neurology, London) was involved in designing the fMRI paradigm and stimuli, and in writing the published paper.

Experiment 2

The author was involved in designing the fMRI and psychophysical paradigms and stimuli, acquired and analysed all experimental data, and drafted the published paper (see Appendix III). Tim Griffiths was involved in designing the fMRI paradigm and stimuli, and in writing the published paper.

Experiment 3

The author was involved in designing the fMRI and psychophysical paradigms and stimuli, acquired and analysed all experimental data, and drafted the published paper (see Appendix III). Tim Griffiths was involved in designing the fMRI and psychophysical paradigms and stimuli, in analysing fMRI and psychophysical data, and in writing the published paper. Roy Patterson (Centre for the Neural Basis of Hearing, University of Cambridge) was involved in stimulus development, in the design of the psychophysical experiment, and in writing the published paper. Stefan Uppenkamp (Centre for the Neural Basis of Hearing, University of Cambridge) wrote the Matlab script to implement pitch height manipulations.

Experiment 4

The author was involved in designing the fMRI and psychophysical paradigms and stimuli, acquired and analysed all experimental data, and drafted the published paper (see Appendix III). Tim Griffiths was involved in designing the fMRI and psychophysical paradigms and stimuli, in the analysis of fMRI data, and in writing the published paper. Amanda Jennings (Clinical Auditory Laboratory, University of Newcastle-upon-Tyne) was involved in the initial psychophysical studies that motivated the fMRI experiment.

Appendix III. Publications arising from this thesis

1. Warren JD, Zielinski BA, Green GGR, Rauschecker JP, Griffiths TD. Perception of sound source motion by the human brain. Neuron 2002; 34:
2. Griffiths TD, Warren JD. The planum temporale as a computational hub. Trends in Neurosciences 2002; 25:
3. Warren JD, Griffiths TD. Distinct mechanisms for processing spatial sequences and pitch sequences in the human auditory brain. Journal of Neuroscience 2003; 23:
4. Warren JD, Uppenkamp S, Patterson RD, Griffiths TD. Separating pitch chroma and pitch height in the human brain. Proceedings of the National Academy of Sciences of the USA 2003; 100:
5. Warren JD, Uppenkamp S, Patterson RD, Griffiths TD. Analyzing pitch chroma and pitch height in the human brain. In: Avanzini G, Faienza C, Minciacchi D (Eds) Annals of the New York Academy of Sciences: The Neurosciences and Music 2003; 999:
6. Warren JD, Jennings AR, Griffiths TD. Analysis of the spectral envelope of sounds by the human brain. NeuroImage 2005; 24:

Appendix IV. Reprints of published work

Neuron, Vol. 34, March 28, 2002. Copyright 2002 by Cell Press.

Perception of Sound-Source Motion by the Human Brain

Jason D. Warren,(1,2) Brandon A. Zielinski,(3) Gary G.R. Green,(2) Josef P. Rauschecker,(3) and Timothy D. Griffiths(4)

(1) Wellcome Department of Imaging Neuroscience, Institute of Neurology, University College London, 12 Queen Square, London WC1N 3BG, United Kingdom
(2) Auditory Group, Newcastle University Medical School, Framlington Place, Newcastle-upon-Tyne NE2 4HH, United Kingdom
(3) Georgetown Institute for Cognitive and Computational Sciences, Georgetown University Medical Center, 3970 Reservoir Road, Washington, D.C.
(4) Correspondence: t.d.griffiths@ncl.ac.uk

Summary

We assessed the human brain network for sound-motion processing using the same virtual stimulus in three independent functional imaging experiments. All experiments show a bilateral posterior network of activation, including planum temporale (PT) and parieto-temporal operculum (PTO). This was demonstrated in contrasts between sound movement and two control conditions: externalized stationary stimuli (in the midline or to the side of the head) and midline sounds within the head with similar spectro-temporal structure. We suggest specific computational mechanisms in PT for disambiguation of the intrinsic spectro-temporal features of a sound and the spectro-temporal effect of sound movement. The results support the existence of a posteriorly directed temporo-parietal pathway for obligatory perceptual processing of sound-source motion.

Neuron: Human Sound-Motion Perception

Table 2. Coordinates of Local Maxima and Z Scores for All Experiments

Region (coordinates x, y, z in mm; Z score)

PET Experiment (1): right planum temporale; left planum temporale; right premotor; left premotor.
fMRI Experiment (2): right planum temporale; left planum temporale; left inferior parietal lobule.
fMRI Experiment (3):
  Rotation minus external side: right planum temporale; left planum temporale.
  Rotation minus external midline: right planum temporale; left planum temporale.
  Rotation minus spectro-temporal control: right planum temporale; left planum temporale.
  Spectro-temporal control minus external side: right planum temporale; left planum temporale.

Coordinates are in mm according to Talairach and Tournoux (1988), based on spatial normalization to a template provided by the Montreal Neurological Institute. The Z score for Experiments 1 and 2 refers to the contrast between the all-motion and stationary sound conditions. A Z score of 4.5 corresponds to p < 0.05 after correction for multiple comparisons.

Figure 1. All-Motion Minus Stationary Sound Contrast (Projections and Rendering). Statistical parametric maps of PET (Experiment 1) and fMRI (Experiment 2) group data are shown as sagittal, coronal, and axial projections (above) and rendered onto a canonical template (below). All voxels significant at the p < 0.05 level (corrected for multiple comparisons) are shown.

Figure 2. All-Motion Minus Stationary Sound Contrast (Sections). Statistical parametric maps in Experiments 1 and 2 have been rendered on coronal (top) and axial (below) sections of a canonical structural template. Axial sections have been tilted in the pitch plane to produce axial views parallel to the superior temporal plane at the two levels (A and B) indicated on the coronal view. Insets indicate the relationship of activations to Heschl's gyrus (HG). Both the PET data (red) and the fMRI group data (yellow) show that all activations in the superior temporal plane occur posterior to Heschl's gyrus, in the planum temporale. Voxels activated jointly in both PET and fMRI experiments are indicated in orange. All voxels significant at the p < 0.05 level (corrected for multiple comparisons) are shown.

Figure 3. Individual fMRI Data. Individual fMRI data for three subjects (red, yellow, and blue) in Experiment 2 are superimposed on a canonical structural template. Tilted axial sections show the superior temporal plane. The data show absence of activation in Heschl's gyrus (HG) and bilateral activation in the planum temporale for each individual subject, and intersubject variation within the planum temporale.

Discussion

This study made novel use of a virtual acoustic space technique, the convolution of broadband noise with a generic HRTF, to produce a stable percept of an external sound source moving in azimuth. Using this stimulus and a very conservative criterion for significance that does not take the a priori anatomical hypotheses into account, we have shown bilateral activation of PT and PTO during sound-movement processing. This has been demonstrated in three separate functional imaging experiments conducted at two institutions. All previous fMRI studies of auditory motion that have imaged IPL have demonstrated bilateral activation of this region (Table 1). The current experiments confirm that IPL is involved in the analysis of actual sound movement in space, with consistent involvement of the parieto-temporal junction. In addition to specific activation of PTO, the present study also shows that sound movement in external space activates PT.

One previous fMRI study showed activation of PT by interaural amplitude modulation (Baumgart et al., 1999), a cue that could be used in the analysis of auditory motion. The absence of PT activation in other studies of sound motion (Table 1) may reflect the stimuli employed. Both the study of Baumgart et al. (1999) and the present study used broadband stimuli with distinct spectro-temporal structures: the stimulus was sawtooth frequency modulation in the study of Baumgart et al. and amplitude-modulated noise here.

Taken together, the two studies would be consistent with a computational role for PT in the disambiguation of spectro-temporal sound properties due to movement in space from the intrinsic spectro-temporal properties of the sound, to produce a neural correlate of the perception of movement. Such computation might be achieved by a form of independent or dynamic component analysis (Bell and Sejnowski, 1995; Attias and Schreiner, 1998). Although the current experiment does not allow localization of the neural correlate of sound-motion perception to either PT or PTO, the data show that it occurs at or before PTO. The present study does not support a hemispheric asymmetry of sound-movement processing with a preference for right PT, as suggested by Baumgart et al. (1999). However, previous observations in a patient with a lesion involving the posterior right hemisphere are consistent with the right posterior superior temporal cortex being necessary for the detection of cues for sound-movement perception (Griffiths et al., 1996). It will be of considerable future interest to assess sound-movement perception in patients with lesions involving left PT.

The combined PT and PTO activation during movement perception suggests a posterior temporo-parietal pathway for processing sound movement in space, extending from PT through PTO into IPL. A potential anatomical substrate for the pathway has been demonstrated in cytoarchitectonic studies of human auditory cortical areas, which show that auditory parakoniocortex extends contiguously from the superior temporal plane into PTO (Galaburda and Sanides, 1980). It appears likely that this pathway represents the human homolog of the posterior/dorsal processing stream for auditory spatial information in the monkey (Rauschecker, 1998). The existence of distinct pathways for spatial processing and recognition of sound is further supported by functional imaging and neuropsychological studies in normal and brain-damaged human subjects (Clarke et al., 2000; Maeder et al., 2001; Alain et al., 2001).
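The proposal that such disambiguation "might be achieved by a form of independent or dynamic component analysis" can be illustrated with a toy sketch. The example below is purely illustrative: it is not the authors' model, and it uses a simple kurtosis-based rotation search rather than the Bell and Sejnowski (1995) infomax algorithm. Two artificial signals (stand-ins for intrinsic and movement-related components) are mixed, whitened, and then unmixed by finding the rotation that maximizes non-Gaussianity:

```python
import numpy as np

def separate_two_sources(mixtures):
    """Toy ICA: whiten two mixed signals, then search for the rotation
    angle that maximizes non-Gaussianity (absolute excess kurtosis)."""
    x = mixtures - mixtures.mean(axis=1, keepdims=True)
    # Whitening via eigendecomposition of the covariance matrix.
    vals, vecs = np.linalg.eigh(np.cov(x))
    white = np.diag(vals ** -0.5) @ vecs.T @ x

    def kurt(s):
        return np.mean(s ** 4) / np.mean(s ** 2) ** 2 - 3.0

    best_angle, best_score = 0.0, -np.inf
    for theta in np.linspace(0.0, np.pi / 2, 500):
        c, s = np.cos(theta), np.sin(theta)
        y = np.array([[c, -s], [s, c]]) @ white
        score = abs(kurt(y[0])) + abs(kurt(y[1]))
        if score > best_score:
            best_score, best_angle = score, theta
    c, s = np.cos(best_angle), np.sin(best_angle)
    return np.array([[c, -s], [s, c]]) @ white

t = np.linspace(0.0, 1.0, 4000, endpoint=False)
s1 = np.sin(2 * np.pi * 5 * t)                 # smooth source
s2 = np.sign(np.sin(2 * np.pi * 13 * t))       # square-wave source
mixed = np.array([[0.7, 0.3], [0.4, 0.6]]) @ np.vstack([s1, s2])
recovered = separate_two_sources(mixed)

# Each recovered component should correlate strongly with one source
# (up to sign and ordering).
corr = np.abs(np.corrcoef(np.vstack([s1, s2, recovered]))[:2, 2:])
print(corr.max(axis=1))  # both entries near 1
```

Because both toy sources are sub-Gaussian, the sum of absolute kurtoses is maximized exactly at the separating rotation; real cortical computation, whatever form it takes, would of course face far harder, non-stationary mixtures.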

Figure 4. Processing of Sound Motion Compared with Fixed-External and Spectro-Temporal Control Sounds. Statistical parametric maps in Experiment 3 have been rendered on axial sections of a canonical structural template (A-C and E), tilted in the superior temporal plane, and on a whole-brain canonical template (D). Contrasts are indicated below each panel. Activation within planum temporale in (E) occurs more laterally than in (A). All voxels significant at the p < 0.05 level (corrected for multiple comparisons) are shown.

The involvement of PT in spatial analysis clearly does not represent the sole function of this large anatomical region considered as a whole. In humans, PT is activated bilaterally during the processing of various types of sounds that show complex variation of structure over time, including speech, music, and stimuli with speechlike spectro-temporal structure (Zatorre et al., 1992; Binder et al., 1996; Griffiths et al., 1998; Griffiths et al., 1999; Mummery et al., 1999; Binder et al., 2000; Giraud et al., 2000; Thivard et al., 2000). PT cannot, therefore, be regarded as a dedicated speech area. The present study suggests a specific role for PT in the analysis of spatial sound properties, in addition to its previously demonstrated involvement in the analysis of sounds with complex temporal structure. It is plausible that subregions within PT have different functions, as suggested by the activation of medial PT observed in the fMRI group data in Experiments 2 and 3 here. Comparison of the spectro-temporal minus external side and rotating minus external side contrasts in Experiment 3 (Figures 4A and 4E) is consistent with a partial segregation of auditory spatial and spectro-temporal processing in medial and lateral PT, respectively. By inspection, a similar locus within right PT was identified in the study of Baumgart et al. (1999); however, coordinates of local maxima were not provided. Local maxima data for acoustic spectro-temporal processing in the present and previous studies are compared with the present data for sound-motion processing in Table 3. Taking the studies as a group, the locus of peak activation within PT produced by processing of intrinsic spectro-temporal structure lies more laterally than the activation produced by sound motion. Although such a comparison is context-sensitive and qualitative, it is likely that the disparity would be even more marked were the convex geometry of PT taken into account.

Table 3. Locations of Peak Activations within Planum Temporale (PT) in Studies of Acoustic Spectro-Temporal Processing Compared with Motion Processing in the Present Study (coordinates of PT peak activation, x, y, z, in mm)

Study | Modality | Key Contrast | Side
Processing of intrinsic spectro-temporal structure:
  Binder et al., 1996 | fMRI | tone sequences minus words | L
  Griffiths et al., 1998b | PET | interaction between melody and degree of temporal structure | R, L
  Griffiths et al., 1999 | PET | pitch/duration sequences minus silence | R, L
  Binder et al., 2000 | fMRI | tone sequences minus noise | L
  Giraud et al., 2000 | fMRI | conjunction of different AM rates in temporal envelope processing | R, L
  Thivard et al., 2000 | PET | spectral motion versus stationary stimuli | R, L
  Present study | fMRI | spectro-temporal control minus fixed external sound | R, L
Processing of sound motion:
  Present study | PET | sound rotation minus stationary | R, L
  Present study | fMRI | sound rotation minus stationary | R, L

Functional specialization within human PT would be consistent with electrophysiological findings in nonhuman primates. In the macaque, a similarly located area in the posterior supratemporal plane (the caudal belt region) has been implicated in the analysis of sound-source location (Leinonen et al., 1980; Recanzone, 2000). More specifically, the caudal belt region (areas CM and CL) has been proposed to be the origin of a processing stream for auditory spatial information (Rauschecker, 1998; Rauschecker and Tian, 2000). The present study supports the role of human PT in auditory spatial analysis, similar to the macaque's caudal belt. However, the electrophysiological data also indicate that functional segregation within PT is not absolute: in addition to being highly selective for the spatial position of a complex sound (like most other caudal belt neurons), a certain subpopulation of neurons in CL also shows specificity for communication sounds (Tian et al., 2001).

It remains uncertain whether an auditory analog of visual area V5/MT exists, specialized for auditory motion processing. Using an uncorrected threshold (p < 0.001), we observed a consistent posterior parietal activation anterior to human V5/MT in the all-motion minus stationary sound contrast. However, this activation could not be demonstrated using the more stringent corrected threshold, and its biological significance remains unclear. By analogy with the MT/MST complex in the visual system, it is conceivable that additional cortical areas beyond the temporo-parietal pathway are involved in the perceptual processing of auditory motion.

Activation of frontal and superior parietal areas was inconsistently observed in the present experiments and in previous studies (Table 1). This variability may reflect a spatial attentional or movement preparation function for these areas that differs between techniques and paradigms. Similar activation in studies of visuospatial attention suggests that this activation may not be modality specific (Nobre et al., 1997; Coull and Nobre, 1998; Alain et al., 2001; see, however, Bushara et al., 1999, for modality-specific activation of parietal and frontal areas by virtual auditory space stimuli). The PET data in the current study show bilateral activation of the premotor cortex. One prior imaging study has shown activation in a similar region using a stimulus that creates a sound field surrounding the head (Griffiths and Green, 1999). Potential pathways mediating the frontal activation seen in the PET experiment are suggested by anatomical tracer studies in the macaque, demonstrating both a direct projection (Romanski et al., 1999) and an indirect projection via parietal cortex (Lewis and Van Essen, 2000) from the caudal belt region to prefrontal areas implicated in spatial analysis. Taking these studies together, this activation may be interpreted as reflecting the coding of auditory space in a coordinate system suitable for movement preparation, as may occur in primate area PMv (Graziano et al., 1999).

Finally, we were interested to determine whether the first-order sound-movement property of fixed angular velocity and the second-order property of changing angular velocity might have distinct neuroanatomical substrates, in view of the fact that variable angular velocity relative to a source will be produced by head movements during the exploration of auditory space. The current data do not support this hypothesis. Rather, they show that both first- and second-order sound-movement properties are processed in the posterior temporo-parietal pathway. The current study was designed to address only the simplest scenario of a distinct neuroanatomical substrate; the finding of a shared anatomical framework raises the interesting possibility that the respective neural correlates of these properties may lie in specific patterns of activation or connectivity between PT, PTO, and IPL.

In summary, we have demonstrated a common, bilateral brain network including PT and PTO for the analysis of the sound-movement properties needed to encode the perception of movement in auditory space. Activation of this network is uniform across studies and imaging modalities and cannot be attributed simply to spectro-temporal processing of complex sound sources nor to externalization of those sources in space. The findings are consistent with a posteriorly directed processing stream comprising PT, PTO, and IPL responsible for the obligatory perceptual processing of the movement of sound objects in space. We propose that the initial disambiguation of binaural and monaural spectro-temporal cues in medial PT enables the subsequent formation of a spatial percept at the level of the parieto-temporal junction.

Experimental Procedures

Subjects
Eight male subjects (seven right-handed, one left-handed) aged 23 to 42 were included in Experiment 1; nine right-handed subjects (five males, four females) aged 19 to 33 in Experiment 2; and twelve subjects (eight males, four females; eleven right-handed, one left-handed) aged 22 to 38 in Experiment 3. No subject had any history of hearing or neurological disorder, and all had normal structural MRI scans. The experiments were carried out with the approval of local ethics committees in London and Washington, D.C., and the PET studies were carried out under certification from the Administration of Radioactive Substances Advisory Subcommittee (Department of Health, London, UK).

Stimuli and Task
Stimuli were created digitally at a sample rate of 44.1 kHz. The single sound object was a fixed-amplitude-spectrum, random-phase noise (passband 1 Hz to 20 kHz) created in MATLAB 5.3. The noise was sinusoidally amplitude modulated at 80 Hz (modulation depth 80%) to produce an additional cue for spatial location, and convolved with a generic HRTF (Wightman and Kistler, 1989) to create a virtual external acoustic stimulus. The generic HRTF used in these experiments does not allow the same spatial acuity as individual HRTFs (Hofman et al., 1998) but, nevertheless, reliably produced the required percept. The use of fixed HRTFs corresponding to one spatial location allowed the simulation of sounds at one point in space, and the use of dynamically updated HRTFs allowed simulation of movement of sounds in azimuth. During PET scanning (Experiment 1), digital recordings of the stimuli were delivered using pneumatic Etymotic insert earphones at a sensation level of 50 dB. During fMRI scanning, digital sound recordings were delivered using a custom pneumatic system for Experiment 2 and a custom electrostatic system for Experiment 3 ( soundsystem/index.shtml) at a sensation level of 50 dB. Presentation of the moving stimuli by all three delivery systems produced a percept of sound movement around the head at a distance of approximately 0.5 m.

In Experiments 1 and 2, four sound conditions corresponding to four different percepts were used: zero mean angular velocity, no change in angular velocity (object stationary in front of head); zero mean angular velocity, changing angular velocity (object moving from side to side in front of head); fixed mean positive angular velocity, no change in angular velocity (object rotating clockwise around head with constant speed); and fixed mean positive angular velocity, changing angular velocity (object rotating clockwise around head with variable speed). Positive mean angular velocity was fixed at 320°/s, and a change in angular velocity was produced by addition of a sinusoidal displacement of peak amplitude 50° and rate 1 Hz.
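The stimulus parameters above lend themselves to a code sketch. The fragment below is an illustration only: the original stimuli were created in MATLAB 5.3 and used the Wightman and Kistler generic HRTF, whereas here HRTF convolution is replaced by a crude, energy-preserving intensity panner, so the spectral and timing cues of a real HRTF are not reproduced.

```python
import numpy as np

FS = 44100  # sample rate (Hz), as in the experiments

def am_noise(duration, am_rate=80.0, depth=0.8, seed=0):
    """Broadband noise, sinusoidally amplitude modulated
    (80 Hz, 80% modulation depth in the experiments)."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(duration * FS)) / FS
    env = 1.0 + depth * np.sin(2 * np.pi * am_rate * t)
    return env * rng.standard_normal(t.size)

def azimuth_trajectory(duration, mean_velocity=320.0,
                       wobble_amp=0.0, wobble_rate=1.0):
    """Azimuth (degrees) over time: fixed mean angular velocity
    (first-order motion), plus an optional sinusoidal displacement
    (peak 50 deg at 1 Hz in the experiments: second-order motion)."""
    t = np.arange(int(duration * FS)) / FS
    az = mean_velocity * t + wobble_amp * np.sin(2 * np.pi * wobble_rate * t)
    return az % 360.0

def render_binaural(mono, azimuth_deg):
    """Energy-preserving intensity panning as a crude stand-in for
    HRTF convolution (the real experiments used a generic HRTF,
    which also carries spectral and interaural timing cues)."""
    az = np.deg2rad(azimuth_deg)
    left = mono * np.sqrt(0.5 * (1.0 - np.sin(az)))
    right = mono * np.sqrt(0.5 * (1.0 + np.sin(az)))
    return left, right

mono = am_noise(1.0)
az_first = azimuth_trajectory(1.0)                    # constant 320 deg/s
az_second = azimuth_trajectory(1.0, wobble_amp=50.0)  # variable velocity
left, right = render_binaural(mono, az_first)

# Spectro-temporal control: mean of the two ear waveforms, presented
# diotically; its intensity varies over time, but no binaural motion
# cues remain.
control = 0.5 * (left + right)
```

With this simplified panner the two ear signals conserve the energy of the mono source at every sample, and the diotic mean inherits a slow intensity modulation at the rotation rate, loosely mirroring the percept the authors report for their control stimulus.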
In Experiment 3, six sound conditions corresponding to six different percepts were employed. Midline stimuli were amplitude-modulated broadband noise delivered binaurally in the midline, either at azimuth 0° (object stationary in front of head) or 180° (object stationary behind head). Side stimuli were amplitude-modulated broadband noise delivered binaurally to the side of the head, either at azimuth 90° (object stationary opposite right ear) or 270° (object stationary opposite left ear). A moving stimulus was created by convolution of amplitude-modulated noise with dynamically updated HRTFs to simulate a fixed mean positive angular velocity of 320°/s, as in Experiment 1 (object rotating clockwise around head with constant speed). A spectro-temporal control stimulus was generated by taking the mean of the waveforms at each ear after convolving with the HRTF in the rotation condition and presenting this stimulus diotically (a sound with varying intensity over time, not localizing to a point in external space). Before scanning, subjects were questioned about the stimuli to ensure that the different percepts were reliably experienced by all subjects. During scanning, subjects were required to fixate a crosspiece at the midpoint of the visual axes and listen for any change in the sound stimulus.

PET Paradigm and Analysis
Experiment 1 was conducted at the Wellcome Department of Imaging Neuroscience, London. Regional cerebral blood flow was measured during 12 scans for each subject using the oxygen-15-labeled water bolus technique and a Siemens/CPS ECAT Exact HR (962) scanner in 3D mode. Four scans were carried out for each condition. Group analysis for the eight subjects was carried out using statistical parametric mapping implemented in SPM99 software. Scans were realigned and spatially normalized (Friston et al., 1995) to the standard stereotactic space of Talairach (Talairach and Tournoux, 1988). Data were smoothed with an isotropic Gaussian kernel of 16 mm full width at half maximum (FWHM). Analysis of covariance was used to correct for differences in global blood flow between scans. Differences in blood flow between conditions were assessed with the t statistic at each voxel using a significance threshold of p < 0.05 after correction for multiple comparisons using Gaussian random field theory. The effect of sound motion was demonstrated by the contrast between the all-motion and stationary conditions (Table 2; Figures 1 and 2).
The effect of second-order motion was assessed by the contrast between the changing and the fixed angular velocity conditions. This represents a pure contrast to determine the effect of rotation with changing velocity (second-order motion) compared to an appropriate control condition that still contains first-order motion with the same mean angular velocity. All subjects also underwent structural MRI.

fMRI Paradigm and Analysis
Experiment 2 was conducted at Georgetown University (Washington, D.C.) and Experiment 3 at the Wellcome Department of Imaging Neuroscience (London). In Experiment 2, blood oxygen level dependent (BOLD) contrast image volumes were acquired at 1.5 T (Siemens Vision, Erlangen) with gradient echo planar imaging (TR/TE 12,000/40 ms). Each volume comprised 35 contiguous 4 mm slices. 192 scans were acquired for each subject (48 volumes per condition) in two sessions using a sparse imaging paradigm (Hall et al., 1999) to maximize the difference between the BOLD response to the signal of interest and the response to scanner noise. In Experiment 3, BOLD contrast images were acquired at 2 T (Siemens Vision, Erlangen) with gradient echo planar imaging (TR/TE 12,000/40 ms). Each volume comprised 48 contiguous 4 mm slices with an in-plane resolution of 3 × 3 mm. 128 scans were acquired for each subject (32 volumes per condition) in two sessions using a sparse paradigm. For both experiments, preprocessing and analysis were carried out using SPM99. Spatial smoothing was carried out using a filter with FWHM of 8 mm. Data were analyzed in each experiment by modeling the evoked hemodynamic response to the different stimuli as boxcar functions in the context of the general linear model.
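The boxcar general linear model can be sketched minimally. The example below is an illustration only, not the SPM99 analysis: there is no hemodynamic convolution or random field correction, and the scan ordering, baseline, and effect size are invented for the demonstration.

```python
import numpy as np

n_scans = 32
n_cond = 4  # stationary, first-order motion, second-order motion, control
labels = np.array([i % n_cond for i in range(n_scans)])  # hypothetical ordering

# Boxcar (indicator) design matrix: one column per condition,
# so each parameter estimate is simply a condition mean.
X = np.zeros((n_scans, n_cond))
X[np.arange(n_scans), labels] = 1.0

# Simulated voxel time series: baseline plus a response to both
# motion conditions (conditions 1 and 2) and a little noise.
rng = np.random.default_rng(1)
y = 100.0 + 2.0 * ((labels == 1) | (labels == 2)) + 0.1 * rng.standard_normal(n_scans)

# Ordinary least-squares parameter estimates.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# All-motion minus stationary contrast, as in Experiments 1 and 2.
contrast = np.array([-1.0, 0.5, 0.5, 0.0])
effect = float(contrast @ beta)
print(effect)  # close to the simulated effect size of 2.0
```

In the real analysis the design matrix columns are convolved with a hemodynamic response model and the resulting t map is thresholded with a Gaussian random field correction; the algebra of contrast estimation, however, is the same.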
In Experiment 2, a fixed-effects model was used to analyze the same group contrasts as in the PET experiment (between the all-motion and stationary conditions and between the second-order and first-order motion conditions) at the same corrected significance level of p 0.05 (Table 2; Figures 1 and 2). Analyses were also carried out for the individual subjects (Figure 3). In Experiment 3, a fixed-effects model was also used to analyze the contrasts between rotating, external fixed midline and side and spectro-temporal control conditions and analyzed at the same corrected significance level of p 0.05 (Table 2; Figure 4). Acknowledgments J.D.W. and T.D.G. are supported by the Wellcome Trust. B.A.Z. and J.P.R. are supported by NIH grants F31-MH and R01-DC-

We thank three anonymous reviewers for their valuable suggestions.

Received: August 22, 2001
Revised: January 24, 2002

References

Ahissar, M., Ahissar, E., Bergman, H., and Vaadia, E. (1992). Encoding of sound source location and movement: activity of single neurons and interactions between adjacent neurons in the monkey auditory cortex. J. Neurophysiol. 67.
Alain, C., Arnott, S.R., Hevenor, S., Graham, S., and Grady, C.L. (2001). "What" and "where" in the human auditory system. Proc. Natl. Acad. Sci. USA 98.
Attias, H., and Schreiner, C.E. (1998). Blind source separation and deconvolution: the dynamic component analysis algorithm. Neural Comput. 10.
Baumgart, F., Gaschler-Markefski, B., Woldorff, M.G., Heinze, H.-J., and Scheich, H. (1999). A movement-sensitive area in auditory cortex. Nature 400.
Bell, A.J., and Sejnowski, T.J. (1995). An information-maximization approach to blind separation and blind deconvolution. Neural Comput. 7.
Binder, J.R., Frost, J.A., Hammeke, T.A., Rao, S.M., and Cox, R.W. (1996). Function of the left planum temporale in auditory and linguistic processing. Brain 119.
Binder, J.R., Frost, J.A., Hammeke, T.A., Bellgowan, P.S.F., Springer, J.A., Kaufman, J.N., and Possing, E.T. (2000). Human temporal lobe activation by speech and nonspeech sounds. Cereb. Cortex 10.
Bremmer, F., Schlack, A., Shah, N.J., Zafiris, O., Kubischik, M., Hoffmann, K.-P., Zilles, K., and Fink, G.R. (2001). Polymodal motion processing in posterior parietal and premotor cortex: a human fMRI study strongly implies equivalencies between humans and monkeys. Neuron 29.
Bushara, K.O., Weeks, R.A., Ishii, K., Catalan, M.J., Tian, B., Rauschecker, J.P., and Hallett, M. (1999). Modality-specific frontal and parietal areas for auditory and visual spatial localization in humans. Nat. Neurosci. 2.
Clarke, S., Bellmann, A., Meuli, R.A., Assal, G., and Steck, A.J. (2000). Auditory agnosia and auditory spatial deficits following left hemisphere lesions: evidence for distinct processing pathways. Neuropsychologia 38.
Coull, J.T., and Nobre, A.C. (1998). Where and when to pay attention: the neural systems for directing attention to spatial locations and to time intervals as revealed by both PET and fMRI. J. Neurosci. 18.
Friston, K.J., Ashburner, J., Frith, C.D., Poline, J.-B., Heather, J.D., and Frackowiak, R.S.J. (1995). Spatial registration and normalisation of images. Hum. Brain Mapp. 2.
Frith, C., Perry, R., and Lumer, E. (1999). The neural correlates of conscious experience: an experimental framework. Trends Cogn. Sci. 3.
Galaburda, A., and Sanides, F. (1980). Cytoarchitectonic organization of the human auditory cortex. J. Comp. Neurol. 190.
Giraud, A.L., Lorenzi, C., Ashburner, J., Wable, J., Johnsrude, I., Frackowiak, R.S.J., and Kleinschmidt, A. (2000). Representation of the temporal envelope of sounds in the human brain. J. Neurophysiol. 84.
Graziano, M.S.A., Reiss, L.A., and Gross, C.G. (1999). A neuronal representation of the location of nearby sounds. Nature 397.
Griffiths, T.D., and Green, G.G.R. (1999). Cortical activation during perception of a rotating wide-field acoustic stimulus. Neuroimage 10.
Griffiths, T.D., Bench, C.J., and Frackowiak, R.S.J. (1994). Cortical areas in man selectively activated by apparent sound movement. Curr. Biol. 4.
Griffiths, T.D., Rees, A., Witton, C., Shakir, R.A., Henning, G.B., and Green, G.G.R. (1996). Evidence for a sound movement area in the human cerebral cortex. Nature 383.
Griffiths, T.D., Rees, G., Rees, A., Green, G.G.R., Witton, C., Rowe, D., Büchel, C., Turner, R., and Frackowiak, R.S.J. (1998a). Right parietal cortex is involved in the perception of sound movement in humans. Nat. Neurosci. 1.
Griffiths, T.D., Büchel, C., Frackowiak, R.S.J., and Patterson, R.D. (1998b). Analysis of temporal structure in sound by the human brain. Nat. Neurosci. 1.
Griffiths, T.D., Johnsrude, I., Dean, J.L., and Green, G.G.R. (1999). A common neural substrate for the analysis of pitch and duration pattern in segmented sound? Neuroreport 10.
Griffiths, T.D., Green, G.G.R., Rees, A., and Rees, G. (2000). Human brain areas involved in the perception of auditory movement. Hum. Brain Mapp. 9.
Hall, D.A., Haggard, M.P., Akeroyd, M.A., Palmer, A.R., Summerfield, A.Q., Elliott, M.R., Gurney, E.M., and Bowtell, R.W. (1999). "Sparse" temporal sampling in auditory fMRI. Hum. Brain Mapp. 7.
Hofman, P.M., Van Riswick, J.G.A., and Van Opstal, A.J. (1998). Relearning sound localisation with new ears. Nat. Neurosci. 1.
Leinonen, L., Hyvärinen, J., and Sovijärvi, A.R.A. (1980). Functional properties of neurons in the temporo-parietal association cortex of awake monkey. Exp. Brain Res. 39.
Lewis, J.W., and Van Essen, D.C. (2000). Corticocortical connections of visual, sensorimotor, and multimodal processing areas in the parietal lobe of the macaque monkey. J. Comp. Neurol. 428.
Lewis, J.W., Beauchamp, M.S., and DeYoe, E.A. (2000). A comparison of visual and auditory motion processing in human cerebral cortex. Cereb. Cortex 10.
Maeder, P.P., Meuli, R.A., Adriani, M., Bellmann, A., Fornari, E., Thiran, J.P., Pittet, A., and Clarke, S. (2001). Distinct pathways involved in sound recognition and localization: a human fMRI study. Neuroimage 14.
Mummery, C.J., Ashburner, J., Scott, S.K., and Wise, R.J.S. (1999). Functional neuroimaging of speech perception in six normal and two aphasic subjects. J. Acoust. Soc. Am. 106.
Nobre, A.C., Sebestyen, G.N., Gitelman, D.R., Mesulam, M.M., Frackowiak, R.S.J., and Frith, C.D. (1997). Functional localisation of the system for visuospatial attention using positron emission tomography. Brain 120.
Penhune, V.B., Zatorre, R.J., MacDonald, J.D., and Evans, A.C. (1996). Interhemispheric anatomical differences in human primary auditory cortex: probabilistic mapping and volume measurement from magnetic resonance scans. Cereb. Cortex 6.
Rademacher, J., Morosan, P., Schormann, T., Schleicher, A., Werner, C., Freund, H.J., and Zilles, K. (2001). Probabilistic mapping and volume measurement of human primary auditory cortex. Neuroimage 13.
Rauschecker, J.P. (1998). Cortical processing of complex sounds. Curr. Opin. Neurobiol. 8.
Rauschecker, J.P., and Tian, B. (2000). Mechanisms and streams for processing of "what" and "where" in auditory cortex. Proc. Natl. Acad. Sci. USA 97.
Recanzone, G.H. (2000). Spatial processing in the auditory cortex of the macaque monkey. Proc. Natl. Acad. Sci. USA 97.
Romanski, L.M., Tian, B., Fritz, J., Mishkin, M., Goldman-Rakic, P.S., and Rauschecker, J.P. (1999). Dual streams of auditory afferents target multiple domains in the primate prefrontal cortex. Nat. Neurosci. 2.
Talairach, J., and Tournoux, P. (1988). Co-planar Stereotaxic Atlas of the Human Brain (Stuttgart: Thieme).
Thivard, L., Belin, P., Zilbovicius, M., Poline, J.-B., and Samson, Y. (2000). A cortical region sensitive to auditory spectral motion. Neuroreport 11.
Tian, B., Reser, D., Durham, A., Kustov, A., and Rauschecker, J.P. (2001). Functional specialization in rhesus monkey auditory cortex. Science 292.
Tootell, R.B., Reppas, J.B., Kwong, K.K., Malach, R., Born, R.T., Brady, T.J., Rosen, B.R., and Belliveau, J.W. (1995). Functional analysis of human cortical area MT and related visual cortical areas using magnetic resonance imaging. J. Neurosci. 15.
Toronchuk, J.M., Stumpf, E., and Cynader, M.S. (1992). Auditory cortex neurons sensitive to correlates of auditory motion: underlying mechanisms. Exp. Brain Res. 88.
Watson, J.D.G., Myers, R., Frackowiak, R.S.J., Hajnal, J.V., Woods, R.P., Mazziotta, J.C., Shipp, S., and Zeki, S. (1993). Area V5 of the human brain: evidence from a combined study using positron emission tomography and magnetic resonance imaging. Cereb. Cortex 3.
Wightman, F.L., and Kistler, D.J. (1989). Headphone simulation of free-field listening. I: Stimulus synthesis. J. Acoust. Soc. Am. 85.
Zakarauskas, P., and Cynader, M.S. (1991). Aural intensity for a moving source. Hear. Res. 52.
Zatorre, R.J., Evans, A.C., Meyer, E., and Gjedde, A. (1992). Lateralization of phonetic and pitch discrimination in speech processing. Science 256.

Opinion TRENDS in Neurosciences Vol.25 No.7 July 2002

The planum temporale as a computational hub

Timothy D. Griffiths and Jason D. Warren

It is increasingly recognized that the human planum temporale is not a dedicated language processor, but is in fact engaged in the analysis of many types of complex sound. We propose a model of the human planum temporale as a computational engine for the segregation and matching of spectrotemporal patterns. The model is based on segregating the components of the acoustic world and matching these components with learned spectrotemporal representations. Spectrotemporal information derived from such a computational hub would be gated to higher-order cortical areas for further processing, leading to object recognition and the perception of auditory space. We review the evidence for the model and specific predictions that follow from it.

Timothy D. Griffiths* and Jason D. Warren. Newcastle University Medical School, Framlington Place, Newcastle-upon-Tyne, UK NE2 4HH, and Wellcome Dept of Imaging Neuroscience, Institute of Neurology, Queen Square, London, UK WC1N 3BG. * t.d.griffiths@ncl.ac.uk

What does the human planum temporale (PT) do? This large region, which occupies the superior temporal plane posterior to Heschl's gyrus, is generally agreed to represent auditory association cortex. However, disagreement exists regarding its anatomy [1] and structure-function relationships [2-4]. In the left hemisphere, most definitions of Wernicke's area include part of PT [5] and, indeed, the human PT has traditionally been viewed as a language processor [2]. However, functional imaging indicates that the PT processes diverse types of sound (Fig. 1, Table 1). This article develops a functional model to explain this. The PT is concerned with analysis of sounds that are spectrally and temporally complex, comprising several component frequencies that change over time (Fig. 2). Such sounds are common in nature.
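Such spectrotemporal structure is what a spectrogram makes explicit: frequency content as a function of time, as in the article's Fig. 2. As an informal sketch (not drawn from the article; the sampling rate, window and hop sizes are arbitrary illustrative choices), a minimal spectrogram can be computed with a short-time Fourier transform:

```python
import numpy as np

def spectrogram(x, fs, win=256, hop=128):
    """Magnitude short-time Fourier transform: frequency content vs. time."""
    w = np.hanning(win)
    frames = np.array([x[i:i + win] * w for i in range(0, len(x) - win, hop)])
    mag = np.abs(np.fft.rfft(frames, axis=1))   # one spectrum per time frame
    freqs = np.fft.rfftfreq(win, d=1.0 / fs)    # frequency axis (Hz)
    times = np.arange(len(frames)) * hop / fs   # time axis (s)
    return times, freqs, mag

# A 1 kHz pure tone should produce a single spectral ridge at 1 kHz;
# complex sounds such as speech or music produce many ridges that move.
fs = 8000
t = np.arange(fs) / fs
times, freqs, mag = spectrogram(np.sin(2 * np.pi * 1000 * t), fs)
peak_hz = freqs[np.argmax(mag.mean(axis=0))]
```

A pure tone occupies one row of the time-frequency plane; the sounds the PT is concerned with occupy many rows at once, with energy that migrates across rows over time.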
The brain is continuously required to analyse these incoming spectrotemporal patterns and to compare them with those previously experienced, during the process known as auditory scene analysis [6]. Such analysis allows the identification and assignment of position to a mixture of acoustic sources (sound objects) heard simultaneously. This demands both segregation of the spectrotemporal pattern associated with each sound object and separation of each object from the spectrotemporal effects of its location. We argue that the PT solves this daunting computational problem. Although mechanisms for the accurate representation of incoming acoustic spectrotemporal signals exist in the ascending auditory pathways and primary auditory cortex (PAC) [7-9], it would be surprising if, a priori, this system were sufficient for auditory scene analysis. Even the discrimination of a single sound object from the effect of spatial position (Fig. 3) requires learned information about how the external ears filter sound signals arising in different locations, in addition to accurate representations of the sound waveform at the eardrums. This demanding computation might be achieved serially in the PT after initial processing in PAC, using the modular architecture in the PT [10] and inputs from other cortical areas [11-14]. Such computation would transform incoming auditory patterns into information about acoustic objects and position that could be used in other cortical areas. In this model, the PT thus represents a computational hub that directs further processing in other cortical regions, consistent with studies of the

Fig. 1. The planum temporale (PT) as an anatomical and functional hub. (a) Tilted axial section through the superior temporal plane of the human brain.
The PT lies posterior to Heschl's gyrus (HG), the site of the primary auditory cortex, and is contiguous posteriorly with the parieto-temporal operculum (PTO). Ninety-five percent probability maps for the boundaries of left and right PT in humans (derived from Ref. [1]) are outlined in red. (b) Insets centred on left and right PT, showing functional activation peaks within PT associated with different types of complex sound processing (see Table 1). Symbols are explained underneath. The functional relationships between the PT and higher cortical areas that are coactivated in processing simple sound patterns (green), music (yellow), speech (red) and auditory space (blue) are indicated schematically. Arrows indicate postulated flow of information from the PT to these higher areas; in many cases, however, exchange of information is likely to be reciprocal. We propose a generic computational mechanism within the PT for the analysis of spectrotemporal pattern. Computation uses information about sound objects derived from higher cortical areas linked to the PT, and the output of the PT is used to update stored information in these same areas. Abbreviations: IPL, inferior parietal lobe; MTG, middle temporal gyrus; PTO, parieto-temporal operculum; STG, lateral superior temporal gyrus; STS, superior temporal sulcus.

Table 1. Functional imaging data on planum temporale (PT) involvement in different aspects of spectrotemporal pattern processing (a)
Each entry gives: principal contrast | side of PT peak activation | regions activated concurrently | refs.

Spatial analysis
Sound source rotation minus stationary sound object (fMRI) | L, R | Bilat. PTO, L IPL | [16]
Sound source rotation minus stationary sound object (PET) | L, R | Bilat. PTO, Bilat. Premotor Area | [16]

Simple sound patterns
Duration sequences minus silence | L, R | Cb, Bilat. HG, Bilat. STG, Bilat. IPL, Bilat. Frontal Lobe | [38]
Harmonic complex minus pure tones | L, R | R HG, Bilat. STG | [34]
Frequency-modulated minus unmodulated tones | L, R | Bilat. HG, Bilat. STG | [34]
Amplitude-modulated minus unmodulated noise | L, R | Bilat. HG, L STS, L STG, L IPL | [35]
Spectral motion versus stationary stimuli | L, R | Bilat. STG | [61]
Spectrotemporal minus fixed external sound | L, R | Bilat. STG | [16]

Pitch sequences
Pitch sequences minus silence | L, R | Cb, Bilat. HG, Bilat. STG, Bilat. IPL, Bilat. Frontal Lobe | [38]
Tone sequences minus words (active task) | L | | [44]
Tone sequences minus noise | L, R | Bilat. STG, R STS | [36]

Environmental sounds
Passive listening minus rest | L | Bilat. HG, R Inf. Frontal Lobe, R Insula, R IPL | [39]

Voices
Vocal minus non-vocal sounds | L, R | Bilat. STS, R MTG | [40]

Music
Deviant minus standard chords (pre-attentive) | R | R STG | [62]
Melodies minus noise | R | R STG, R Fusiform Gyrus | [41]
Listening to familiar songs minus visual baseline | L, R | Bilat. HG, Bilat. STG, Bilat. Frontal Lobe, L IPL, R SMA | [63]
Maintenance of pitch while singing minus complex pitch perception | R | R HG, Bilat. Frontal Lobe, Bilat. Insula, Bilat. IPL, Bilat. Occipital Lobe, Cb | [64]
Musical imagery (imagining continuation of a tune minus listening) | R | Bilat. Frontal Lobe, L SMA | [42]

Speech and speech-like sounds
Speech minus noise | L | Bilat. STG, L MTG, L Inf. Frontal Lobe | [65]
Speech minus complex non-speech | L | Bilat. MTG, Bilat. STG, R Inf. Frontal Lobe | [46]
Speech minus tones | L | Bilat. MTG, Bilat. STG, R Insula | [46]
Complex non-speech minus tones | L | Bilat. STG, R MTG | [46]
Consonant-vowel syllables minus vowels | L, R | R STG, R STS | [66]
Unvoiced minus voiced consonants | L | L HG | [66]
Verbal self-monitoring (reading aloud with distorted feedback minus reading aloud) | L | L STS, R STG, L Insula | [50]

Dichotic listening
Listening to dichotic minus diotic speech | L | Bilat. STG, Bilat. STS, Bilat. Inf. Frontal Lobe, R Insula | [52]

Active listening
Active target detection minus passive listening | L | L STS, L IPL, L Frontal Lobe, L Thalamus, L Insula | [51]

Cross-modal processing
Coherent visual motion minus stationary stimulus | L | Bilat. V5, Bilat. V3 | [54]
Optical flow minus randomized optical motion | R | |
Lip-reading minus watching meaningless facial movements | L, R | L PTL, R IPL | [56]

Auditory plasticity
Sign language minus visual fixation in deaf subjects | L | R STG | [57]
Post-training minus pre-training deactivation | L | Bilat. STG, Bilat. STS, R HG | [58]

(a) All local PT maxima fall within the 95% probability anatomical boundaries for human PT proposed by Westbury et al. [1]. Studies have been selected to illustrate the variety of types of pattern processing in PT and the different cortical areas coactivated in each case. Abbreviations: Bilat., bilateral; Cb, cerebellum; HG, Heschl's gyrus; Inf., inferior; IPL, inferior parietal lobe; L, left; MTG, middle temporal gyrus; PTL, posterior temporal lobe; PTO, parieto-temporal operculum; R, right; SMA, supplementary motor area; STG, superior temporal gyrus; STS, superior temporal sulcus.

Fig. 2. Sounds analysed in the planum temporale (PT). The PT is involved in the analysis of sounds with complex spectrotemporal structure, where there are multiple frequency components that change over time. This is shown in these spectrograms, which plot the frequency spectrum of the sound at one eardrum as a function of time. (a) 1 s sample of speech. (b) 10 s sample of classical orchestral music. (c) 1 s sample of amplitude-modulated noise moving quickly around the head, similar to the sound of an insect.

cortical processing of language [15], auditory space [16] and other types of pattern within complex sound (Table 1, Fig. 1). Such a hub could access distinct cortical mechanisms for sound-object identification and localization [17-19] (Fig. 1).

The model: computational analysis of sound patterns
The segregation and matching of spectrotemporal patterns could be achieved in the PT using similar computational mechanisms and neuronal architecture. In this scheme, the PT is a crucial computational interface between incoming sound patterns that are segregated in the PT, and the previously stored patterns with which these are matched. The output after such computation provides information about the acoustic environment that is not immediately available either in the acoustic input or as a result of auditory processing before the PT. Spectrotemporal analysis in the PT can be considered over three different timespans. The first corresponds to the segregation of simultaneous spectrotemporal patterns, for example, when multiple sound objects are presented. The second corresponds to the segregation of spectrotemporal patterns between successive time points, during analysis of the motion of sound objects in space or analysis of a succession of sounds in time.
This timescale corresponds to that of transient acoustic memory suggested by human electrophysiology [20]. Finally, the PT might operate over longer timespans, to effect matching of the incoming spectrotemporal patterns with stored patterns or templates. The timescale of PT computation need not bear a simple relationship to the temporal structure of the acoustic waveform represented in PAC. Depth-electrode studies in humans suggest that rapid acoustic temporal changes, such as voice-onset time, are less clearly represented in the PT than in PAC [21,22], consistent with the processing of stored representations over hundreds of milliseconds rather than the faithful temporal representation of the incoming stimulus. The analysis that we suggest might be achieved by several different algorithms instantiated in a variety of neural networks. Independent and dynamic component analyses (ICA and DCA, respectively) belong to this family of algorithms [23,24]. Essentially, these algorithms separate the components of a mixture by assuming that the components are not correlated [25]. The segregation mechanisms that we envisage could have some of the formal features of ICA, because the auditory system operates under similar constraints to those under which ICA was developed. However, there are no grounds for specifying one member of this family of algorithms, or for suggesting equivalence between a specific artificial neural network used for ICA and the actual neural networks in the PT. Several neural network configurations might achieve the same computational result. We do, however, predict that the required computation is likely to be performed by a neuronal population within the PT, rather than at the level of single neurons. In terms of information theory, the problem confronting the auditory system is the transfer of maximum information from multiple auditory sources to their neuronal representations.
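The flavour of this family of algorithms can be conveyed with a toy sketch: whiten a two-channel instantaneous mixture, then rotate the whitened data to maximise non-Gaussianity (here, squared excess kurtosis), a classical ICA contrast. The sources, mixing matrix and grid search below are invented for illustration and correspond to no specific published algorithm, still less to any neural implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
t = np.arange(n) / 8000.0

# Two independent "sources": a 440 Hz tone and uniform wideband noise.
s = np.vstack([np.sin(2 * np.pi * 440 * t),
               rng.uniform(-1.0, 1.0, n)])
A = np.array([[0.8, 0.3],     # instantaneous 2x2 mixing ("two sensors")
              [0.4, 0.7]])
x = A @ s

# Whiten: decorrelate the channels and equalise their variance.
x = x - x.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(x))
xw = np.diag(d ** -0.5) @ E.T @ x

def rot(a):
    """2-D rotation matrix."""
    return np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])

def kurt(y):
    """Excess kurtosis, a simple measure of non-Gaussianity."""
    return np.mean(y ** 4) - 3 * np.mean(y ** 2) ** 2

# Search for the rotation that maximises total squared kurtosis:
# independent components are maximally non-Gaussian.
angles = np.linspace(0, np.pi / 2, 400)
best = max(angles, key=lambda a: sum(kurt(y) ** 2 for y in rot(a) @ xw))
recovered = rot(best) @ xw   # the sources, up to order and sign
```

Each recovered channel should match one of the original sources almost perfectly, even though neither sensor heard either source in isolation.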
Information about sources forms a convolutive mixture in the acoustic waveform at the eardrum. Here, each sound source is convolved with a filter function that corresponds to the effect of the external ear on sound from a particular region of space [26]; sound sources are further processed by a series of filters in the ascending pathway to PAC. In terms of ICA, the multiple-source problem can be solved when the characteristics of the mixing filters are known (essentially, a form of ICA that is not completely 'blind'). The PT could operate on the information from

[Fig. 3 schematic: representations of the spectrotemporal pattern at the two ears reach PAC via the auditory pathway. Inputs to the planum temporale: (1) the spectrotemporal pattern; (2) the effect of the head-related transfer function; (3) the learned effect of head-related transfer functions for different points in space. The PT computes the most likely combination of sound object and head-related transfer functions. Output 1 (sound object) passes to semantic processing; Output 2 (sound position) passes to spatial processing.]

Fig. 3. The computational hub in action: auditory spatial analysis. The spectrotemporal pattern at the two ears results from convolution of the acoustic signal in space (in this example, a square-wave amplitude-modulated noise, similar to the sound of a helicopter) with the head-related transfer functions (HRTF) at the two ears (in this example, corresponding to a location above the subject, to the right). Initial processing of spectrotemporal patterns, including comparison between the ears, occurs in the ascending pathway to the primary auditory cortex (PAC). In the cortex, it is likely that multiple neurons are required to encode a given position in space [30,67]. Our model proposes serial input from PAC to planum temporale (PT), followed by further processing in the PT to compute the most likely combination of sound objects and positions producing the binaural spectrotemporal pattern in PAC. In performing the computation, the PT accesses learned information about the acoustic world (in this example, the HRTF) stored locally or in higher cortical areas. Output from the PT comprises spatial information that passes to the parieto-temporal operculum and inferior parietal lobule, and sound-object information that passes to the temporal convexity for semantic processing.
PAC in such a way by accessing information about the characteristics of these auditory filters, acquired as a result of experience [27]. In addition to the mixing filters, we propose that the PT also has access to information about previously experienced sound objects. The filter and object properties might be stored locally in the PT or in other cortical areas. These properties are also key components of other biologically plausible models; for example, in predictive coding [28] they would be described as feedback predictions that allow computation of the source properties of natural scenes. Output of the PT would consist of segregated spectrotemporal patterns corresponding to auditory objects and their spatial characteristics. This output would feed forward to areas that store information regarding the mixing-filter and sound-object characteristics, where it would be used to update this stored information. Accordingly, because reciprocal top-down and bottom-up processing are an essential feature of the segregation process that we envisage as occurring in the PT, ours could be classified as a generative model [29]. A core feature common to generative models is plasticity: the capacity for modification of the computational algorithm based on experience.

Evidence for the model
Spatial perception
Auditory spatial analysis is the prototypical application of our computational model (Fig. 3). The PT or its homologues have been implicated in acoustic spatial analysis in both electrophysiological studies in monkeys [19,30,31] and functional imaging in humans [16,17,32,33]. Recent functional imaging experiments using broadband stimuli have demonstrated PT activation when the computation of sound movement requires segregation of the effect of movement from the intrinsic structure of the sound [16,32].
In this situation, a movement trajectory could be computed in the PT by continuous segregation of a spectrotemporal pattern that corresponds to the original sound object from movement-transformed versions of itself.

Elementary acoustic pattern perception
Evidence also exists for specific PT activation in the processing of spectrotemporal patterns that are not spatially determined, including harmonic complexes [34], amplitude modulation [35], frequency modulation [34,36] and sound sequences [37,38]. By contrast, differential activation does not occur in PAC, consistent with the primary cortex acting as a conduit for further processing.

Environmental sound perception
The PT is engaged in processing a range of naturally occurring environmental sounds, including animal cries and inanimate noises [17,39]. Voices are specific examples of ethologically salient environmental sounds for which stored templates might exist. When contrasted with non-vocal sounds of similar frequency distribution, activation of the left PT by voices could reflect processing based on such templates [40].

Musical perception and imagery
The perception of music demands the capacity to build and retain long-lasting abstractions of spectrotemporal structures. Right-sided PT activation typically accompanies the perception of melodies (Table 1), and active tasks that involve musical pitch recruit a fronto-temporal network, which includes the PT (in normal subjects) [41]. A fronto-temporal network involving the PT is also activated during musical imagery [42] and during

Acknowledgements. Our work is supported by the Wellcome Trust. We thank G. Green, K. Friston and A. Rees for helpful discussion.

musical hallucinations [43], when subjects perceive music in the absence of any musical stimulus. In the case of music, therefore, the proposed initial spectrotemporal analysis in the PT affords access to widely distributed and lateralized brain processing mechanisms that operate over different temporal scales.

Speech perception
The representation and updating of auditory speech traces are necessary for phonological working memory and speech production [5]. The PT has been implicated in these processes in studies of both normal and brain-damaged subjects [44,45]. It is also activated by natural speech contrasted with acoustically similar non-speech sounds [46], by deviant or unpredictable verbal and nonverbal events [47-49], and in verbal self-monitoring [50]. These observations are consistent with the suggestion [5] that Wernicke's area constructs a transient representation of the spectrotemporal structures embodied in spoken words, regardless of whether these are heard or retrieved from lexical memory (i.e. a phonological 'template'). Such a role would be crucial for distinguishing phonemes as closely related spectrotemporal structures [36]. However, we do not argue that PT is necessarily the primary storage site for such templates.

Other considerations
Attention
Attentional ('top-down') influences might modulate PT computation via its connections to other association areas (Table 1). Activation of the PT in studies that specifically assess the effect of attention [51] and in dichotic listening [52] might be interpreted in this way. However, functional imaging studies of auditory spatial and object processing have demonstrated that PT activation does not depend on whether a task is employed [17].
Moreover, one generator for the pre-attentive mismatch negativity response in electroencephalography and magnetoencephalography [20] (an electrophysiological correlate of 'oddball' or novel stimuli) arises in the vicinity of the PT. We therefore suggest that, although the PT contributes to an auditory attentional network, its computational role does not depend on attention.

Cross-modal processing
Area Tpt of the macaque, a potential homologue of the human PT, contains neurons that are responsive to visual and somaesthetic, as well as auditory, stimuli [53]. Cross-modal processing of visual motion has also been demonstrated in the human PT [54]. Activation of the PT during reading [55] and lip-reading [56] can be interpreted as examples of cross-modal processing that involve access to phonological spectrotemporal templates.

Auditory learning
Left PT activation in response to sign language in prelingually deaf individuals [57] is consistent with recruitment of the computational hub by an entirely different sensory modality: a striking example of plasticity. Our model predicts the bilateral deactivation of the PT specifically associated with short-term auditory learning that has been demonstrated in normal subjects [58], as such training could lead to the establishment of stored templates for acoustic targets, improvements in computational efficiency and reductions in metabolic demands.

Lateralization
Lateralized PT activation during processing of language and musical stimuli (Table 1, Fig. 1) is not a specific feature of our computational hub model. However, the model could accommodate lateralization of processing determined by stimulus features [59,60] or by lateralized downstream cognitive processing.

Predictions
Figure 3 illustrates a scheme for the identification and localization of single sound objects in space.
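In caricature, the computation proposed in Fig. 3 (choosing the most likely combination of stored sound object and position filter to explain the binaural input) can be sketched as an exhaustive template match. The 'objects', the stand-in position filters, and the residual-energy score below are all hypothetical simplifications; real HRTFs, and any neural implementation, are far richer:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2048

# Hypothetical stored "object" templates and position filters: stand-ins
# for learned spectrotemporal patterns and learned HRTFs (values invented).
templates = {name: rng.standard_normal(n) for name in ("bell", "voice", "noise")}
filters = {pos: rng.standard_normal(8) * np.hanning(8)
           for pos in ("left", "front", "right")}

def render(obj, pos):
    """Signal predicted at the ear for a given object at a given position."""
    return np.convolve(templates[obj], filters[pos])[:n]

received = render("voice", "right")  # the unknown input to be explained

# Exhaustive search: pick the (object, position) pair whose rendering best
# explains the input, scored by residual energy.
best = min(((obj, pos) for obj in templates for pos in filters),
           key=lambda op: np.sum((render(*op) - received) ** 2))
```

The search returns both an identity and a location from a single waveform, mirroring the model's two outputs (sound object to semantic processing, sound position to spatial processing).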
One of the most challenging tasks for the auditory system is to execute this task for multiple objects, in the 'cocktail party effect' (where we perceive and attend to the voice of one speaker when many speakers are present). We predict a crucial involvement of the PT in this task. Our model is based on the properties of local networks within the PT, and could be tested experimentally using single-unit recording in animal homologues of PT, or depth-electrode studies in humans. One core feature of the model that might be examined directly in this way is the plasticity of unit responses to spectrotemporal patterns.

References
1 Westbury, C.F. et al. (1999) Quantifying variability in the planum temporale: a probability map. Cereb. Cortex 9,
2 Marshall, J.C. (2000) Planum of the apes: a case study. Brain Lang. 71,
3 Keenan, J.P. et al. (2001) Absolute pitch and planum temporale. NeuroImage 14,
4 Zatorre, R.J. et al. (1998) Functional anatomy of musical processing in listeners with absolute pitch and relative pitch. Proc. Natl. Acad. Sci. U. S. A. 95,
5 Wise, R.J.S. et al. (2001) Separate neural subsystems within Wernicke's area. Brain 124,
6 Bregman, A.S. (1990) Auditory Scene Analysis, MIT Press
7 de Charms, R.C. et al. (1998) Optimizing sound features for cortical neurons. Science 280,
8 Nelken, I. et al. (1999) Responses of auditory-cortex neurons to structural features of natural sounds. Nature 397,
9 Schnupp, J.W.H. et al. (2001) Linear processing of spatial cues in primary auditory cortex. Nature 414,
10 Galuske, R.A.W. et al. (2000) Interhemispheric asymmetries of the modular structure in human temporal cortex. Science 289,
11 Galaburda, A. and Sanides, F. (1980) Cytoarchitectonic organisation of the human auditory cortex. J. Comp. Neurol. 190,
12 Pandya, D.N. (1995) Anatomy of the auditory cortex. Rev. Neurol. 151,
13 Howard, M.A. et al. (2000) Auditory cortex on the human posterior superior temporal gyrus. J. Comp. Neurol. 416,
14 Tardif, E. and Clarke, S. (2001) Intrinsic connectivity in human auditory areas: a tracing study with DiI. Eur. J. Neurosci. 13,
15 Karbe, H. et al. (1998) Cerebral networks and functional brain asymmetry: evidence from regional metabolic changes during word repetition. Brain Lang. 63,
16 Warren, J.D. et al. (2002) Perception of sound source motion by the human brain. Neuron 34,
17 Maeder, P.P. et al. (2001) Distinct pathways involved in sound recognition and localisation: a human fMRI study. NeuroImage 14,
18 Anourova, I. et al. (2001) Evidence for dissociation of spatial and nonspatial auditory information processing. NeuroImage 14,
19 Rauschecker, J.P. and Tian, B. (2000) Mechanisms and streams for processing of 'what' and 'where' in auditory cortex. Proc. Natl. Acad. Sci. U. S. A. 97,
20 Näätänen, R. et al. (2001) 'Primitive intelligence' in the auditory cortex. Trends Neurosci. 24,
21 Steinschneider, M. et al. (1999) Temporal encoding of the voice onset time phonetic parameter by field potentials recorded directly from human auditory cortex. J. Neurophysiol. 82,
22 Liégeois-Chauvel, C. et al. (1999) Specialisation of left auditory cortex for speech perception in man depends on temporal coding. Cereb. Cortex 9,
23 Bell, A.J. and Sejnowski, T.J. (1995) An information-maximization approach to blind separation and blind deconvolution. Neural Comput. 7,
24 Attias, H. and Schreiner, C.E. (1998) Blind source separation and deconvolution: the dynamic component analysis algorithm. Neural Comput. 10,
25 Stone, J.V. (2002) Independent component analysis: an introduction. Trends Cogn. Sci. 6,
26 Wightman, F.L. and Kistler, D.J. (1989) Headphone simulation of free-field listening. II: Psychophysical validation. J. Acoust. Soc. Am. 85,
27 Hofman, P.M. et al. (1998) Relearning sound localisation with new ears. Nat. Neurosci. 1,
28 Rao, R.P. and Ballard, D.H. (1999) Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nat. Neurosci. 2,
29 Friston, K.J. and Price, C.J. (2001) Dynamic representations and generative models of brain function. Brain Res. Bull. 54,
30 Recanzone, G.H. (2000) Spatial processing in the auditory cortex of the macaque monkey. Proc. Natl. Acad. Sci. U. S. A. 97,
31 Tian, B. et al. (2001) Functional specialization in rhesus monkey auditory cortex. Science 292,
32 Baumgart, F. et al. (1999) A movement-sensitive area in auditory cortex. Nature 400,
33 Lewis, J.W. et al. (2000) A comparison of visual and auditory motion processing in human cerebral cortex. Cereb. Cortex 10,
34 Hall, D.A. et al. (2002) Spectral and temporal processing in human auditory cortex. Cereb. Cortex 12,
35 Giraud, A.-L. et al. (2000) Representation of the temporal envelope of sounds in the human brain. J. Neurophysiol. 84,
36 Binder, J.R. et al. (2000) Human temporal lobe activation by speech and nonspeech sounds. Cereb. Cortex 10,
37 Penhune, V.B. et al. (1998) Cerebellar contributions to motor timing: a PET study of auditory and visual rhythm reproduction. J. Cogn. Neurosci. 10,
38 Griffiths, T.D. et al. (1999) A common neural substrate for the analysis of pitch and duration pattern in segmented sound? NeuroReport 18,
39 Engelien, A. et al. (1995) The functional anatomy of recovery from auditory agnosia. A PET study of sound categorisation in a neurological patient and normal controls. Brain 118,
40 Belin, P. et al. (2000) Voice-selective areas in human auditory cortex. Nature 403,
41 Zatorre, R.J. et al. (1994) Neural mechanisms underlying melodic perception and memory for pitch. J. Neurosci. 14,
42 Halpern, A.R. and Zatorre, R.J. (1999) When that tune runs through your head: a PET investigation of auditory imagery for familiar melodies. Cereb. Cortex 9,
43 Griffiths, T.D. (2000) Musical hallucinosis in acquired deafness. Phenomenology and brain substrate. Brain 123,
44 Binder, J.R. et al. (1996) Function of the left planum temporale in auditory and linguistic processing. Brain 119,
45 Giraud, A. et al. (2001) Cross-modal plasticity underpins language recovery after cochlear implantation. Neuron 30,
46 Vouloumanos, A. et al. (2001) Detection of sounds in the auditory stream: event-related fMRI evidence for differential activation to speech and nonspeech. J. Cogn. Neurosci. 13,
47 Celsis, P. et al. (1999) Differential fMRI responses in left posterior superior temporal gyrus and left supramarginal gyrus to habituation and change detection in syllables and tones. NeuroImage 9,
48 Bischoff-Grethe, A. et al. (2000) Conscious and unconscious processing of nonverbal predictability in Wernicke's area. J. Neurosci. 20,
49 Shtyrov, Y. et al. (2000) Discrimination of speech and of complex nonspeech sounds of different temporal structure in the left and right cerebral hemispheres. NeuroImage 12,
50 McGuire, P.K. et al. (1996) Functional neuroanatomy of verbal self-monitoring. Brain 119,
51 Hall, D.A. et al. (2000) Modulation and task effects in auditory processing measured using fMRI. Hum. Brain Mapp. 10,
52 Hashimoto, R. et al. (2000) Functional differentiation in the human auditory and language areas revealed by a dichotic listening task. NeuroImage 12,
53 Leinonen, L. et al. (1980) Functional properties of neurons in the temporo-parietal association cortex of awake monkey. Exp. Brain Res. 39,
54 Howard, R.J. et al. (1996) A direct demonstration of functional specialisation within motion-related visual and auditory cortex of the human brain. Curr. Biol. 6,
55 Nakada, T. et al. (2001) Planum temporale: where spoken and written language meet. Eur. Neurol. 46,
56 Calvert, G.A. et al. (1997) Activation of auditory cortex during silent lipreading. Science 276,
57 Petitto, L.A. et al. (2000) Speech-like cerebral activity in profoundly deaf people processing signed languages: implications for the neural basis of human language. Proc. Natl. Acad. Sci. U. S. A. 97,
58 Jäncke, L. et al. (2001) Short term functional plasticity in the human auditory cortex: an fMRI study. Cognit. Brain Res. 12,
59 Schwartz, J. and Tallal, P. (1980) Rate of acoustic change may underlie hemispheric specialisation for speech perception. Science 207,
60 Belin, P. and Zatorre, R.J. (2000) 'What', 'where' and 'how' in auditory cortex. Nat. Neurosci. 3,
61 Thivard, L. et al. (2000) A cortical region sensitive to auditory spectral motion. NeuroReport 11,
62 Tervaniemi, M. et al. (2000) Lateralized automatic auditory processing of phonetic versus musical information: a PET study. Hum. Brain Mapp. 10,
63 Zatorre, R.J. et al. (1996) Hearing in the mind's ear: a PET investigation of musical imagery and perception. J. Cogn. Neurosci. 8,
64 Perry, D.W. et al. (1999) Localization of cerebral activity during simple singing. NeuroReport 10,
65 Zatorre, R.J. et al. (1992) Lateralisation of phonetic and pitch discrimination in speech processing. Science 256,
66 Jäncke, L. et al. (2002) Phonetic perception and the temporal cortex. NeuroImage 15,
67 Middlebrooks, J.C. et al. (1994) A panoramic code for sound location by cortical neurons. Science 264,

The Journal of Neuroscience, July 2, (13): Behavioral/Systems/Cognitive

Distinct Mechanisms for Processing Spatial Sequences and Pitch Sequences in the Human Auditory Brain

J. D. Warren (1,2) and T. D. Griffiths (1,2)
(1) Wellcome Department of Imaging Neuroscience, Institute of Neurology, London, WC1N 3BG United Kingdom, and (2) Auditory Group, University of Newcastle Medical School, Newcastle-upon-Tyne, NE2 4HH United Kingdom

Perception of the acoustic world requires the simultaneous processing of the acoustic patterns associated with sound objects and their location in space. In this functional magnetic resonance experiment, we investigated the human brain areas engaged in the analysis of pitch sequences and sequences of acoustic spatial locations in a paradigm in which both could be varied independently. Subjects were presented with sequences of sounds in which the individual sounds were regular-interval noises with variable pitch. Positions of individual sounds were varied using a virtual acoustic space paradigm during scanning. Sound sequences with changing pitch specifically activated lateral Heschl's gyrus (HG), anterior planum temporale (PT), planum polare, and superior temporal gyrus anterior to HG. Sound sequences with changing spatial locations specifically activated posteromedial PT. These results demonstrate directly that distinct mechanisms for the analysis of pitch sequences and acoustic spatial sequences exist in the human brain. This functional differentiation is evident as early as PT: within PT, pitch pattern is processed anterolaterally and spatial location is processed posteromedially. These areas may represent human homologs of macaque lateral and medial belt, respectively.
Key words: pitch; auditory space; functional imaging; human auditory cortex; sound; brain

Introduction

Considerable controversy surrounds the anatomical and functional organization of the human cortical auditory system (Cohen and Wessinger, 1999; Belin and Zatorre, 2000; Romanski et al., 2000; Middlebrooks, 2002; Zatorre et al., 2002). In nonhuman primates, distinct ventral "what" and dorsal "where" auditory processing streams have been proposed on electrophysiological grounds (Kaas and Hackett, 2000; Rauschecker and Tian, 2000; Tian et al., 2001). In humans, anatomical (Galaburda and Sanides, 1980; Rivier and Clarke, 1997; Galuske et al., 1999; Tardif and Clarke, 2001), functional imaging (Alain et al., 2001; Maeder et al., 2001; Warren et al., 2002), electrophysiological (Alain et al., 2001; Anourova et al., 2001) and lesion (Clarke et al., 2000) data are consistent with an anterior auditory cortical "what" pathway that processes sound object information and a posterior "where" pathway that processes spatial information. However, the extent and functional basis of any such separation of processing remains contentious (Cohen and Wessinger, 1999; Belin and Zatorre, 2000; Middlebrooks, 2002; Zatorre et al., 2002). Representative previous human functional imaging studies of auditory "what" and "where" processing are summarized in a supplemental table (available online as supplemental material). It has recently been proposed that the human planum temporale (PT) plays a critical role in disambiguating the intrinsic properties of sounds from the acoustic correlates of spatial location,

Received Dec. 6, 2002; revised April 23, 2003; accepted April 24, 2003. J.D.W. and T.D.G. are supported by the Wellcome Trust. Correspondence should be addressed to Dr. Timothy D. Griffiths, Auditory Group, University of Newcastle Medical School, Framlington Place, Newcastle-upon-Tyne NE2 4HH, UK. E-mail: t.d.griffiths@ncl.ac.uk.
Copyright © 2003 Society for Neuroscience.

before further processing of those specific attributes in distinct cortical areas (Griffiths and Warren, 2002). PT is a large region of auditory association cortex, occupying the superior temporal plane posterior to Heschl's gyrus (HG) (Westbury et al., 1999). PT is involved in processing many different types of sound patterns, including both intrinsic spectrotemporal features of sound objects and auditory spatial information (Griffiths and Warren, 2002). Taken together, the results of a number of functional imaging studies [summarized by Griffiths and Warren (2002)] suggest that distinct subregions for processing particular sound attributes may exist within human PT; however, its functional architecture has not been established (Recanzone, 2002). In this functional magnetic resonance imaging (fMRI) experiment, we tested the hypothesis that there are distinct cortical substrates for processing pitch patterns and the location of sounds in space by comparing directly the processing of sequences of pitch and sequences of spatial positions. Specifically, we hypothesized that pitch sequences are processed in a network of areas including lateral HG, PT, and planum polare (PP) (Patterson et al., 2002), whereas spatial information is processed in a posterior network that includes PT and inferior parietal lobe (IPL) (Pavani et al., 2002; Warren et al., 2002; Zatorre et al., 2002). We predicted a common involvement of PT in both tasks and were interested specifically in the possibility that distinct subregions of PT may be associated with each task. The stimuli were sequences of sounds with temporal regularity and associated pitch [iterated ripple noise (IRN)] presented in virtual space. Like natural sound objects, these broadband stimuli can be localized accurately in external acoustic space. However, their associated pitch and spatial characteristics can be varied independently in a factorial experimental design.

Figure 1. Schematic representation of the experimental paradigm. During scanning, four different combinations of sound sequences with fixed pitch or randomly changing pitch (Δpitch) and fixed azimuthal location or randomly changing location (Δlocation) were presented. For each sequence, the first and last elements were identical in both pitch and spatial location (0° in azimuth illustrated here; in the experiment, randomized 0°, 90°, 180°, or −90° in azimuth). Each combination of sound sequences corresponded to a different condition with a distinct percept: 1, fixed pitch, fixed spatial location; 2, changing pitch, fixed spatial location; 3, fixed pitch, changing spatial location; 4, changing pitch, changing spatial location. Additional conditions used during scanning (not shown) were broadband noise sequences with fixed or changing spatial location, and silence. The use of musical notation here is purely symbolic. Pitch variations were random and based on a 10-note octave rather than the Western musical scale. For ease of illustration, short sound sequences with large spatial steps are shown; however, the actual sequences used during scanning comprised 23 or 25 elements with steps of 20°, 30°, or 40° between successive locations.

Materials and Methods

Stimuli were based on either IRN or fixed amplitude, random phase noise with passband 1 Hz to 10 kHz, created digitally at a sampling rate of 44.1 kHz. Stimuli were convolved with generic head-related transfer functions (HRTFs) (Wightman and Kistler, 1989) to create a percept of external location in virtual acoustic space. Sounds were combined in sequences containing either 25 or 23 elements in which the duration of each individual element was fixed at 250 msec with an intersound pause of 75 msec.
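IRN construction itself is not detailed in the text; regular interval noise is conventionally generated by an iterated delay-and-add procedure, in which repeatedly adding to a noise a copy of itself delayed by 1/pitch builds temporal regularity (and hence a pitch percept) at that lag. A minimal numpy sketch under that assumption; the iteration count and the 160 Hz example pitch are illustrative, not values taken from the paper:

```python
import numpy as np

def iterated_ripple_noise(pitch_hz, n_iter=16, dur=0.25, fs=44100, seed=0):
    """Delay-and-add IRN: each iteration adds to the noise a copy of
    itself delayed by d = fs/pitch_hz samples, building temporal
    regularity (and a pitch percept) at 1/pitch_hz seconds."""
    d = int(round(fs / pitch_hz))              # delay in samples
    n = int(dur * fs)                          # output length (250 msec here)
    x = np.random.default_rng(seed).standard_normal(n + n_iter * d)
    for _ in range(n_iter):
        x = x[:-d] + x[d:]                     # add the delayed copy
    return x / np.max(np.abs(x))               # normalize to unit peak

element = iterated_ripple_noise(160.0)         # one 250 msec sequence element
```

Spatialization would then convolve each element with left- and right-ear head-related impulse responses for the desired azimuth (e.g. np.convolve(element, hrir) per ear, with hrir drawn from a measured HRTF set such as Wightman and Kistler's).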
The pitch of the IRN stimuli either remained fixed throughout the sequence or was varied randomly among the first six elements of a 10-note octave spanning Hz. Sounds were located at one of four initial spatial positions: 0°, 90°, 180°, or −90° in azimuth. The spatial location of the sound either remained fixed or was varied randomly from element to element. Sequences with changing spatial location were generated from four different combinations of azimuthal positions: the step between successive azimuthal positions could be 20°, 30°, or 40° in size, and the order and direction (clockwise or counterclockwise) of steps was randomized. The pitch of the first and last element and the spatial location of the first and last element were constrained to be identical in any given sequence. The experimental paradigm is represented schematically in Figure 1. Twelve subjects (five males, seven females) took part; all were right-handed. None had any history of hearing or neurological disorder, and all had normal structural MRI scans. All subjects gave informed consent, and the experiment was performed with the approval of the local Ethics Committee. During fMRI scanning, stimuli were delivered using a custom electrostatic system at a sound pressure level of 70 dB. Blood oxygenation level-dependent (BOLD) contrast images were acquired at 2 T (Siemens Vision, Erlangen, Germany) using gradient echo planar imaging in a sparse protocol (repetition time/echo time 12,000/40 msec) (Hall et al., 1999). Each volume comprised 48 contiguous 4 mm slices with an in-plane resolution of 3 × 3 mm. Seven stimulus conditions, each corresponding to a different type of sound sequence and a distinct percept, were used (Fig.
1): (1) IRN with fixed pitch and fixed spatial position (fixed pitch notes with fixed location in azimuth); (2) IRN with changing pitch and fixed spatial position (changing pitch notes at a fixed azimuthal location); (3) IRN with fixed pitch and changing spatial position (fixed pitch notes at a sequence of azimuthal locations); (4) IRN with changing pitch and changing spatial position (changing pitch notes at a sequence of azimuthal locations); (5) fixed amplitude random phase noise with fixed spatial position (a noise burst at a fixed azimuthal location); (6) fixed amplitude random phase noise with changing spatial position (a noise burst at a sequence of azimuthal locations); (7) silence. Subjects were pretested before scanning with examples of stimuli based on each generic HRTF to select the HRTF that gave the most reliable percept of an external sound source during scanning. All subjects perceived the stimuli used during scanning as originating from locations outside the head. In sequences during which spatial location varied, the percept was an instantaneous jump between consecutive positions. Sequences were presented in randomized order. Two hundred twenty-four brain volumes were acquired for each subject (16 volumes for each condition, in two sessions). Subjects were asked to attend to the sound sequences. To help maintain alertness, they were required to make a single button press with the right hand at the end of each sequence (25-element and 23-element sequences were presented in random order) and to fixate a cross piece at the midpoint of the visual axes. Each subject's ability to detect changes in pitch pattern, changes in spatial pattern, or simultaneous changes in both types of pattern was assessed psychophysically immediately after scanning using a two-alternative, forced-choice procedure.
Subjects listened to pairs of sound sequences in which each sequence contained seven elements that varied either in pitch or spatial location or both simultaneously. The task was to detect a single difference in pitch or spatial pattern associated with changing one element between the members of each pair. Psychophysical test sequences were based on the same pitch and spatial parameters as those used during scanning; noise-based versions were also included. All subjects could easily detect sequences that differed only in pitch pattern (mean correct response rate 84%), sequences that differed only in spatial pattern (mean correct response rate 78%), and sequences that differed in both pitch and spatial pattern (mean correct response rate 78%). One-way ANOVA did not show any effect of trial type on performance at the p < 0.05 significance threshold. Imaging data were analyzed for the entire group and for each individual subject using statistical parametric mapping implemented in SPM99 software (http://www.fil.ion.ucl.ac.uk/spm). Scans were first realigned and normalized spatially (Friston et al., 1995) to the Montreal Neurological Institute (MNI) standard stereotactic space (Evans et al., 1993). Data were smoothed spatially with an isotropic Gaussian kernel of 8 mm full width at half maximum. Statistical parametric maps (SPMs) were generated by modeling the evoked hemodynamic response for the different stimuli as boxcars convolved with a synthetic hemodynamic response function in the context of the general linear model. In the group analysis, BOLD signal changes between conditions of interest were assessed using a random effects model that estimated the second level t statistic at a significance threshold of p < 0.05 after false discovery rate correction for multiple comparisons (Genovese et al., 2002). Individual subject data were analyzed to further assess the anatomical variability of pitch and auditory spatial processing within the group.
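The false discovery rate correction cited above (Genovese et al., 2002) applies the Benjamini–Hochberg step-up rule to the voxelwise p values. A generic sketch of that thresholding rule, without any of SPM99's implementation specifics:

```python
import numpy as np

def fdr_threshold(p_values, q=0.05):
    """Benjamini-Hochberg step-up rule: with m tests, find the largest
    sorted p(k) satisfying p(k) <= (k/m) * q; every p value at or below
    that threshold is declared significant at FDR level q."""
    p = np.sort(np.asarray(p_values, dtype=float))
    m = p.size
    passing = p <= (np.arange(1, m + 1) / m) * q
    return p[passing].max() if passing.any() else 0.0
```

For example, fdr_threshold([0.01, 0.02, 0.03, 0.5]) returns 0.03: three of the four tests survive at q = 0.05, whereas a Bonferroni threshold for the same family (0.05/4 = 0.0125) would admit only one.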
In the analysis of each individual subject, BOLD signal changes between conditions of interest were assessed by estimating the t statistic for each voxel at a significance threshold of p < 0.05 after small volume correction taking the a priori anatomical hypotheses into account. For the pitch conditions, anatomical small volumes that included right and left lateral HG, PP, and PT were derived from the group mean normalized structural MRI brain volume and 95% probability maps for left and right human PT (Westbury et al., 1999). For the spatial conditions, anatomical small volumes were based on 95% probability maps for left and right human PT (Westbury et al., 1999).
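The sequence constraints described in Materials and Methods (23 or 25 elements, a random step of 20°, 30°, or 40° in a random direction between successive azimuths, and identical first and last locations) can be sketched as follows. This is a loose illustration only: the paper generated its spatial sequences from four fixed combinations of azimuthal positions, whereas here the walk is fully random and the final element is simply pinned back to the start, so the closing step may violate the step-size rule.

```python
import random

def spatial_sequence(n_elements=25, start_az=0, steps=(20, 30, 40), seed=None):
    """Random azimuthal walk with fixed endpoints: each move takes a
    random step size and direction; the last element is pinned back to
    the starting azimuth, mirroring the fixed first/last constraint."""
    rng = random.Random(seed)
    seq = [start_az]
    for _ in range(n_elements - 2):
        step = rng.choice(steps) * rng.choice((-1, 1))
        seq.append((seq[-1] + step) % 360)   # wrap azimuth into [0, 360)
    seq.append(start_az)                     # first and last locations identical
    return seq
```

An analogous helper with random.choice over the first six notes of the 10-note octave (again with pinned endpoints) would produce the changing-pitch sequences.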

Figure 2. Statistical parametric maps for contrasts of interest (group data). a, SPMs are shown as glass brain projections in sagittal, coronal, and axial planes. b, SPMs have been rendered on the group mean structural MRI brain image, normalized to the MNI standard stereotactic space (Evans et al., 1993). Tilted axial sections are shown at three levels parallel to the superior temporal plane: 0 mm (center), +2 mm, and −2 mm (insets). The 95% probability boundaries for left and right human PT are outlined (black) (Westbury et al., 1999). Sagittal sections of the left (x = −56 mm) and right (x = 62 mm) cerebral hemispheres are displayed below. All voxels shown are significant at the p < 0.05 level after false discovery rate correction for multiple comparisons; clusters less than eight voxels in size have been excluded. Broadband noise (without pitch) compared with silence activates extensive bilateral superior temporal areas including medial Heschl's gyrus (HG) (b, center, yellow). In the contrasts between conditions with changing pitch and fixed pitch and between conditions with changing spatial location and fixed location, a masking procedure has been used to identify voxels activated only by pitch change (blue), only by spatial change (red), and by both types of change (magenta). The contrasts of interest activate distinct anatomical regions on the superior temporal plane. Pitch change (but not spatial location change) activates lateral HG, anterior PT, and planum polare (PP) anterior to HG, extending into superior temporal gyrus, whereas spatial change (but not pitch change) produces more restricted bilateral activation involving posterior PT. Within PT (b, axial sections), activation attributable to pitch change occurs anterolaterally, whereas activation attributable to spatial change occurs posteromedially.
Only a small number of voxels within PT are activated both by pitch change and by spatial change.

Results

In the group random effects analysis, significant activation was demonstrated in each of the contrasts of interest at the p < 0.05 voxel level of significance after false discovery rate correction for multiple comparisons. Broadband noise (without pitch) compared with silence produced extensive bilateral superior temporal activation, including medial HG (Fig. 2b, center). The contrasts between conditions with changing pitch and fixed pitch (main effect of pitch change) and between all conditions (both pitch and noise) with changing spatial location and fixed location (main effect of spatial change) produced specific activations restricted to distinct anatomical regions on the superior temporal plane (Fig. 2a,b). Pitch changes (but not spatial location changes) produced bilateral activation involving lateral HG, anterior PT, and PP anterior to HG, extending into superior temporal gyrus. Lateral HG activation lay outside the 95% probability boundaries for primary auditory cortex (PAC) as defined by Rademacher et al. (2001). In contrast, spatial location changes (but not pitch changes) produced bilateral activation involving posterior PT. Within PT (Fig. 2b), activation attributable to pitch change occurred anterolaterally, whereas activation attributable to spatial change occurred posteromedially. Local maxima in the superior temporal plane for each of the main effects are listed in Table 1. Within PT, local maxima for spatial change were clearly posterior bilaterally to those for pitch change. For pitch change, additional local maxima occurred anteriorly in right PP and left lateral HG. Although no local maxima occurred in left PP and right lateral HG, these regions were clearly also activated by pitch change (Fig. 2a,b). Only a small number of voxels within PT were activated by both pitch changes and spatial location changes (Fig. 2a,b). No

Table 1. Local maxima of activation in the superior temporal plane for the main effects of pitch change and spatial change (group data)

Region                     Side    x    y    z    Z score
Pitch change only
  Planum temporale          L      –    –    –    –
                            R      –    –    –    –
  Planum polare             R      –    –    –    –
  Lateral Heschl's gyrus    L      –    –    –    –
Spatial change only
  Planum temporale          L      –    –    –    –
                            R      –    –    –    –

Data are derived from a random effects analysis of the 12 subjects. All local maxima in the superior temporal plane are shown for voxels activated by pitch change but not by change in spatial location (pitch change only) and by change in spatial location but not by pitch change (spatial change only). Coordinates are in millimeters after transformation into standard MNI stereotactic space (Evans et al., 1993). A Z score > 3.50 corresponds to p < 0.05 after false discovery rate correction for multiple comparisons (Genovese et al., 2002).

interactions were observed between the pitch and spatial change conditions. For both the main effect contrasts of interest, the group SPMs for left and right cerebral hemispheres were compared in a random effects analysis using a paired t test thresholded at the p < 0.05 voxel level after small volume correction taking the a priori anatomical hypotheses into account. For the main effect of pitch, anatomical small volumes were based on right and left lateral HG, PP, and PT (derived from the group mean normalized structural MRI brain volume) and 95% probability maps for left and right human PT (Westbury et al., 1999); for the main effect of space, anatomical small volumes were based on 95% probability maps for left and right human PT (Westbury et al., 1999). The distributions of activation did not differ significantly between cerebral hemispheres for either pitch or spatial processing. Individual subject analyses (using a voxel significance threshold of p < 0.05 after small volume correction) showed activation patterns similar to the group analysis.
Pitch change produced local maxima within the prespecified region (contiguous areas in each hemisphere comprising lateral HG, PT, and PP) in 10 of 12 individual subjects. Changing spatial location produced local maxima within the prespecified region (PT in each hemisphere) in all individual subjects.

Discussion

This study has demonstrated distinct human brain substrates for the analysis of pitch sequences and acoustic spatial sequences in a single fMRI paradigm. These substrates comprise secondary and association auditory cortical areas beyond PAC in medial HG (Rademacher et al., 2001). A bilateral anterior network of areas dedicated to the processing of pitch sequences includes lateral HG, anterior PT, PP, and superior temporal gyrus, whereas a bilateral posterior network dedicated to the processing of spatial sequences includes posteromedial PT. The present findings are consistent with proposed dual "what" and "where" processing pathways in the macaque (Kaas and Hackett, 2000; Rauschecker and Tian, 2000; Tian et al., 2001) and the increasing evidence for distinct anterior and posterior auditory networks emerging from human anatomical (Galaburda and Sanides, 1980; Rivier and Clarke, 1997; Galuske et al., 1999; Tardif and Clarke, 2001), functional imaging (Alain et al., 2001; Maeder et al., 2001; Warren et al., 2002), electrophysiological (Alain et al., 2001; Anourova et al., 2001), and lesion (Clarke et al., 2000) studies.
In humans, the anterior network (including PP, anterior superior and middle temporal gyri, and superior temporal sulcus) has been implicated in the analysis ("what") of many different types of spectrotemporal pattern, including simple spectral and temporal patterns (Griffiths et al., 1998b; Binder et al., 2000; Thivard et al., 2000; Zatorre and Belin, 2001; Hall et al., 2002; Patterson et al., 2002), musical melodies (Zatorre et al., 1994, 1996), vocal sounds (Belin et al., 2000), and speech (Zatorre et al., 1992; Scott et al., 2000; Vouloumanos et al., 2001; Wise et al., 2001). The posterior network including IPL is active in the spatial ("where") analysis of both stationary (Alain et al., 2001) and moving (Baumgart et al., 1999; Warren et al., 2002) sounds. The present experiment has demonstrated distinct human auditory cortical mechanisms that are simultaneously and specifically engaged in processing different properties of sound sequences. The mechanism for processing pitch pattern is situated anteriorly, whereas the mechanism for processing spatial pattern is situated posteriorly. Bilateral activation of the hemispheric networks that process auditory spatial and pitch sequences is evident in the present study (Fig. 2). For both pitch processing and spatial sequence processing, the distributions of activation did not differ significantly between the left and right cerebral hemispheres. Previous studies of auditory spatial processing have suggested bilateral (Pavani et al., 2002; Warren et al., 2002) or right-lateralized (Baumgart et al., 1999) activation of PT. For the processing of pitch sequences and chords, a more consistent pattern of right-lateralized activation in superior temporal lobe areas beyond PAC has been shown in a number of studies (Zatorre et al., 1994; Tervaniemi et al., 2000; Patterson et al., 2002). The contrast between random pitch and fixed pitch elements in the study of Patterson et al. (2002) is closest to the pitch change contrast used here.
Patterson et al. (2002) also found bilateral activation of lateral PT and PP, although the rightward asymmetry of activation demonstrated in that study was not evident in the present experiment. This study has shown that analysis of both pitch sequences and spatial sequences involves PT. Previous human functional imaging studies have indicated that PT is involved in the analysis of both the intrinsic spectrotemporal (Binder et al., 1996; Giraud et al., 2000; Thivard et al., 2000; Hall et al., 2002; Warren et al., 2002) and the spatial (Baumgart et al., 1999; Pavani et al., 2002; Warren et al., 2002) properties of many types of complex sounds (for review, see Griffiths and Warren, 2002). We have argued previously (Warren et al., 2002) that posteromedial PT activation is a neural correlate of the perception of acoustic space. In contrast, the network of parietal and frontal areas that have been activated inconsistently in previous studies of auditory spatial processing (Griffiths et al., 1998a, 2000; Baumgart et al., 1999; Bushara et al., 1999; Griffiths and Green, 1999; Weeks et al., 1999; Lewis et al., 2000; Alain et al., 2001; Maeder et al., 2001; Pavani et al., 2002; Warren et al., 2002; Zatorre et al., 2002) may have a role in auditory attention or (covert) motor preparation. The lack of an output task therefore may account for the absence of activation in this frontoparietal network in the present experiment. In this study, we have demonstrated that patterns of pitch and auditory spatial location are analyzed at different sites within human PT. Pitch information is processed anterolaterally, whereas spatial information is processed posteromedially. Such functional differentiation is not evident in medial HG, the site of PAC (Rademacher et al., 2001). 
Although we do not dismiss the possibility that neurons within PAC may process acoustic correlates of spatial position (Toronchuk et al., 1992), the present evidence suggests that the processing of intrinsic and spatial

sound properties diverges beyond PAC and as early as PT. These distinct functional subregions may correspond to the cytoarchitecturally distinct regions Te2 (medial) and Te3 (lateral) identified in the human posterior temporal plane (Morosan et al., 2001). Such a functional subdivision of human PT is consistent with anatomical and electrophysiological data in nonhuman primates. Auditory association cortices in humans and macaques share a number of cytoarchitectural features (Galaburda and Sanides, 1980). Functionally distinct medial (CM) and lateral (CL) belt areas have been described in the macaque posterior superior temporal plane (Tian et al., 2001). This region has been implicated in the analysis of sound source location (Leinonen et al., 1980; Recanzone, 2000) and proposed as the origin of an auditory dorsal stream for processing spatial information (Rauschecker and Tian, 2000). However, a certain subpopulation of neurons in area CL responds both to the spatial location of complex sounds and to specific call sounds (Tian et al., 2001). This observation and the present human evidence suggest that auditory association cortex may have a similar functional organization in humans and nonhuman primates. There is relative (rather than absolute) selectivity of medial belt areas for processing spatial information and lateral belt areas for processing object information. However, the electrophysiological properties of the medial portion of the posterior superior temporal plane are technically difficult to study in both humans and nonhuman primates. We therefore would hesitate to suggest a precise functional or anatomical homology between macaque CM and CL, human Te2 and Te3, and the posteromedial and anterolateral PT functional subregions in the present study.
The controversy surrounding the existence of dual "what" and "where" human auditory processing streams (Middlebrooks, 2002) was a major motivation for the present experiment. No account has satisfactorily reconciled the evidence, on the one hand, for a duality of processing streams and, on the other hand, for their mutual interdependence (Middlebrooks, 2002; Zatorre et al., 2002). On the basis of the present evidence, we propose a crucial role for human PT in gating auditory information between the two streams. Previously, we have hypothesized (Griffiths and Warren, 2002) that human PT acts as a computational hub that is able to disambiguate object from spatial information in complex sounds. According to this generative model, in performing its computations, PT both accesses learned representations in higher order cortical areas and also gates spatial and object-related information to those higher areas. The present study refines our earlier model of PT operation in two ways: it suggests anatomically distinct spatial (posteromedial) and object (anterolateral) processing mechanisms within PT and distinct communication between these and other cortical areas. Acoustic spatial information is processed in a well-defined region of the posterior superior temporal plane, whereas the areas that process object properties (pitch patterns) are distributed along the anteroposterior axis of the superior temporal lobe, including both the posterior temporal plane and anterior auditory areas. According to our model of human PT function, deconvolution in the posterior superior temporal plane will yield spatial and object information for further processing in distinct pathways. However, we do not exclude the possibility, suggested by macaque work (Rauschecker and Tian, 2000), that there may be other direct inputs to the distributed object identification ("what") network from PAC or thalamus.
The anteroposterior distribution of object processing in our data is consistent with macaque electrophysiology (Tian et al., 2001). Specifically, object specificity in the macaque defined using a range of animal calls is present in both anterior and posterior belt areas but is shown in a smaller proportion of neurons in the posterior belt. We suggest that in both humans and nonhuman primates there are mechanisms for processing the spatial and object properties of complex sounds in different subregions of the posterior temporal plane and that these mechanisms access distinct cortical areas.

References

Alain C, Arnott SR, Hevenor S, Graham S, Grady CL (2001) What and where in the human auditory system. Proc Natl Acad Sci USA 98:
Anourova I, Nikouline VV, Ilmoniemi RJ, Hotta J, Aronen HJ, Carlson S (2001) Evidence for dissociation of spatial and nonspatial auditory information processing. NeuroImage 14:
Baumgart F, Gaschler-Markefski B, Woldorff MG, Heinze HJ, Scheich H (1999) A movement-sensitive area in auditory cortex. Nature 400:
Belin P, Zatorre RJ (2000) What, where and how in auditory cortex. Nat Neurosci 3:
Belin P, Zatorre RJ, Lafaille P, Ahad P, Pike B (2000) Voice-selective areas in human auditory cortex. Nature 403:
Binder JR, Frost JA, Hammeke TA, Rao SM, Cox RW (1996) Function of the left planum temporale in auditory and linguistic processing. Brain 119:
Binder JR, Frost JA, Hammeke TA, Bellgowan PSF, Springer JA, Kaufman JN, Possing ET (2000) Human temporal lobe activation by speech and nonspeech sounds. Cereb Cortex 10:
Bushara KO, Weeks RA, Ishii K, Catalan MJ, Tian B, Rauschecker JP, Hallett M (1999) Modality-specific frontal and parietal areas for auditory and visual spatial localization in humans. Nat Neurosci 2:
Clarke S, Bellmann A, Meuli RA, Assal G, Steck AJ (2000) Auditory agnosia and auditory spatial deficits following left hemispheric lesions: evidence for distinct processing pathways.
Neuropsychologia 38:
Cohen YE, Wessinger CM (1999) Who goes there? Neuron 24:
Evans AC, Collins DL, Mills SR, Brown RD, Kelly RL, Peters TM (1993) 3D statistical neuroanatomical models from 305 MRI volumes. IEEE Nucl Sci Symp Med Imag Conf Proc IEEE 108:
Friston KJ, Ashburner J, Frith CD, Poline JB, Heather JD, Frackowiak RSJ (1995) Spatial registration and normalisation of images. Hum Brain Mapp 2:
Galaburda A, Sanides F (1980) Cytoarchitectonic organization of the human auditory cortex. J Comp Neurol 190:
Galuske RAW, Schuhmann A, Schlote W, Bratzke H, Singer W (1999) Interareal connections in the human auditory cortex. NeuroImage 9:S994.
Genovese CR, Lazar NA, Nichols T (2002) Thresholding of statistical maps in functional neuroimaging using the false discovery rate. NeuroImage 15:
Giraud AL, Lorenzi C, Ashburner J, Wable J, Johnsrude I, Frackowiak R, Kleinschmidt A (2000) Representation of the temporal envelope of sounds in the human brain. J Neurophysiol 84:
Griffiths TD, Green GGR (1999) Cortical activation during perception of a rotating wide-field acoustic stimulus. NeuroImage 10:
Griffiths TD, Warren JD (2002) The planum temporale as a computational hub. Trends Neurosci 25:
Griffiths TD, Rees G, Rees A, Green GGR, Witton C, Rowe D, Büchel C, Turner R, Frackowiak RSJ (1998a) Right parietal cortex is involved in the perception of sound movement in humans. Nat Neurosci 1:
Griffiths TD, Büchel C, Frackowiak RSJ, Patterson RD (1998b) Analysis of temporal structure in sound by the human brain. Nat Neurosci 1:
Griffiths TD, Green GGR, Rees A, Rees G (2000) Human brain areas involved in the analysis of auditory movement. Hum Brain Mapp 9:
Hall DA, Haggard MP, Akeroyd MA, Palmer AR, Summerfield AQ, Elliott MR, Gurney EM, Bowtell RW (1999) Sparse temporal sampling in auditory fMRI. Hum Brain Mapp 7:
Hall DA, Johnsrude IS, Haggard MP, Palmer AR, Akeroyd MA, Summerfield AQ (2002) Spectral and temporal processing in human auditory cortex.
Cereb Cortex 12:
Kaas JH, Hackett TA (2000) Subdivisions of auditory cortex and processing streams in primates. Proc Natl Acad Sci USA 97:
Leinonen L, Hyvärinen J, Sovijärvi ARA (1980) Functional properties of

neurons in the temporo-parietal association cortex of awake monkey. Exp Brain Res 39:
Lewis JW, Beauchamp MS, DeYoe EA (2000) A comparison of visual and auditory motion processing in human cerebral cortex. Cereb Cortex 10:
Maeder PP, Meuli RA, Adriani M, Bellmann A, Fornari E, Thiran JP, Pittet A, Clarke S (2001) Distinct pathways involved in sound recognition and localization: a human fMRI study. NeuroImage 14:
Middlebrooks JC (2002) Auditory space processing: here, there or everywhere? Nat Neurosci 5:
Morosan P, Rademacher J, Schleicher A, Amunts K, Schormann T, Zilles K (2001) Human primary auditory cortex: cytoarchitectonic subdivisions and mapping into a spatial reference system. NeuroImage 13:
Patterson RD, Johnsrude IS, Uppenkamp S, Griffiths TD (2002) The processing of temporal pitch and melody information in auditory cortex. Neuron 36:
Pavani F, Macaluso E, Warren JD, Driver J, Griffiths TD (2002) A common cortical substrate activated by horizontal and vertical sound movement in the human brain. Curr Biol 12:
Rademacher J, Morosan P, Schormann T, Schleicher A, Werner C, Freund HJ, Zilles K (2001) Probabilistic mapping and volume measurement of human primary auditory cortex. NeuroImage 13:
Rauschecker JP, Tian B (2000) Mechanisms and streams for processing of what and where in auditory cortex. Proc Natl Acad Sci USA 97:
Recanzone G (2002) Where was that? human auditory spatial processing. Trends Cogn Sci 6:
Recanzone GH (2000) Spatial processing in the auditory cortex of the macaque monkey. Proc Natl Acad Sci USA 97:
Rivier F, Clarke S (1997) Cytochrome oxidase, acetylcholinesterase and NADPH-diaphorase staining in human supratemporal and insular cortex: evidence for multiple auditory areas. NeuroImage 6:
Romanski LM, Tian B, Fritz JB, Mishkin M, Goldman-Rakic PS, Rauschecker JP (2000) Reply to what, where and how in auditory cortex. Nat Neurosci 3:966.
Scott SK, Blank CC, Rosen S, Wise RJS (2000) Identification of a pathway for intelligible speech in the left temporal lobe. Brain 123: Tardif E, Clarke S (2001) Intrinsic connectivity of human auditory areas: a tracing study with DiI. Eur J Neurosci 13: Tervaniemi M, Medvedev SV, Alho K, Pakhomov SV, Roudas MS, Van Zuijen TL, Näätänen R (2000) Lateralized automatic auditory processing of phonetic versus musical information: a PET study. Hum Brain Mapp 10: Thivard L, Belin P, Zilbovicius M, Poline J-B, Samson Y (2000) A cortical region sensitive to auditory spectral motion. NeuroReport 11: Tian B, Reser D, Durham A, Kustov A, Rauschecker JP (2001) Functional specialization in rhesus monkey auditory cortex. Science 292: Toronchuk JM, Stumpf E, Cynader MS (1992) Auditory cortex neurons sensitive to correlates of auditory motion: underlying mechanisms. Exp Brain Res 88: Vouloumanos A, Kiehl KA, Werker JF, Liddle PF (2001) Detection of sounds in the auditory stream: event-related fmri evidence for differential activation to speech and nonspeech. J Cogn Neurosci 13: Warren JD, Zielinski BA, Green GGR, Rauschecker JP, Griffiths TD (2002) Perception of sound-source motion by the human brain. Neuron 34: Weeks RA, Aziz-Sultan A, Bushara KO, Tian B, Wessinger CM, Dang N, Rauschecker JP, Hallett M (1999) A PET study of human auditory spatial processing. Neurosci Lett 262: Westbury CF, Zatorre RJ, Evans AC (1999) Quantifying variability in the planum temporale: a probability map. Cereb Cortex 9: Wightman FL, Kistler DJ (1989) Headphone simulation of free-field listening. I: Stimulus synthesis. J Acoust Soc Am 85: Wise RJS, Scott SK, Blank SC, Mummery CJ, Murphy K, Warburton EA (2001) Separate neural subsystems within Wernicke s area. Brain 124: Zatorre RJ, Belin P (2001) Spectral and temporal processing in human auditory cortex. Cereb Cortex 11: Zatorre RJ, Evans AC, Meyer E, Gjedde A (1992) Lateralization of phonetic and pitch discrimination in speech processing. 
Science 256: Zatorre RJ, Evans AC, Meyer E (1994) Neural mechanisms underlying melodic perception and memory for pitch. J Neurosci 14: Zatorre RJ, Halpern AR, Perry DW, Meyer E, Evans AC (1996) Hearing in the mind s ear: a PET investigation of musical imagery and perception. J Cogn Neurosci 8: Zatorre RJ, Bouffard M, Ahad P, Belin P (2002) Where is where in the human auditory cortex? Nat Neurosci 5:

Separating pitch chroma and pitch height in the human brain

J. D. Warren*, S. Uppenkamp, R. D. Patterson, and T. D. Griffiths*

*Wellcome Department of Imaging Neuroscience, Institute of Neurology, Queen Square, London WC1N 3BG, United Kingdom; Auditory Group, University of Newcastle Medical School, Newcastle-upon-Tyne NE2 4HH, United Kingdom; and Centre for the Neural Basis of Hearing, Department of Physiology, University of Cambridge, Cambridge CB2 2EG, United Kingdom

Edited by Marcus E. Raichle, Washington University School of Medicine, St. Louis, MO, and approved June 27, 2003 (received for review February 4, 2003)

Musicians recognize pitch as having two dimensions. On the keyboard, these are illustrated by the octave and the cycle of notes within the octave. In perception, these dimensions are referred to as pitch height and pitch chroma, respectively. Pitch chroma provides a basis for presenting acoustic patterns (melodies) that do not depend on the particular sound source. In contrast, pitch height provides a basis for segregation of notes into streams to separate sound sources. This paper reports a functional magnetic resonance experiment designed to search for distinct mappings of these two types of pitch change in the human brain. The results show that chroma change is specifically represented anterior to primary auditory cortex, whereas height change is specifically represented posterior to primary auditory cortex. We propose that tracking of acoustic information streams occurs in anterior auditory areas, whereas the segregation of sound objects (a crucial aspect of auditory scene analysis) depends on posterior areas.

Auditory scientists define pitch as the perceptual correlate of acoustic frequency: a single physical dimension along which musical notes can be ordered from low to high (1). However, humans perceive the notes of the scale as repeating once per octave; accordingly, music psychologists represent pitch as a helix (Fig.
1a) with a circular dimension of pitch chroma and a vertical dimension of pitch height (2-5). The function of these two pitch dimensions is illustrated when the same melody is sung by a male or female voice or played by a violin or cello. The vocal cords of women vibrate faster than those of men, and the strings of a violin vibrate faster than those of a cello. These physical differences correspond to differences in pitch height that contribute to our perception that women's voices are higher than men's, and violins are higher than cellos. Note that this distinction is not based on pitch chroma, because both voices or instruments can produce the full range of chromas; it is an average pitch height difference that is more properly associated with the perception that one source is higher than another. Whereas pitch chroma is used in tracking the information conveyed by a particular sound source, pitch height is used in the segregation of sources. This study describes a functional MRI (fMRI) experiment designed to establish whether separate mechanisms for processing the two pitch dimensions exist in the human brain. Previous human fMRI experiments have shown that pitch information is processed in regions beyond primary auditory cortex (PAC). PAC is located in the medial portion of Heschl's gyrus (HG) (6); musical melodies (7-9) and speech information (10, 11) activate areas that are lateral and anterior to PAC. Other functional imaging experiments indicate that the human planum temporale (PT) (12) posterior to HG is involved in auditory object segregation (13). However, the chroma and height dimensions of pitch have not been separated in previous functional imaging studies. We therefore designed the present fMRI experiment to determine whether the helical model of pitch perception is reflected in the organization of the human brain.
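The helical model described above can be made concrete in a few lines of code. The sketch below is our own illustration, not the authors' code: the function name, the base-2 logarithm decomposition, and the 80-Hz reference frequency are all assumptions made for the example. It splits a frequency into a circular chroma coordinate (position within the octave) and an integer height coordinate (the octave).

```python
import math

def helix_coordinates(freq_hz, f_ref=80.0):
    """Map a frequency onto the pitch helix: a circular chroma
    coordinate in [0, 1) and an integer octave (height) coordinate.
    f_ref (80 Hz here) is an arbitrary, hypothetical reference."""
    octaves = math.log2(freq_hz / f_ref)  # signed pitch distance in octaves
    height = math.floor(octaves)          # vertical dimension of the helix
    chroma = octaves - height             # position within the octave
    return chroma, height

# A note one octave higher shares its chroma but sits one turn up the helix:
c_low, h_low = helix_coordinates(80.0)
c_high, h_high = helix_coordinates(160.0)
```

In this coordinate system, transposing a melody by a whole octave leaves every chroma value unchanged, which is the sense in which chroma is independent of the particular sound source.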
We tested the specific hypothesis that pitch chroma changes (auditory information streams) are processed in anterior auditory regions, and pitch height changes (changes in sound source identity) are processed in posterior auditory regions.

Methods

Bases for the Manipulation of Pitch Dimensions. In this experiment, we used stimuli in which pitch chroma and pitch height could be varied independently. The stimuli were harmonic complexes in which chroma and height could be varied continuously, whereas the total energy and spectral region remained fixed (Fig. 1b; examples are published as .wav files in the supporting information on the PNAS web site) (14, 15). The standard stimulus with all harmonics of the fundamental frequency, f0 (Fig. 1b Top), had the pitch f0 by definition. The pitch chroma of the stimuli was altered by varying f0 in semitone steps as in the chromatic musical scale; this corresponds to motion around the pitch helix (red line in Fig. 1a). The pitch height was varied independently of chroma by reducing the amplitude of all odd harmonics in the complex (Fig. 1b Middle). If the odd harmonics of a harmonic complex are attenuated between 0 and 40 dB, relative to baseline intensity, the new tone has the same pitch chroma, f0, but is perceived as being higher in pitch; this change occurs along the pitch height dimension (blue line in Fig. 1a). When the odd harmonics are attenuated completely, pitch height reaches the octave, 2f0 (Fig. 1b Bottom), and repeating the attenuation process over successive octaves produces continuous pitch height changes without concomitant chroma changes. Using this procedure, we were able to synthesize sequences of notes in which chroma and/or height could be independently varied between notes in the sequence (examples are available in the supporting information on the PNAS web site).

Relationship Between Pitch Height, Tone Height, and Timbre.
There are at least three methods for producing a sequence of harmonic sounds that all have the same pitch chroma but which are heard to rise along a perceptual dimension as the sequence progresses (14). The rise can be achieved by varying the spectral envelope of the sounds or by changing the spectral fine structure (either the intensity or the phase of alternate harmonics). Such manipulations are all related to the type of increase one hears when the speaker changes from a man to a woman, or the instrument changes from a cello to a violin. All three manipulations have been described as producing an increase in tone height. Here, we consider tone height in a more restricted sense to refer to manipulation of the spectral envelope, which is more closely associated with timbre perception. In contrast, we use pitch height in this experiment to refer to a manipulation of spectral fine structure that is more closely associated with pitch perception. Tone height manipulations associated with timbre are related to the size information in natural sounds (16). Women have shorter vocal tracts than men and, as a result, their formant frequencies are higher. Violins are smaller than cellos and, as a result, their resonant peaks occur at higher frequencies. The important point is that these tone height manipulations are associated with a dimension of timbre perception that is separate from pitch height, and the manipulations in the current experiments do not involve tone height in this sense.

Pitch height manipulations are also related to the size information in natural sounds. These manipulations involve attenuation of the odd harmonics, as described above, or a fixed shift in the phase of either the odd or the even harmonics. The phase manipulations are noteworthy inasmuch as they have no effect on the power spectrum of the stimulus, but they still produce a rise in the perceived pitch (14). This report is primarily concerned with the perception of pitch height rather than tone height, and the location of pitch height processing rather than tone height processing in auditory cortex. Manipulation of the fine spectral structure of sounds can also affect the timbre of a sound. For example, the characteristic timbre of a clarinet is partly due to the fact that the even harmonics are attenuated relative to the odd harmonics, which imparts a characteristic hollowness to the sound. In the current experiment, however, it was the odd harmonics that were attenuated (relative to the even harmonics), which has much less effect on the timbre of the sound. In summary, the manipulation of the spectrotemporal structure of the stimulus can have two perceptual effects, but it is the effect on pitch height that is the focus of the present paper. In the next section, we demonstrate ordering effects that are most parsimoniously explained in terms of a pitch dimension.

Fig. 1. (a) The pitch helix. The musical scale is wrapped around so that each circuit (red) is an octave. The equivalent change in pitch height with fixed chroma is shown (blue). (b) Examples of sounds with changing pitch height. Each of these harmonic complexes (h1, h2, h3) has a flat spectral envelope in the frequency band 0-4 kHz. In h1 (Top), all harmonics of the fundamental, f0, have equal amplitude; in h2 (Middle), the odd harmonics are attenuated by 10 dB, producing a large increase in pitch height without changing pitch chroma; in h3 (Bottom), the odd harmonics are completely attenuated, producing a one-octave rise in pitch height with the same chroma (2f0).

This paper was submitted directly (Track II) to the PNAS office. Abbreviations: fMRI, functional MRI; HG, Heschl's gyrus; PAC, primary auditory cortex; PP, planum polare; PT, planum temporale; f0, fundamental frequency. To whom correspondence should be addressed. E-mail: t.d.griffiths@ncl.ac.uk. PNAS, August 19, 2003, vol. 100.

Psychophysical Effect of Pitch Height Manipulation. The effect of attenuating the odd components of a harmonic series on the pitch height of the sound was originally investigated in a scaling experiment (14, 15). We performed a new pitch height discrimination experiment to confirm that the stimulus manipulations used in the present study could be ordered continuously along the dimension of pitch height and to determine the resolution of discrimination along this dimension. In a two-interval two-alternative forced-choice task, three normal subjects were presented with a standard stimulus and a test stimulus in which the odd harmonics were attenuated more than the standard; the task was to choose the stimulus with the higher pitch height. Notes all had the same pitch chroma (80 Hz), duration (200 ms), frequency region (f0 to 4 kHz), and energy (sound pressure level 70 dB). Three different standard stimuli were used for each listener where the odd harmonics had a fixed attenuation of 0, 6, or 12 dB. The results are presented in Fig. 2. The left-hand psychometric functions show how discrimination increased with attenuation for the standard with 0-dB attenuation of the odd harmonics. All three listeners had thresholds of 1 dB of attenuation and achieved near-ceiling performance at just over 2 dB of attenuation. The central and right-hand psychometric functions show that pitch height discrimination remained excellent and essentially uniform when the standard stimulus had fixed attenuation of odd harmonics by 6 or 12 dB. The experiment was performed without feedback and required essentially no training, indicating that the pitch height cue is stable and the direction of "higher" is consistent across listeners.

Fig. 2. Psychometric functions for pitch height. The psychometric function shows the dependence of the proportion of correct responses on the size of the change in pitch height. Psychometric functions for three subjects were derived from a two-interval two-alternative forced-choice experiment in which subjects were asked to detect which note is higher. The ordinate shows the proportion of correct subject responses, where 50% is equivalent to performance at chance; the abscissa shows the attenuation of odd harmonics in the test stimulus relative to baseline intensity. The fundamental frequency, f0, was fixed at 80 Hz throughout the experiment (see Methods). The psychometric functions are based on change in attenuation of odd harmonics relative to standards in which the odd harmonics have fixed attenuation of 0 dB (Left), 6 dB (Center), and 12 dB (Right). Each data point is based on at least 60 trials. Weibull functions were fitted by using maximum likelihood estimation implemented in MATLAB [Mathworks, Natick, MA; fitting software at http://bootstrap-software.org/psignifit]. The 75% threshold is defined as the attenuation value at which subjects achieve a score of 75% correct (halfway between chance and ceiling performance); the 75% thresholds and 95% confidence intervals for each threshold shown were derived by using bootstrapping with 999 simulations (17, 18). To obtain 95% confidence intervals, the bootstrapping procedure used a Monte Carlo method for estimating the variability of the fitted psychometric function. All subjects heard an increase in pitch height from the standard when the odd harmonics were attenuated by 1 dB in the test stimulus, and the pitch height threshold is uniform along the dimension.

Stimulus Details. In the fMRI experiment, sounds were either pitch-producing harmonic complexes or broadband Gaussian noise (Fig. 1b; examples are published as supporting information on the PNAS web site). All stimuli were created digitally at a sampling rate of 44.1 kHz. Total energy and passband (0-4 kHz) were fixed for all stimuli. All sound sequences were 8 sec in duration; each sequence was composed of 40 individual sounds, each 200 ms in duration. The harmonic complexes were in cosine phase with components ranging from f0 to 4 kHz (Fig. 1b). Pitch chroma was varied randomly across one octave ( Hz) of the chromatic musical scale by varying f0 from note to note in the sequence, in semitone steps. Pitch height was also varied randomly across approximately one octave, by attenuating the amplitude of all odd harmonics (Fig. 1b Middle) from note to note, in 2-dB steps.
The note with the lowest pitch height had fundamental f0 with 10-dB attenuation of the odd harmonics of f0; the note with the highest pitch height had fundamental 2f0 with 10-dB attenuation of the odd harmonics of 2f0. The intervening pitch height steps had 12-, 14-, 16-, 18-, and 20-dB attenuation of the odd harmonics of f0 and 0-, 2-, 4-, 6-, and 8-dB attenuation of the odd harmonics of 2f0. In the stimuli with changing pitch height, chroma values were lowered by half an octave. The total range of subjective pitch change across a sequence was therefore approximately one octave, with either chroma variation or height variation, and the maximum overall pitch range was one-and-a-half octaves if both chroma and height variation occurred together.

Scanning Protocol. Ten subjects aged (six males, four females; nine right handed, one left handed) participated in the fMRI experiment. None had any history of hearing or neurological disorder, and all had normal structural MRI scans. All subjects gave informed consent, and the experiment was carried out with the approval of the Institute of Neurology Ethics Committee, London. Before scanning, subjects were asked to pay attention to the sound sequences; to help maintain alertness, they were required to make a single button press at the end of each broadband noise sequence by using a button box positioned beneath the right hand and to fixate a cross in the middle of the visual field. There was no active auditory discrimination task. During scanning, six stimulus conditions were used, corresponding to six types of sound sequence: (i) fixed pitch chroma, fixed pitch height; (ii) changing pitch chroma, fixed pitch height; (iii) changing pitch height, fixed pitch chroma; (iv) changing pitch chroma, changing pitch height; (v) broadband noise without pitch; and (vi) silence. The order of conditions was randomized.
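The stimulus construction described in Methods can be sketched in a few lines. The code below is our illustration of the odd-harmonic attenuation manipulation, not the authors' synthesis code: the function name, parameter defaults, and the peak-amplitude normalization are our assumptions (the paper fixes total energy, which we do not reproduce here). It builds a cosine-phase harmonic complex with a flat envelope up to 4 kHz whose odd harmonics are attenuated by a given number of decibels.

```python
import numpy as np

def harmonic_complex(f0, odd_atten_db, dur=0.2, fs=44100, fmax=4000.0):
    """Cosine-phase harmonic complex from f0 up to fmax with a flat
    spectral envelope; the odd harmonics are attenuated by odd_atten_db
    (the pitch height manipulation described in the text)."""
    t = np.arange(int(dur * fs)) / fs
    odd_gain = 10.0 ** (-odd_atten_db / 20.0)  # dB attenuation -> linear gain
    x = np.zeros_like(t)
    n = 1
    while n * f0 <= fmax:
        gain = odd_gain if n % 2 == 1 else 1.0
        x = x + gain * np.cos(2 * np.pi * n * f0 * t)
        n += 1
    return x / np.max(np.abs(x))  # normalize peak amplitude across notes

h1 = harmonic_complex(80.0, 0.0)    # all harmonics equal: pitch f0
h2 = harmonic_complex(80.0, 10.0)   # same chroma, higher pitch height
h3 = harmonic_complex(80.0, 400.0)  # odd harmonics effectively removed
```

With the odd harmonics removed, only the even harmonics of f0 remain, i.e., the harmonics of 2f0, so h3 is (up to numerical precision) the one-octave-higher complex shown in Fig. 1b Bottom.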
Stimuli were delivered by using a custom electrostatic system ( caf soundsystem) at a sound pressure level of 70 dB. Each sound sequence was presented for a period of 8 sec, after which brain activity was estimated by the fMRI blood oxygen-level-dependent response at 2 T (Siemens Vision, Erlangen, Germany) by using gradient echo planar imaging in a sparse acquisition protocol (time to repeat/time to echo, TR/TE 12, ms) (19). One hundred ninety-two brain volumes were acquired for each subject (32 volumes for each condition) in two sessions. Each brain volume comprised 48 contiguous transverse slices with an in-plane resolution of 3 × 3 mm. After scanning, all subjects underwent two-alternative forced-choice psychophysics to determine thresholds for detection of chroma and height changes in the sound sequences used during image acquisition. During scanning, the minimum pitch chroma and pitch height steps were one semitone and 2 dB, respectively. All subjects in the fMRI experiment could readily detect the changes in pitch chroma and pitch height and distinguish chroma sequences and height sequences; all had a threshold for detection of pitch chroma change less than one semitone and a threshold for pitch height change less than 2 dB. The sound sequences with changing pitch height were perceived by subjects as a disrupted auditory stream.

Fig. 3. Statistical parametric maps for the group. For each contrast (indicated below panels), activated voxels are rendered on the normalized group mean structural MRI in an axial section tilted 0.5 radians to include much of the surface of the superior temporal plane. The statistical criterion was P < 0.05 corrected for multiple comparisons across the whole brain volume. The 90% probability boundaries for PAC (6) are outlined (black). (a) Broadband noise contrasted with silence ("noise - silence", green) activates extensive bilateral superior temporal areas including both medial and lateral HG. The pitch-producing stimuli contrasted with noise ("pitch - noise", lilac) produce more restricted bilateral activation in lateral HG, PP, and PT. (b) Pitch chroma change contrasted with fixed chroma ("all chroma", red) activates bilateral areas in lateral HG, PP, and anterolateral PT. (c) Pitch height change contrasted with fixed height ("all height", blue) activates bilateral areas in lateral HG and anterolateral PT. (d) Voxels in b and c activated both by pitch chroma change and pitch height change have been exclusively masked. Pitch chroma change but not height change ("chroma only", red) activates bilateral areas anterior to HG in PP; pitch height change but not chroma change ("height only", blue) activates bilateral areas in posterior PT. These areas represent distinct brain substrates for processing the two musical dimensions of pitch. The relative magnitude of the blood oxygen-level-dependent signal change in anterior and posterior areas is shown for each of the contrasts of interest (Right). The height of the histogram columns represents the mean size of effect (signal change) relative to global mean signal for the contrasts "chroma only" (red) and "height only" (blue) at the peak voxels for each contrast in the right hemisphere; vertical bars represent the standard error of the mean size of effect. The histograms demonstrate opposite patterns of pitch chroma and pitch height processing in anterior and posterior auditory areas.

Image Analysis. A group analysis for all subjects was carried out by using statistical parametric mapping implemented in SPM99 software ( spm). Scans were first realigned and spatially normalized (20) to the Montreal Neurological Institute standard stereotactic space (21). Data were spatially smoothed with an isotropic Gaussian kernel of 8 mm full width at half maximum. Statistical parametric maps were generated by modeling the evoked hemodynamic response for the different stimuli as boxcars convolved with a synthetic hemodynamic response function in the context of the general linear model. A fixed-effects model was used to assess differences in blood flow between conditions of interest (i.e., areas in which pitch chroma change and pitch height change produced additional activation to that produced by the fixed chroma and fixed height baseline conditions). The t statistic was estimated for each voxel at a significance threshold of P < 0.05 after correction for multiple comparisons across the whole brain volume according to Gaussian random field theory. Individual analyses were also carried out for each subject by using the same preprocessing parameters and statistical model and assessed by using a significance threshold of P < 0.05 after small volume correction taking the a priori anatomical hypotheses into account.
For the contrast between changing and fixed chroma conditions, anatomical small volumes were derived from the group mean structural MRI brain; these small volumes comprised left and right lateral HG and planum polare (PP). For the contrast between changing and fixed height conditions, anatomical small volumes were based on 95% probability maps for the left and right human PT (12).

Results

Group Analysis. In the group analysis, significant activation was demonstrated in each of the contrasts of interest at the P < 0.05 voxel level of significance after correction for multiple comparisons across the entire brain volume. The contrast between broadband noise and silence produced extensive bilateral superior temporal activation, including medial HG and parts of PT (Fig. 3a, green), that was largely symmetric. The contrast between pitch conditions and noise produced more restricted bilateral activation in the lateral part of HG and PT and extending into PP (Fig. 3a, lilac). The contrasts between changing and fixed pitch chroma (Fig. 3b, red) and between changing and fixed pitch height (Fig. 3c, blue) produced common bilateral activation in lateral HG and anterolateral PT. Masking was then applied to exclude voxels activated by change both in pitch chroma and pitch height (Fig. 3d). Brain areas activated only by changes in pitch chroma were distinct from those activated only by changes in pitch height; pitch chroma change (Fig. 3d, red) produced additional activation extending anteriorly from HG into PP, whereas pitch height change (Fig. 3d, blue) produced additional activation extending posteriorly in PT. Activation was bilateral and slightly asymmetric; the activity in the right hemisphere was slightly more anterior than that in the left hemisphere. Local maxima of activation in the superior temporal plane for the group are listed in Table 1.
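The regressor construction used in the analysis above (a stimulus boxcar convolved with a synthetic hemodynamic response function) can be sketched as follows. This is our illustration only: the gamma-difference approximation to the HRF and all parameter values are assumptions, not the canonical function implemented in SPM99.

```python
import math
import numpy as np

def gamma_pdf(t, shape, scale=1.0):
    """Gamma probability density, used here as a crude HRF building block."""
    t = np.asarray(t, dtype=float)
    out = np.zeros_like(t)
    pos = t > 0
    out[pos] = (t[pos] ** (shape - 1) * np.exp(-t[pos] / scale)
                / (math.gamma(shape) * scale ** shape))
    return out

def synthetic_hrf(t):
    # Positive response peaking near 5 s with a small late undershoot
    # (assumed parameters, chosen only for illustration).
    return gamma_pdf(t, 6.0) - 0.35 * gamma_pdf(t, 16.0)

dt = 0.1                                           # model time step, s
t = np.arange(0.0, 32.0, dt)
boxcar = ((t >= 0.0) & (t < 8.0)).astype(float)    # one 8-s sound sequence
regressor = np.convolve(boxcar, synthetic_hrf(t))[: t.size] * dt
```

In the general linear model, one such regressor per condition is fitted to each voxel's time series; contrasts between the fitted coefficients yield the statistical maps shown in Fig. 3. Note how convolution delays the modeled response well past the stimulus onset.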
The relative magnitude of the mean size of effect (change in blood oxygen-level-dependent signal) in each of the contrasts of interest (Fig. 3d Right) shows the opposite pattern for pitch chroma and pitch height processing in anterior and posterior auditory areas.

Table 1. Local activation peaks: group data

Contrast       Region   Side    Peak, mm (x, y, z)   Z score
chroma only    PP       Left
                        Right
               HG       Left
height only    PT       Left
                        Right

Coordinates for local peaks of activation in the superior temporal plane identified for the group. Coordinates correspond to voxels activated by pitch chroma change but not by pitch height change (chroma only) or by pitch height change but not by pitch chroma change (height only). Three-figure coordinates are in millimeters after transformation into standard MNI stereotactic space (21). Z score > 4.66 corresponds to P < 0.05 after correction for multiple comparisons across the entire brain volume.

Individual Analyses. Individual subject analyses were carried out to determine whether the anatomical distinction between the regions specific for pitch chroma and height processing in the group was also evident in the individual data. Fig. 4 presents the individual results for the same axial slice as in Fig. 3; the pattern of activation in individuals was very similar to that of the group. An analysis was performed by using anatomical volumes of interest for lateral HG, PP, and PT specified a priori (see Methods). As in the group analysis, voxels activated both by pitch chroma change and pitch height change were exclusively masked to identify voxels specifically activated by pitch chroma change or pitch height change. There was significant activation in each contrast in all subjects at the P < 0.05 voxel level of significance after correction for the specified volume. For pitch chroma change, significant local maxima occurred in the prespecified volume involving lateral HG and PP in every subject. For pitch height change, local maxima occurred in the prespecified volume involving PT in every subject. Coordinates of local maxima for all individuals are in Table 2, which is published as supporting information on the PNAS web site.

Fig. 4. Statistical parametric maps for individual subjects. Activated voxels (P uncorrected) are rendered on each individual's structural MRI. The axial section is tilted to run along the superior temporal plane as in Fig. 3, and the contrasts and color key are the same as in Fig. 3 a and d. Bilateral areas including medial HG are activated in the contrast between broadband noise and silence (green). After exclusive masking of voxels activated by both pitch chroma change and pitch height change, the two pitch dimensions show distinct activation patterns in most individuals: pitch chroma change (but not pitch height change) activates mainly areas anterior to HG on the PP (red); pitch height change (but not pitch chroma change) activates mainly PT (blue).

Discussion

In this paper, we have presented psychophysical evidence to support the view that pitch has two distinct dimensions: pitch chroma and pitch height. When chroma is kept constant, pitch height can be manipulated and ordered from low to high along a continuous dimension. We have shown that the dimensions of pitch chroma and height have distinct representations in human auditory cortex. These representations occur at a logical point in the recently proposed hierarchy of melody processing (8, 9). Medial HG (PAC) is activated similarly when processing noise or pitch. Lateral HG (secondary auditory cortex) shows an increase in activity when processing pitch (Figs. 3 and 4), and it is activated by changing both pitch chroma and pitch height.
Areas specifically activated by chroma change exist anterior to HG within PP, whereas areas specifically activated by height change exist posterior to HG within PT. The analysis of pitch variation in sound sequences extending over seconds is required to process melodies in music and prosody in speech, and it has previously been shown to involve bilateral auditory areas anterior and posterior to HG (8, 9). However, these previous studies did not manipulate the two dimensions of pitch separately and did not address the possibility that distinct brain mechanisms exist for the processing of pitch chroma and height. The pitch height changes in the present experiment were perceived by subjects as a disrupted auditory stream; here, pitch height provided a nonspatial cue for the segregation of sound objects at an early stage in auditory source analysis (22). Previous work showing activation in PT where spatial cues were the basis for segregation can be interpreted in a similar way (23). A specific mechanism for pitch height processing in PT is therefore in accord with a recently proposed model in which this area plays a critical early role in the segregation of sound objects in the acoustic environment (13). In contrast, specific activation of PP anterior to HG by stimuli with changing chroma supports previous work on melody (7-9) and speech (10, 11) processing; activation of anterior auditory areas would afford a mechanism for tracking pitch chroma patterns forming coherent information streams that can be analyzed independently of the specific sound source. The anatomical and functional organization of the cortical auditory system in both humans and nonhuman primates is controversial (24, 25). In nonhuman primates, anatomical and electrophysiological evidence (26-28) suggests distinct "what" and "where" streams of processing beyond PAC, passing anteriorly and posteriorly, respectively.
However, a proportion of neurons in the macaque posterior temporal lobe demonstrate responses to particular call sounds (28). In humans, functional imaging (29-31) and lesion (32) data support a dual organization beyond PAC; however, both the extent and the functional basis of any separation of processing mechanisms have been disputed (23-25). The present human study has demonstrated that mechanisms for analyzing pitch chroma patterns exist in the anterior temporal lobe, whereas mechanisms for analyzing pitch height exist in the posterior temporal lobe. If, as we hypothesize, pitch height differences are involved in the initial stages of auditory scene analysis, then it would seem reasonable to propose that in the human auditory brain, anterior cortical areas are engaged in processing patterns of information from one sound source, whereas posterior cortical areas are engaged in the segregation of sound sources in the environment.

This work was supported by the Wellcome Trust (J.D.W. and T.D.G.) and by United Kingdom Medical Research Council Grant G (to S.U. and R.D.P.).

1. American Standards Association (1960) Acoustical Terminology S1 (American Standards Assoc., New York).
2. Donkin, F. (1874) Acoustics (Oxford Univ. Press, Oxford, U.K.).
3. Pikler, A. (1966) J. Acoust. Soc. Am. 39,
4. Shepard, R. N. (1982) in The Psychology of Music, ed. Deutsch, D. (Academic, New York), 1st Ed., pp.
5. Krumhansl, C. L. (1990) in Cognitive Foundations of Musical Pitch (Oxford Univ. Press, New York), pp.
6. Rademacher, J., Morosan, P., Schormann, T., Schleicher, A., Werner, C., Freund, H. J. & Zilles, K. (2001) NeuroImage 13,
7. Zatorre, R. J., Evans, A. C. & Meyer, E. (1994) J. Neurosci. 14,
8. Griffiths, T. D., Büchel, C., Frackowiak, R. S. J. & Patterson, R. D. (1998) Nat. Neurosci. 1,
9. Patterson, R. D., Uppenkamp, S., Johnsrude, I. & Griffiths, T. D. (2002) Neuron 36,
10. Binder, J. R. (2000) Brain 123,
11. Scott, S. K., Blank, C. C. & Wise, R. J. S. (2000) Brain 123,
12. Westbury, C. F., Zatorre, R. J. & Evans, A. C. (1999) Cereb. Cortex 9,
13. Griffiths, T. D. & Warren, J. D. (2002) Trends Neurosci. 25,
14. Patterson, R. D. (1990) Mus. Percept. 8,
15. Patterson, R. D., Milroy, R. & Allerhand, M. (1993) Contemp. Mus. Rev. 9,
16. Irino, T. & Patterson, R. D. (2002) Speech Commun. 36,
17. Wichmann, F. A. & Hill, N. J. (2001) Percept. Psychophys. 63,
18. Wichmann, F. A. & Hill, N. J. (2001) Percept. Psychophys. 63,
19. Hall, D. A., Haggard, M. P., Akeroyd, M. A., Palmer, A. R., Summerfield, A. Q., Elliott, M. R., Gurney, E. M. & Bowtell, R. W. (1999) Hum. Brain Mapp. 3,
20. Friston, K. J., Ashburner, J., Frith, C. D., Poline, J. B., Heather, J. D. & Frackowiak, R. S. J. (1995) Hum. Brain Mapp. 2,
21. Evans, A. C., Collins, D. L., Mills, S. R., Brown, R. D., Kelly, R. L. & Peters, T. M. (1993) IEEE Nucl. Sci. Symp. Med. Imag. Conf. Proc. 3,
22. Bregman, A. S. (1990) Auditory Scene Analysis (MIT Press, Cambridge, MA).
23. Zatorre, R. J., Bouffard, M., Ahad, P. & Belin, P. (2002) Nat. Neurosci. 5,
24. Belin, P. & Zatorre, R. J. (2000) Nat. Neurosci. 3,
25. Romanski, L. M., Tian, B., Fritz, J. B., Mishkin, M., Goldman-Rakic, P. S. & Rauschecker, J. P. (2000) Nat. Neurosci. 3,
26. Kaas, J. H. & Hackett, T. A. (2000) Proc. Natl. Acad. Sci. USA 97,
27. Rauschecker, J. P. & Tian, B. (2000) Proc. Natl. Acad. Sci. USA 97,
28. Tian, B., Reser, D., Durham, A., Kustov, A. & Rauschecker, J. P. (2001) Science 292,
29. Alain, C., Arnott, S. R., Hevenor, S., Graham, S. & Grady, C. L. (2001) Proc. Natl. Acad. Sci. USA 98,
30. Maeder, P. P., Meuli, R. A., Adriani, M., Bellmann, A., Fornari, E., Thiran, J. P., Pittet, A. & Clarke, S. (2001) NeuroImage 14,
31. Warren, J. D., Zielinski, B. A., Green, G. G. R., Rauschecker, J. P. & Griffiths, T. D. (2002) Neuron 34,
32. Clarke, S., Bellmann, A., Meuli, R. A., Assal, G. & Steck, A. J. (2000) Neuropsychologia 38,

NeuroImage 24 (2005)

Analysis of the spectral envelope of sounds by the human brain

J.D. Warren, a,b,c A.R. Jennings, a and T.D. Griffiths a,b,*

a Auditory Group, Medical School, University of Newcastle, Newcastle-upon-Tyne, NE2 4HH, UK
b Wellcome Department of Imaging Neuroscience, Institute of Neurology, University College London, Queen Square, London, WC1N 3BG, UK
c Dementia Research Centre, Institute of Neurology, University College London, Queen Square, London, WC1N 3BG, UK

Received 20 April 2004; revised 25 October 2004; accepted 28 October 2004

Spectral envelope is the shape of the power spectrum of a sound. It is an important cue for the identification of sound sources such as voices or instruments, and of particular classes of sounds such as vowels. In everyday life, sounds with similar spectral envelopes are perceived as similar: we recognize a voice or a vowel regardless of pitch and intensity variations, and we recognize the same vowel regardless of whether it is voiced (a spectral envelope applied to a harmonic series) or whispered (a spectral envelope applied to noise). In this functional magnetic resonance imaging (fMRI) experiment, we investigated the basis for analysis of spectral envelope by the human brain. Changing either the pitch or the spectral envelope of harmonic sounds produced similar activation within a bilateral network including Heschl's gyrus and adjacent cortical areas in the superior temporal lobe. Changing the spectral envelope of continuously alternating noise and harmonic sounds produced additional right-lateralized activation in the superior temporal sulcus (STS). Our findings show that spectral shape is abstracted in the superior temporal sulcus, suggesting that this region may have a generic role in the spectral analysis of sounds. These distinct levels of spectral analysis may represent early computational stages in a putative anteriorly directed stream for the categorization of sound. © 2004 Published by Elsevier Inc.
Keywords: Sound; Spectral envelope; fMRI

* Corresponding author. Auditory Group, Newcastle University Medical School, Framlington Place, Newcastle-upon-Tyne, NE2 4HH, UK. E-mail address: t.d.griffiths@ncl.ac.uk (T.D. Griffiths).

Introduction

Spectral envelope defines the shape of the power spectrum of a complex sound. It is one means by which sound sources such as voices, or sound classes such as vowels, can be characterized. The spectral envelope of a sound is related to a dimension of timbre, the spectral centroid (Grey, 1977; Krumhansl and Iverson, 1992; McAdams and Cunible, 1992). Timbre is defined operationally as the property that distinguishes two sounds of identical pitch, duration, and intensity (American Standards Association, 1960). The spectral envelope is not the sole determinant of timbre, a property with other dimensions including temporal envelope (the attack and decay of a sound) and a third dimension that is not consistent between studies (Grey, 1977; Krumhansl and Iverson, 1992; McAdams and Cunible, 1992). Everyday experience suggests that we possess mechanisms for the abstraction of spectral envelope from the detailed spectrotemporal structure of the sound with which it is associated. We identify individual voices despite variations in pitch and intensity, and we perceive the same vowel sound whether the vowel is voiced or whispered. In the latter case, the same spectral envelope is applied to a harmonic series or to noise. Fig. 1 demonstrates the common spectral envelope of voiced and whispered vowels. In this functional magnetic resonance imaging (fMRI) experiment, we investigated the analysis of spectral envelope by the human brain. We used stimuli in which 'generic' spectral envelopes were applied to harmonic series or to noise. These stimuli are likely to be processed by brain mechanisms that analyze the spectral envelopes of natural sounds such as vowels (Fig.
1), while avoiding the semantic associations of real vowels or musical instrumental timbres. The experimental design (Fig. 2) employed sequences of sounds composed either entirely of harmonic sounds or of alternating harmonic sounds and noise. In stimulus sequences consisting entirely of harmonic sounds, spectral envelope or pitch was manipulated (Fig. 2: all-harmonic conditions). This manipulation allowed us to examine a level of analysis that could be based on the detailed spectrotemporal structure of sound. Changing the spectral envelope alters the detailed spectral structure of the sound, while changing pitch (in stimuli such as these, with unresolved harmonics) is associated with changes in the repetition rate and temporal structure of the sound. In both cases, there are changes to detailed spectrotemporal structure that are likely to be processed by early auditory areas in the superior temporal lobe. Based on previous studies of pitch change alone (Patterson et al., 2002; Warren et al., 2003), we predicted that the analysis of detailed spectrotemporal structure would engage a cortical network including nonprimary auditory cortex in Heschl's gyrus (HG) and planum temporale (PT).

Fig. 1. Schematic representations of complex sounds in the frequency domain. Schematic frequency-domain representations of a vowel (/a/) and a generic sound used in the fMRI experiment. For both the vowel and the generic sound, the same spectral shape can be applied to two different types of spectral fine structure: a harmonic series (here f0 = 100 Hz) and noise. In the case of the vowel, this manipulation is perceived as the same phoneme voiced (harmonic) or whispered (noise). In the case of the generic sound, the harmonic and noise versions of the sound are perceived as similar despite the change in spectrotemporal fine structure; this perceptual similarity is based on a derived feature (spectral shape), since the stimulus does not closely resemble any familiar natural sound.

In stimulus sequences consisting of alternating noise and harmonic sounds, spectral envelope was manipulated while the detailed spectrotemporal structure constantly varied (Fig. 2: alternating conditions). This manipulation allowed us to examine a level of analysis in which spectral envelope is abstracted independently of changes in the detailed spectrotemporal structure. We predicted that this level of spectral analysis would engage additional brain areas to those involved in the analysis of detailed spectrotemporal structure: these additional areas are required for the abstraction of spectral shape independently of changes in fine structure. Based on evidence for the involvement of the superior temporal sulcus (STS) in the identification of a variety of natural sounds (Adams and Janata, 2002; Beauchamp et al., 2004; Belin et al., 2000; Binder et al., 2004; Engelien et al., 1995; Lewis et al., 2004; Maeder et al., 2001; Menon et al., 2002; Nakamura et al., 2001; Zatorre et al., 2004), we predicted the specific involvement of STS in this more abstract level of spectral analysis.
We hypothesized that the abstraction of the spectral shapes of sounds by STS is a general mechanism of auditory cognition, in addition to any more specific role of STS in the processing of particular sound categories.

Materials and methods

Stimuli were synthesized digitally in the frequency domain from harmonic series or from fixed-amplitude, random-phase noise with equivalent passband and intensity (sampling rate 44.1 kHz, 16-bit resolution). Harmonic sounds were in positive Schroeder phase (Schroeder and Strube, 1986) to reduce the peak factor. Spectral envelope was specified in the frequency domain for both noise and harmonic stimuli (Fig. 2). The duration of each sound was 500 ms (with 20-ms gating windows). Sounds were combined into sequences comprising either harmonic sounds only ('all-harmonic' sequences) or alternating harmonic sounds and noise ('alternating' sequences; see Fig. 2). In the all-harmonic sequences, the fundamental frequency (f0) either remained constant or was varied (120, 144, 168, or 192 Hz) between successive elements of the sequence. In both the all-harmonic and alternating sequences, the spectral envelope of the individual sounds either remained constant or was varied (one of four spectral shapes; see Fig. 2) between successive elements in the sequence. Changes in f0 are perceived as changes in pitch, while changes in spectral envelope are perceived as changes in the identity of the sound source. Using these manipulations, we created three types of all-harmonic sequence (both f0 and spectral envelope constant; f0 varying and spectral envelope constant; f0 constant and spectral shape varying) and two types of alternating sequence (spectral envelope constant and spectral envelope varying). Each sequence contained either 15 or 16 elements; the total duration of a sequence was therefore 7.5 or 8 s. Examples of stimuli are available as Supplementary Material online.
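The stimulus construction described above can be illustrated with a short sketch: one spectral envelope is applied to a harmonic series (with a Schroeder-style phase schedule) and to random-phase noise built in the frequency domain, and a smoothed power spectrum then recovers a similar shape for both carriers. The Gaussian envelope, passband, and smoothing width here are illustrative assumptions, not the parameters used in the experiment.

```python
import numpy as np

FS = 44_100            # sampling rate used in the study (Hz)
DUR = 0.5              # 500-ms elements
F0 = 120.0             # one of the fundamentals used (120-192 Hz)
N = int(FS * DUR)

def envelope(freq_hz):
    # Hypothetical 'generic' spectral shape: a Gaussian on a log-frequency axis
    return np.exp(-0.5 * (np.log2(freq_hz / 1000.0) / 0.5) ** 2)

def harmonic_sound(f0):
    # Harmonic series under the envelope; Schroeder-like phases reduce the peak factor
    t = np.arange(N) / FS
    k = np.arange(1, int(8000 / f0) + 1)              # harmonics up to ~8 kHz
    phases = np.pi * k * (k + 1) / k.size             # Schroeder-style schedule (illustrative)
    parts = envelope(k * f0)[:, None] * np.sin(2 * np.pi * np.outer(k * f0, t) + phases[:, None])
    return parts.sum(axis=0)

def noise_sound(rng):
    # Random-phase noise shaped by the same envelope, built in the frequency domain
    freqs = np.fft.rfftfreq(N, 1 / FS)
    mag = np.where((freqs > 50) & (freqs < 8000), envelope(np.maximum(freqs, 1.0)), 0.0)
    return np.fft.irfft(mag * np.exp(1j * rng.uniform(0, 2 * np.pi, freqs.size)), N)

def coarse_spectrum(x, width_hz=500.0):
    # Smooth the power spectrum so harmonic fine structure is lost, leaving spectral shape
    power = np.abs(np.fft.rfft(x)) ** 2
    w = int(width_hz * DUR)                           # frequency bins per smoothing window
    smooth = np.convolve(power, np.ones(w) / w, mode="same")
    return smooth / smooth.max()

harm = harmonic_sound(F0)
noise = noise_sound(np.random.default_rng(0))
r = np.corrcoef(coarse_spectrum(harm), coarse_spectrum(noise))[0, 1]
```

Despite entirely different fine structure, the two coarse spectra are strongly correlated; this is the sense in which spectral shape can be abstracted from the carrier.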
The ability of listeners to perceive the spectral envelope changes used in the scanning sequences was assessed in a separate psychophysical experiment. The same 15- or 16-element sequences presented during scanning were used in two-interval, two-alternative forced-choice testing in which subjects were required to detect the sequences containing change. Subjects were able to detect harmonic sequences containing pitch change or spectral shape change with 100% accuracy. Subjects were also able to detect alternating noise-harmonic sequences containing spectral shape change with 100% accuracy. The changes in pitch and spectral shape used during scanning were therefore highly salient, and spectral shape changes could be perceived independently of constantly changing fine spectrotemporal structure.

Fourteen subjects (six males, eight females; 13 right-handed, one left-handed) participated in the fMRI experiment. No subject had any history of hearing or neurological disorder and all subjects had normal structural MRI scans. All subjects gave informed consent and the experiment was carried out with the approval of the local ethics committee.

During scanning, stimuli were presented diotically at a fixed sound pressure level of 80 dB in the silent phase of a sparse image acquisition protocol (Hall et al., 1999). A custom sound delivery system based on Koss electrostatic headphones was used. In the sparse imaging protocol, a whole brain volume was acquired every 12.5 s, each brain acquisition requiring 4.32 s. Blood oxygenation level dependent (BOLD) contrast images were acquired at 1.5 T (Siemens Sonata, Erlangen) using gradient echo planar imaging (echo time = 50 ms). Each volume comprised 48 slices of 2 mm thickness with an in-plane resolution of 3 × 3 mm², covering the entire brain. Six stimulus conditions were presented in randomized order (Fig.
2): (1) all-harmonic sequences with constant f0 and constant spectral envelope; (2) all-harmonic sequences with changing f0 and constant spectral envelope; (3) all-harmonic sequences with constant f0 and changing spectral envelope; (4) alternating harmonic-noise sequences with constant f0 and constant spectral envelope; (5) alternating harmonic-noise sequences with constant f0 and changing spectral envelope; (6) silence. Subjects were instructed to attend to the sound sequences with their eyes closed; there was no active auditory discrimination task; however, to help maintain attention,

subjects were asked to signal the end of each sequence by pressing a button box positioned beneath the right hand.

Fig. 2. Schematic representations of the stimuli. Examples of sound sequences derived from the stimuli used in the fMRI experiment. Individual elements of each sequence are represented in the frequency domain (see Fig. 1). Sequences are composed entirely of harmonic sounds (all-harmonic sequences) or harmonic sounds alternating with noise (alternating sequences). Key: fcsc, f0 constant, spectral shape constant; fvsc, f0 varying, spectral shape constant; fcsv, f0 constant, spectral shape varying.

A total of 192 brain volumes were acquired for each subject (32 volumes for each condition). Imaging data were analyzed for the group using statistical parametric mapping implemented in SPM99 software. Scans were realigned and spatially normalized (Friston et al., 1995a) to MNI standard stereotactic space (Evans et al., 1993) and spatially smoothed with an isotropic Gaussian kernel of 8 mm full-width-at-half-maximum. Statistical parametric maps were generated by modeling the evoked hemodynamic response for the different stimuli as boxcars convolved with a synthetic hemodynamic response function in the context of the general linear model (Friston et al., 1995b). Population-level inferences concerning BOLD signal changes between conditions of interest were based on a random effects model that estimated the second level t statistic at each voxel. Hemispheric laterality effects were assessed using a second level paired t test comparing each contrast image with its counterpart flipped about the anteroposterior axis. Local maxima were assessed using a voxel significance threshold of P < 0.05, after small volume correction taking the prior anatomical hypotheses into account.
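The response modeling just described, a boxcar regressor convolved with a synthetic hemodynamic response and sampled at the sparse acquisition times, can be sketched as follows. The double-gamma response parameters are standard illustrative defaults, not values taken from the paper.

```python
import numpy as np
from math import gamma

TR, DT = 12.5, 0.1     # one volume every 12.5 s (sparse protocol); 0.1-s model grid

def hrf(t, p1=6.0, p2=16.0, ratio=6.0):
    # Canonical double-gamma hemodynamic response (standard defaults; illustrative)
    g = lambda x, k: x ** (k - 1) * np.exp(-x) / gamma(k)
    h = g(t, p1) - g(t, p2) / ratio
    return h / h.max()

t = np.arange(0.0, 200.0, DT)                      # model timeline (s)
boxcar = ((t % TR) < 8.0).astype(float)            # an 8-s stimulus epoch in each 12.5-s cycle
predicted = np.convolve(boxcar, hrf(np.arange(0.0, 32.0, DT)))[: t.size] * DT
regressor = predicted[(np.arange(TR, 200.0, TR) / DT).astype(int)]  # sampled at volume times
```

In a full design matrix, one such column would be built per condition and entered into the general linear model; presenting stimuli in the silent gap means that the stimulus-evoked response, rather than the scanner noise response, dominates each acquisition.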
Anatomical small volumes comprised right and left HG, superior temporal gyrus (STG) and STS (based on the group mean normalized structural MRI brain volume), and 95% probability maps for left and right human PT (Westbury et al., 1999).

Results

This experiment was designed to examine two levels of auditory analysis using contrasts based on two types of stimuli. In the all-harmonic contrasts, the baseline condition has constant spectrotemporal fine structure, whereas in the contrast between the alternating conditions, the baseline condition has constantly changing spectrotemporal fine structure (harmonic series or noise). Spectral envelope or pitch changes in the all-harmonic sequences can be assessed from changes in spectrotemporal fine structure, whereas spectral envelope changes in the alternating sequences must be analyzed independently of the changing spectrotemporal fine structure and demand additional computational resources. Statistical parametric maps of brain activation for the contrasts of interest are shown in Fig. 3. The contrast between all-harmonic conditions with changing and fixed f0 (Fig. 3a), and the contrast between all-harmonic conditions with changing and fixed spectral envelope (Fig. 3b), both produced bilateral activation in primary auditory cortex (PAC) in medial HG (Rademacher et al., 2001) and in nonprimary auditory cortex in lateral HG and anterolateral PT (Westbury et al., 1999). No temporal lobe regions were activated only in the pitch change contrast or only in the spectral envelope contrast between the all-harmonic conditions. The superior temporal lobe regions engaged in both pitch analysis and spectral envelope analysis in the all-harmonic conditions together constitute a substrate for the analysis of spectrotemporal fine structure.
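The exclusive masking used later in the analysis (Fig. 3d) reduces to a boolean operation on thresholded statistical maps: retain voxels passing the contrast of interest that do not pass the masking contrast. A minimal sketch, with hypothetical random t-maps and illustrative thresholds standing in for the real SPM output:

```python
import numpy as np

rng = np.random.default_rng(42)
shape3d = (8, 8, 8)                      # hypothetical voxel grid
t_shape_alt = rng.normal(size=shape3d)   # t-map: spectral shape contrast, alternating conditions
t_f0 = rng.normal(size=shape3d)          # t-map: f0 contrast, all-harmonic conditions

THRESH = 1.7                             # illustrative voxelwise threshold
MASK_THRESH = 1.7                        # illustrative masking threshold

# Exclusive mask: voxels active for spectral shape change but NOT for pitch change
shape_only = (t_shape_alt > THRESH) & ~(t_f0 > MASK_THRESH)
```

By construction, no voxel surviving the exclusive mask is active in the masking contrast, which is what licenses the claim that the surviving areas lie beyond the pitch network.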

Fig. 3. Brain areas involved in the spectral analysis of sounds. Group statistical parametric maps, SPMs (thresholded at an uncorrected voxel significance criterion for display purposes), have been rendered on axial sections of the group mean normalized structural MRI volume. Each section has been tilted 0.5 radians to lie parallel to the superior surface of the temporal lobes; in each panel, the upper section lies along the superior temporal plane (STP), while the lower section runs along the dorsal bank of the superior temporal sulcus (STS). (a and b) Analysis of f0 change (changing f0 minus constant f0, Δf0 all-harm) (a) and analysis of spectral envelope change (changing spectral shape minus constant spectral shape, Δshape all-harm) in the all-harmonic conditions (b) each activates a similar bilateral brain network including Heschl's gyrus (HG) and anterolateral planum temporale (PT). These areas represent a brain substrate for the analysis of spectrotemporal fine structure. (c) Analysis of spectral envelope change (changing spectral shape minus constant spectral shape, Δshape alternate) in the alternating harmonic-noise conditions activates a more extensive bilateral network extending from anterolateral PT through STG and along STS. (d) The SPM in c has been exclusively masked by the SPM in a to identify areas involved in the analysis of spectral envelope changes but not in the analysis of spectrotemporal fine structure (Δshape only): these areas represent a specific brain substrate for the abstraction of spectral shape, a key step in sound identification.

The contrast between alternating conditions with changing and fixed spectral envelope produced bilateral activation including medial and lateral HG, anterolateral PT and posterior STG, and extending anteriorly along STS (Fig. 3c): this activation overlapped with the network identified in both the all-harmonic contrasts (Figs.
3a and b), but was more extensive. The contrast between the alternating conditions (Fig. 3c) was exclusively masked by the contrast between changing and fixed f0 (Fig. 3a) in order to identify brain areas engaged by spectral envelope changes but not by pitch changes. Areas identified after exclusive masking extended from anterolateral PT through STG and anteriorly along STS (Fig. 3d). These temporal lobe areas lie beyond the network engaged in pitch analysis and constitute a substrate for the analysis of spectral envelope independently of spectrotemporal fine structure. The magnitude and extent of activation were greater in the right than in the left hemisphere and maximal in the midportion of right STS (Table 1); however, this asymmetry was not statistically significant (P > 0.05). Local maxima for the group in each of the contrasts of interest at a statistical threshold of P < 0.05 (after small volume correction based on the prior anatomical hypotheses) are presented in Table 1.

Discussion

This experiment has demonstrated distinct levels of analysis of spectral envelope that map onto distinct cortical regions in the human auditory brain. Changing the spectral envelope or the pitch of a harmonic sound engages a brain network that includes nonprimary auditory cortex in lateral HG and anterolateral PT. In these harmonic stimuli, both pitch and spectral envelope changes might be analyzed on the basis of the detailed spectrotemporal structure of the stimulus, and it is likely that this network in the superior temporal lobe contains mechanisms for the analysis of spectrotemporal fine structure. While these mechanisms have a similar neuroanatomical substrate, they are not necessarily identical. Changes in pitch can, in general, be represented as changes in harmonic spacing in the frequency domain or repetition rate in the time domain.
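The time-domain representation just described can be demonstrated directly: even when a complex contains only high, unresolved harmonics, its repetition rate, and hence its pitch, is recoverable from the autocorrelation function. A minimal sketch, in which the component range and pitch search band are illustrative choices rather than the study's stimulus parameters:

```python
import numpy as np

FS = 44_100
F0 = 120.0                                  # one of the fundamentals used in the study

# Complex of high, unresolved harmonics only (components 20-40, all above ~2.4 kHz)
t = np.arange(int(FS * 0.1)) / FS
x = sum(np.sin(2 * np.pi * k * F0 * t) for k in range(20, 41))

# Repetition rate = lag of the first major autocorrelation peak away from zero
ac = np.correlate(x, x, mode="full")[x.size - 1:]
lo, hi = int(FS / 500), int(FS / 50)        # search pitches between 50 and 500 Hz
lag = lo + int(np.argmax(ac[lo:hi]))
estimated_f0 = FS / lag
```

The spectrum of this sound contains no energy at 120 Hz, yet the autocorrelation peaks at the 120-Hz period: the pitch is carried by fine temporal structure rather than by resolved harmonic spacing.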
In this experiment, the harmonics were all unresolved, and the repetition rate (fine temporal structure) is most relevant to the analysis of pitch change. Changes in spectral envelope produce changes in the detailed auditory spectrum. While analysis of the spectral envelope of the harmonic stimuli could be achieved by analysis of their detailed spectrotemporal structure, this experiment provides evidence for a further level of spectral analysis when this is not possible. In the alternating conditions, the detailed spectrotemporal structure of the stimuli is constantly changing, and the computation of spectral envelope requires the abstraction of spectral shape. This more abstract level of spectral analysis is instantiated in temporal lobe areas beyond those engaged in the analysis of detailed spectrotemporal structure. The additional temporal lobe areas comprise a rightward-lateralized network extending from the superior temporal


Computational Explorations in Cognitive Neuroscience Chapter 7: Large-Scale Brain Area Functional Organization Computational Explorations in Cognitive Neuroscience Chapter 7: Large-Scale Brain Area Functional Organization 1 7.1 Overview This chapter aims to provide a framework for modeling cognitive phenomena based

More information

CNS pathways. topics. The auditory nerve, and the cochlear nuclei of the hindbrain

CNS pathways. topics. The auditory nerve, and the cochlear nuclei of the hindbrain CNS pathways topics The auditory nerve, and the cochlear nuclei of the hindbrain Sensory channels of information flow in CNS Pathways to medial geniculate body of thalamus Functional categorization of

More information

MULTI-CHANNEL COMMUNICATION

MULTI-CHANNEL COMMUNICATION INTRODUCTION Research on the Deaf Brain is beginning to provide a new evidence base for policy and practice in relation to intervention with deaf children. This talk outlines the multi-channel nature of

More information

Serial and Parallel Processing in the Human Auditory Cortex: A Magnetoencephalographic Study

Serial and Parallel Processing in the Human Auditory Cortex: A Magnetoencephalographic Study Cerebral Cortex January 2006;16:18--30 doi:10.1093/cercor/bhi080 Advance Access publication March 30, 2005 Serial and Parallel Processing in the Human Auditory Cortex: A Magnetoencephalographic Study Koji

More information

Human Paleoneurology and the Evolution of the Parietal Cortex

Human Paleoneurology and the Evolution of the Parietal Cortex PARIETAL LOBE The Parietal Lobes develop at about the age of 5 years. They function to give the individual perspective and to help them understand space, touch, and volume. The location of the parietal

More information

Mechanosensation. Central Representation of Touch. Wilder Penfield. Somatotopic Organization

Mechanosensation. Central Representation of Touch. Wilder Penfield. Somatotopic Organization Mechanosensation Central Representation of Touch Touch and tactile exploration Vibration and pressure sensations; important for clinical testing Limb position sense John H. Martin, Ph.D. Center for Neurobiology

More information

Cortical Organization. Functionally, cortex is classically divided into 3 general types: 1. Primary cortex:. - receptive field:.

Cortical Organization. Functionally, cortex is classically divided into 3 general types: 1. Primary cortex:. - receptive field:. Cortical Organization Functionally, cortex is classically divided into 3 general types: 1. Primary cortex:. - receptive field:. 2. Secondary cortex: located immediately adjacent to primary cortical areas,

More information

Anatomical Substrates of Somatic Sensation

Anatomical Substrates of Somatic Sensation Anatomical Substrates of Somatic Sensation John H. Martin, Ph.D. Center for Neurobiology & Behavior Columbia University CPS The 2 principal somatic sensory systems: 1) Dorsal column-medial lemniscal system

More information

High resolution functional imaging of the auditory pathway and cortex

High resolution functional imaging of the auditory pathway and cortex High resolution functional imaging of the auditory pathway and cortex Elia Formisano Maastricht-Brain Imaging Center (M-BIC) Faculty of Psychology and Neuroscience Maastricht University, The Netherlands

More information

Leah Militello, class of 2018

Leah Militello, class of 2018 Leah Militello, class of 2018 Objectives 1. Describe the general organization of cerebral hemispheres. 2. Describe the locations and features of the different functional areas of cortex. 3. Understand

More information

Motor Functions of Cerebral Cortex

Motor Functions of Cerebral Cortex Motor Functions of Cerebral Cortex I: To list the functions of different cortical laminae II: To describe the four motor areas of the cerebral cortex. III: To discuss the functions and dysfunctions of

More information

CEREBRUM. Dr. Jamila EL Medany

CEREBRUM. Dr. Jamila EL Medany CEREBRUM Dr. Jamila EL Medany Objectives At the end of the lecture, the student should be able to: List the parts of the cerebral hemisphere (cortex, medulla, basal nuclei, lateral ventricle). Describe

More information

The neural code for interaural time difference in human auditory cortex

The neural code for interaural time difference in human auditory cortex The neural code for interaural time difference in human auditory cortex Nelli H. Salminen and Hannu Tiitinen Department of Biomedical Engineering and Computational Science, Helsinki University of Technology,

More information

Chapter 2 Processing Streams in Auditory Cortex

Chapter 2 Processing Streams in Auditory Cortex Chapter 2 Processing Streams in Auditory Cortex 1 2 Josef P. Rauschecker 3 Keywords anterior ectosylvian bandpass noise combination sensitivity frequency modulation functional MRI inferior frontal inferior

More information

Neuroscience Tutorial

Neuroscience Tutorial Neuroscience Tutorial Brain Organization : cortex, basal ganglia, limbic lobe : thalamus, hypothal., pituitary gland : medulla oblongata, midbrain, pons, cerebellum Cortical Organization Cortical Organization

More information

Neurophysiology of systems

Neurophysiology of systems Neurophysiology of systems Motor cortex (voluntary movements) Dana Cohen, Room 410, tel: 7138 danacoh@gmail.com Voluntary movements vs. reflexes Same stimulus yields a different movement depending on context

More information

Visual Context Dan O Shea Prof. Fei Fei Li, COS 598B

Visual Context Dan O Shea Prof. Fei Fei Li, COS 598B Visual Context Dan O Shea Prof. Fei Fei Li, COS 598B Cortical Analysis of Visual Context Moshe Bar, Elissa Aminoff. 2003. Neuron, Volume 38, Issue 2, Pages 347 358. Visual objects in context Moshe Bar.

More information

Plasticity of Cerebral Cortex in Development

Plasticity of Cerebral Cortex in Development Plasticity of Cerebral Cortex in Development Jessica R. Newton and Mriganka Sur Department of Brain & Cognitive Sciences Picower Center for Learning & Memory Massachusetts Institute of Technology Cambridge,

More information

Hearing II Perceptual Aspects

Hearing II Perceptual Aspects Hearing II Perceptual Aspects Overview of Topics Chapter 6 in Chaudhuri Intensity & Loudness Frequency & Pitch Auditory Space Perception 1 2 Intensity & Loudness Loudness is the subjective perceptual quality

More information

FRONTAL LOBE. Central Sulcus. Ascending ramus of the Cingulate Sulcus. Cingulate Sulcus. Lateral Sulcus

FRONTAL LOBE. Central Sulcus. Ascending ramus of the Cingulate Sulcus. Cingulate Sulcus. Lateral Sulcus FRONTAL LOBE Central Ascending ramus of the Cingulate Cingulate Lateral Lateral View Medial View Motor execution and higher cognitive functions (e.g., language production, impulse inhibition, reasoning

More information

Sensory coding and somatosensory system

Sensory coding and somatosensory system Sensory coding and somatosensory system Sensation and perception Perception is the internal construction of sensation. Perception depends on the individual experience. Three common steps in all senses

More information

NEURAL COMPUTATIONS UNDERLYING SPEECH RECOGNITION IN THE HUMAN AUDITORY SYSTEM

NEURAL COMPUTATIONS UNDERLYING SPEECH RECOGNITION IN THE HUMAN AUDITORY SYSTEM NEURAL COMPUTATIONS UNDERLYING SPEECH RECOGNITION IN THE HUMAN AUDITORY SYSTEM A Dissertation submitted to the Faculty of the Graduate School of Arts and Sciences of Georgetown University in partial fulfillment

More information

Exam 1 PSYC Fall 1998

Exam 1 PSYC Fall 1998 Exam 1 PSYC 2022 Fall 1998 (2 points) Briefly describe the difference between a dualistic and a materialistic explanation of brain-mind relationships. (1 point) True or False. George Berkely was a monist.

More information

Learning Sound Categories: A Neural Model and Supporting Experiments

Learning Sound Categories: A Neural Model and Supporting Experiments English version: Acoustical Science and Technology, 23(4), July 2002, pp. 213-221. Japanese version: Journal of the Acoustical Society of Japan, 58(7), July 2002, pp. 441-449. Learning Sound Categories:

More information

SUPPLEMENTARY MATERIAL. Table. Neuroimaging studies on the premonitory urge and sensory function in patients with Tourette syndrome.

SUPPLEMENTARY MATERIAL. Table. Neuroimaging studies on the premonitory urge and sensory function in patients with Tourette syndrome. SUPPLEMENTARY MATERIAL Table. Neuroimaging studies on the premonitory urge and sensory function in patients with Tourette syndrome. Authors Year Patients Male gender (%) Mean age (range) Adults/ Children

More information

Spectral and Temporal Processing in Human Auditory Cortex

Spectral and Temporal Processing in Human Auditory Cortex Spectral and Temporal Processing in Human Auditory Cortex Deborah A. Hall, Ingrid S. Johnsrude 1,2, Mark P. Haggard, Alan R. Palmer, Michael A. Akeroyd and A. Quentin Summerfield MRC Institute of Hearing

More information

Association Cortex, Asymmetries, and Cortical Localization of Affective and Cognitive Functions. Michael E. Goldberg, M.D.

Association Cortex, Asymmetries, and Cortical Localization of Affective and Cognitive Functions. Michael E. Goldberg, M.D. Association Cortex, Asymmetries, and Cortical Localization of Affective and Cognitive Functions Michael E. Goldberg, M.D. The origins of localization The concept that different parts of the brain did different

More information

Sensorimotor Functioning. Sensory and Motor Systems. Functional Anatomy of Brain- Behavioral Relationships

Sensorimotor Functioning. Sensory and Motor Systems. Functional Anatomy of Brain- Behavioral Relationships Sensorimotor Functioning Sensory and Motor Systems Understanding brain-behavior relationships requires knowledge of sensory and motor systems. Sensory System = Input Neural Processing Motor System = Output

More information

Lecture 35 Association Cortices and Hemispheric Asymmetries -- M. Goldberg

Lecture 35 Association Cortices and Hemispheric Asymmetries -- M. Goldberg Lecture 35 Association Cortices and Hemispheric Asymmetries -- M. Goldberg The concept that different parts of the brain did different things started with Spurzheim and Gall, whose phrenology became quite

More information

Early Stages of Vision Might Explain Data to Information Transformation

Early Stages of Vision Might Explain Data to Information Transformation Early Stages of Vision Might Explain Data to Information Transformation Baran Çürüklü Department of Computer Science and Engineering Mälardalen University Västerås S-721 23, Sweden Abstract. In this paper

More information

Retinotopy & Phase Mapping

Retinotopy & Phase Mapping Retinotopy & Phase Mapping Fani Deligianni B. A. Wandell, et al. Visual Field Maps in Human Cortex, Neuron, 56(2):366-383, 2007 Retinotopy Visual Cortex organised in visual field maps: Nearby neurons have

More information

Cortex and Mind Chapter 6

Cortex and Mind Chapter 6 Cortex and Mind Chapter 6 There are many aspects to attention. It can be controlled. It can be focused on a particular sensory modality or item. It can be divided. It can set a perceptual system. It has

More information

Wernicke s area is commonly considered to be the neural locus of speech. perception. It is named after the German neurologist Carl Wernicke, who first

Wernicke s area is commonly considered to be the neural locus of speech. perception. It is named after the German neurologist Carl Wernicke, who first Supplementary Material 1. Where is Wernicke s area? Wernicke s area is commonly considered to be the neural locus of speech perception. It is named after the German neurologist Carl Wernicke, who first

More information

OPTO 5320 VISION SCIENCE I

OPTO 5320 VISION SCIENCE I OPTO 5320 VISION SCIENCE I Monocular Sensory Processes of Vision: Color Vision Mechanisms of Color Processing . Neural Mechanisms of Color Processing A. Parallel processing - M- & P- pathways B. Second

More information

Clinical and Experimental Neuropsychology. Lecture 3: Disorders of Perception

Clinical and Experimental Neuropsychology. Lecture 3: Disorders of Perception Clinical and Experimental Neuropsychology Lecture 3: Disorders of Perception Sensation vs Perception Senses capture physical energy from environment that are converted into neural signals and elaborated/interpreted

More information

How do individuals with congenital blindness form a conscious representation of a world they have never seen? brain. deprived of sight?

How do individuals with congenital blindness form a conscious representation of a world they have never seen? brain. deprived of sight? How do individuals with congenital blindness form a conscious representation of a world they have never seen? What happens to visual-devoted brain structure in individuals who are born deprived of sight?

More information

Neuroimaging methods vs. lesion studies FOCUSING ON LANGUAGE

Neuroimaging methods vs. lesion studies FOCUSING ON LANGUAGE Neuroimaging methods vs. lesion studies FOCUSING ON LANGUAGE Pioneers in lesion studies Their postmortem examination provided the basis for the linkage of the left hemisphere with language C. Wernicke

More information

The origins of localization

The origins of localization Association Cortex, Asymmetries, and Cortical Localization of Affective and Cognitive Functions Michael E. Goldberg, M.D. The origins of localization The concept that different parts of the brain did different

More information

The Methods of Cognitive Neuroscience. Sensory Systems and Perception: Auditory, Mechanical, and Chemical Senses 93

The Methods of Cognitive Neuroscience. Sensory Systems and Perception: Auditory, Mechanical, and Chemical Senses 93 Contents in Brief CHAPTER 1 Cognitive Neuroscience: Definitions, Themes, and Approaches 1 CHAPTER 2 The Methods of Cognitive Neuroscience CHAPTER 3 Sensory Systems and Perception: Vision 55 CHAPTER 4 CHAPTER

More information

Title:Atypical language organization in temporal lobe epilepsy revealed by a passive semantic paradigm

Title:Atypical language organization in temporal lobe epilepsy revealed by a passive semantic paradigm Author's response to reviews Title:Atypical language organization in temporal lobe epilepsy revealed by a passive semantic paradigm Authors: Julia Miro (juliamirollado@gmail.com) Pablo Ripollès (pablo.ripolles.vidal@gmail.com)

More information

USING AUDITORY SALIENCY TO UNDERSTAND COMPLEX AUDITORY SCENES

USING AUDITORY SALIENCY TO UNDERSTAND COMPLEX AUDITORY SCENES USING AUDITORY SALIENCY TO UNDERSTAND COMPLEX AUDITORY SCENES Varinthira Duangudom and David V Anderson School of Electrical and Computer Engineering, Georgia Institute of Technology Atlanta, GA 30332

More information

Medical Neuroscience Tutorial

Medical Neuroscience Tutorial Pain Pathways Medical Neuroscience Tutorial Pain Pathways MAP TO NEUROSCIENCE CORE CONCEPTS 1 NCC1. The brain is the body's most complex organ. NCC3. Genetically determined circuits are the foundation

More information

Introduction to sensory pathways. Gatsby / SWC induction week 25 September 2017

Introduction to sensory pathways. Gatsby / SWC induction week 25 September 2017 Introduction to sensory pathways Gatsby / SWC induction week 25 September 2017 Studying sensory systems: inputs and needs Stimulus Modality Robots Sensors Biological Sensors Outputs Light Vision Photodiodes

More information

Over-representation of speech in older adults originates from early response in higher order auditory cortex

Over-representation of speech in older adults originates from early response in higher order auditory cortex Over-representation of speech in older adults originates from early response in higher order auditory cortex Christian Brodbeck, Alessandro Presacco, Samira Anderson & Jonathan Z. Simon Overview 2 Puzzle

More information

to vibrate the fluid. The ossicles amplify the pressure. The surface area of the oval window is

to vibrate the fluid. The ossicles amplify the pressure. The surface area of the oval window is Page 1 of 6 Question 1: How is the conduction of sound to the cochlea facilitated by the ossicles of the middle ear? Answer: Sound waves traveling through air move the tympanic membrane, which, in turn,

More information

Auditory System & Hearing

Auditory System & Hearing Auditory System & Hearing Chapters 9 part II Lecture 16 Jonathan Pillow Sensation & Perception (PSY 345 / NEU 325) Spring 2019 1 Phase locking: Firing locked to period of a sound wave example of a temporal

More information

Layered organization of cortex: Paleocortex 3 layers hippocampal formation / ventral & medial cortex closest to brainstem

Layered organization of cortex: Paleocortex 3 layers hippocampal formation / ventral & medial cortex closest to brainstem Layered organization of cortex: Paleocortex 3 layers hippocampal formation / ventral & medial cortex closest to brainstem Archicortex 3-4 layers hippocampal formation / amygdala Neocortex 6 layers more

More information

Exploring the pulvinar path to visual cortex

Exploring the pulvinar path to visual cortex C. Kennard & R.J. Leigh (Eds.) Progress in Brain Research, Vol. 171 ISSN 0079-6123 Copyright r 2008 Elsevier B.V. All rights reserved CHAPTER 5.14 Exploring the pulvinar path to visual cortex Rebecca A.

More information

Lecture Outline. The GIN test and some clinical applications. Introduction. Temporal processing. Gap detection. Temporal resolution and discrimination

Lecture Outline. The GIN test and some clinical applications. Introduction. Temporal processing. Gap detection. Temporal resolution and discrimination Lecture Outline The GIN test and some clinical applications Dr. Doris-Eva Bamiou National Hospital for Neurology Neurosurgery and Institute of Child Health (UCL)/Great Ormond Street Children s Hospital

More information

LISC-322 Neuroscience. Visual Field Representation. Visual Field Representation. Visual Field Representation. Visual Field Representation

LISC-322 Neuroscience. Visual Field Representation. Visual Field Representation. Visual Field Representation. Visual Field Representation LISC-3 Neuroscience THE VISUAL SYSTEM Central Visual Pathways Each eye sees a part of the visual space that defines its visual field. The s of both eyes overlap extensively to create a binocular. eye both

More information

Thalamus and Sensory Functions of Cerebral Cortex

Thalamus and Sensory Functions of Cerebral Cortex Thalamus and Sensory Functions of Cerebral Cortex I: To describe the functional divisions of thalamus. II: To state the functions of thalamus and the thalamic syndrome. III: To define the somatic sensory

More information

Psyc 311A, fall 2008 Conference week 3 TA: Jürgen Germann

Psyc 311A, fall 2008 Conference week 3 TA: Jürgen Germann Psyc 311A, fall 2008 Conference week 3 TA: Jürgen Germann e-mail: jurgen.germann@mcgill.ca Overview: 1. Meninges 2. Cerebral cortex-cytoarchitecture 3. Diencephalon (thalamus/hypothalamus) (this replaces

More information

The neurolinguistic toolbox Jonathan R. Brennan. Introduction to Neurolinguistics, LSA2017 1

The neurolinguistic toolbox Jonathan R. Brennan. Introduction to Neurolinguistics, LSA2017 1 The neurolinguistic toolbox Jonathan R. Brennan Introduction to Neurolinguistics, LSA2017 1 Psycholinguistics / Neurolinguistics Happy Hour!!! Tuesdays 7/11, 7/18, 7/25 5:30-6:30 PM @ the Boone Center

More information

Brain anatomy and artificial intelligence. L. Andrew Coward Australian National University, Canberra, ACT 0200, Australia

Brain anatomy and artificial intelligence. L. Andrew Coward Australian National University, Canberra, ACT 0200, Australia Brain anatomy and artificial intelligence L. Andrew Coward Australian National University, Canberra, ACT 0200, Australia The Fourth Conference on Artificial General Intelligence August 2011 Architectures

More information

Prof. Saeed Abuel Makarem & Dr.Sanaa Alshaarawy

Prof. Saeed Abuel Makarem & Dr.Sanaa Alshaarawy Prof. Saeed Abuel Makarem & Dr.Sanaa Alshaarawy 1 Objectives By the end of the lecture, you should be able to: Describe the anatomy and main functions of the thalamus. Name and identify different nuclei

More information

Methods to examine brain activity associated with emotional states and traits

Methods to examine brain activity associated with emotional states and traits Methods to examine brain activity associated with emotional states and traits Brain electrical activity methods description and explanation of method state effects trait effects Positron emission tomography

More information

STRUCTURAL ORGANIZATION OF THE NERVOUS SYSTEM

STRUCTURAL ORGANIZATION OF THE NERVOUS SYSTEM STRUCTURAL ORGANIZATION OF THE NERVOUS SYSTEM STRUCTURAL ORGANIZATION OF THE BRAIN The central nervous system (CNS), consisting of the brain and spinal cord, receives input from sensory neurons and directs

More information

Processing in The Cochlear Nucleus

Processing in The Cochlear Nucleus Processing in The Cochlear Nucleus Alan R. Palmer Medical Research Council Institute of Hearing Research University Park Nottingham NG7 RD, UK The Auditory Nervous System Cortex Cortex MGB Medial Geniculate

More information

Systems Neuroscience Oct. 16, Auditory system. http:

Systems Neuroscience Oct. 16, Auditory system. http: Systems Neuroscience Oct. 16, 2018 Auditory system http: www.ini.unizh.ch/~kiper/system_neurosci.html The physics of sound Measuring sound intensity We are sensitive to an enormous range of intensities,

More information

Do women with fragile X syndrome have problems in switching attention: Preliminary findings from ERP and fmri

Do women with fragile X syndrome have problems in switching attention: Preliminary findings from ERP and fmri Brain and Cognition 54 (2004) 235 239 www.elsevier.com/locate/b&c Do women with fragile X syndrome have problems in switching attention: Preliminary findings from ERP and fmri Kim Cornish, a,b, * Rachel

More information

Short Term and Working Memory

Short Term and Working Memory Short Term and Working Memory 793 Short Term and Working Memory B R Postle, University of Wisconsin Madison, Madison, WI, USA T Pasternak, University of Rochester, Rochester, NY, USA ã 29 Elsevier Ltd.

More information

THE COCHLEA AND AUDITORY PATHWAY

THE COCHLEA AND AUDITORY PATHWAY Dental Neuroanatomy Suzanne S. Stensaas, PhD February 23, 2012 Reading: Waxman, Chapter 16, Review pictures in a Histology book Computer Resources: http://www.cochlea.org/ - Promenade around the Cochlea

More information

Systems Neuroscience Dan Kiper. Today: Wolfger von der Behrens

Systems Neuroscience Dan Kiper. Today: Wolfger von der Behrens Systems Neuroscience Dan Kiper Today: Wolfger von der Behrens wolfger@ini.ethz.ch 18.9.2018 Neurons Pyramidal neuron by Santiago Ramón y Cajal (1852-1934, Nobel prize with Camillo Golgi in 1906) Neurons

More information

Comparing event-related and epoch analysis in blocked design fmri

Comparing event-related and epoch analysis in blocked design fmri Available online at www.sciencedirect.com R NeuroImage 18 (2003) 806 810 www.elsevier.com/locate/ynimg Technical Note Comparing event-related and epoch analysis in blocked design fmri Andrea Mechelli,

More information

For more information about how to cite these materials visit

For more information about how to cite these materials visit Author(s): Peter Hitchcock, PH.D., 2009 License: Unless otherwise noted, this material is made available under the terms of the Creative Commons Attribution Non-commercial Share Alike 3.0 License: http://creativecommons.org/licenses/by-nc-sa/3.0/

More information