Measuring the Accuracy of the Neural Code


Measuring the Accuracy of the Neural Code

Edward A. L. Challis

Master of Science
School of Informatics
University of Edinburgh

August 2007

Abstract

Tuning curves are commonly used by the neuroscience community to characterise the response properties of sensory neurons to external stimuli. However, the interpretation of tuning curves remains an issue of debate. Do neurons most accurately encode stimuli located at the peak of their tuning curve, where they elicit maximal firing rates, and thus the response is most distinctive against background noise? Or do neurons most accurately encode stimuli in the high-slope regions of their tuning curves, where small changes in the stimulus effect the greatest change in response? Previous measures of encoding accuracy have either explicitly or implicitly assumed one of these two intuitions. Butts and Goldman (2006) [10] recently applied a new measure of encoding accuracy, the SSI, to the tuning curves of single neurons and a population of four neurons. The SSI predicts how the location of high encoding accuracy will shift from slope to peak regions of the tuning curve, dependent upon the level of neuronal variability and task specificity. Butts and Goldman (2006) stated that the "...SSI is computationally constrained to small populations of neurons" ([10] p.0644) and did not apply their measure to populations with more than four neurons. By utilising Monte Carlo integration techniques, this project presents a novel method to apply the SSI to larger populations of neurons (up to 200), new noise regimes and smaller temporal windows. The results obtained from the application of the SSI to populations of identical Gaussian tuning curves, uniformly arrayed across the stimulus space and with uncorrelated noise, are compared to the predictions of the Fisher information (FI). The results show that the integration time window and the population size also define where stimuli are most accurately encoded. As predicted in the infinite limit of population size, the SSI and FI measures were seen to converge; however, convergence was extremely rapid: in populations of 50 neurons the FI and SSI measures were qualitatively identical. Furthermore, it is shown that for the population sizes at which these measures differ (approximately 4 to 30 neurons, dependent on the neural context), the FI is not a reliable measure of encoding accuracy.

Acknowledgements

I would like to thank my supervisor Peggy Series for her constant help and reassurance. I would also like to thank Dan Butts and Mark Goldman for sharing their ideas and thoughts on this project.

Declaration

I declare that this thesis was composed by myself, that the work contained herein is my own except where explicitly stated otherwise in the text, and that this work has not been submitted for any other degree or professional qualification except as specified. (Edward Challis)

Contents

1 Introduction
  1.1 Motivation
  1.2 Aims
  1.3 Summary of Achievements
  1.4 Overview of Thesis

2 Background
  2.1 Initial Assumptions
    2.1.1 Rate Codes
    2.1.2 Tuning Curves
    2.1.3 Neuronal Variability
    2.1.4 Population Codes
  2.2 Measuring the Neural Code
    2.2.1 Information Theory
    2.2.2 The SSI
    2.2.3 Decoding
      2.2.3.1 MAP and ML Inference
    2.2.4 Fisher Information
  2.3 The Mutual Information-Fisher Information Link

3 Models and Methods
  3.1 The Encoding Model
    3.1.1 Tuning Curve Functions
    3.1.2 The Noise Model
  3.2 Measures
    3.2.1 Decoding: Maximum Likelihood
    3.2.2 Evaluating the FI
    3.2.3 Evaluating the SSI
      3.2.3.1 The marginal SSI
      3.2.3.2 The discrimination SSI
  3.3 Numerical Methods
    3.3.1 Quadrature
    3.3.2 Monte Carlo Approximation

4 Results I: the SSI
  4.1 Slope-to-Peak Transition for a Single Neuron
    4.1.1 Experimentally Characterised Neurons
  4.2 SSI for a Four Neuron Population
  4.3 SSI Applied to Discrimination Tasks

5 Results II: Extensions to the SSI
  5.1 Integration Time Window
    5.1.1 Gaussian Tuning Curve with Linear Variability
    5.1.2 Experimentally Characterised Neuron with Linear Variability
    5.1.3 Gaussian Tuning Curve with Poisson Variability
  5.2 The SSI in Larger Neural Populations
    5.2.1 The Slope-to-Peak Transition as a Function of the Neuronal Variability Level
    5.2.2 Peak-to-Slope Transition as a Function of the Population Size

6 Results III: Threshold Behaviour of the ML Estimator

7 Discussion and Future Work
  7.1 Discussion
    7.1.1 How are stimuli encoded by a neuron's tuning curve?
    7.1.2 What does the SSI offer as a means to measure the neural code?
  7.2 Future Work
    7.2.1 Task Dependency
    7.2.2 Noise in the Stimulus
    7.2.3 Further FI and SSI Comparisons
    7.2.4 Psychophysical Implications of a Slope-to-Peak Transition

Chapter 1
Introduction

1.1 Motivation

One of the primary functions of our nervous system is to provide an interface between the stimulus provided by our external environment and higher cognitive areas of the brain. Such information is commonly encoded by the collective activity of large populations of neurons, with each neuron responding to a reduced subset of the stimulus space. In order to understand how information is encoded by a neuron, and how this information is computed with, it is useful to characterise a neuron's stimulus-response relationship. A simple tool that is used to characterise this relationship is the tuning curve (see figure 1.1). The tuning curve of a neuron describes a parametric relationship between the stimulus value and the mean spiking response rate of that neuron. However, the interpretation of tuning curves is still an issue of debate. Which stimuli do neurons' tuning curves optimally encode? Intuition offers two competing answers:

1. Neurons most accurately encode stimuli at the peak of their tuning curve. It is at the peak that the response of the neuron is loudest, and thus the easiest to distinguish from background noise.

2. Neurons most accurately encode stimuli in the high-slope regions of their tuning curve. It is at this point that small changes in the stimulus value effect the largest change in the neuron's response.

Figure 1.1: Schema of an orientation-selective neuron with a bell-shaped tuning curve. Are tuning curves most informative in the region for which they have the highest firing rate, since it is here that the response is most pronounced against background noise? Or do tuning curves optimally encode stimuli in their high-slope regions, where small changes in the stimulus value effect the greatest change in the response?

Neuroscientists need measures that accurately represent how much stimulus information a neuron can encode and where stimuli are most accurately encoded. The two most common measures used to assess the accuracy of a neuron's stimulus-response relationship are the Fisher information (FI) and Shannon's mutual information (MI). In its simplest form, the Fisher information defines the high-slope regions of the neuron's tuning curve as the most informative. However, there exist constraints on the domain for which the FI is a relevant measure of accuracy. By the Cramer-Rao inequality, the Fisher information is related to the minimal possible variance of a read-out mechanism whose task would be to reconstruct the stimulus based on the neural response. However, this variance is only realisable by certain optimal estimators and is dependent upon properties of the neural population such as the dynamic range of the neuron's response, the noise level and the population size [33, 4]. The precise bounds on the contexts in which this bound is realised are not known, and thus the neural contexts for which the Fisher information is

informative as an encoding accuracy measure are not fully understood.

The MI is an entropy-based measure of the statistical dependence between the stimulus and the response. The MI, however, is limited in its application to tuning curves: it is a global measure of the information transfer capacity of a neuron, and it is not stimulus specific. Which stimuli are the most accurately encoded is left unanswered by its application.

Brunel and Nadal [8] in their 1998 paper proved an important theoretical link between the MI and FI measures in the infinite limit of population size. While it is still not known how these two measures relate in the case of finite populations of neurons, this result is frequently provided as justification for using the FI as the sole means by which to measure the accuracy of the neural code.

Recently, Dan Butts [9] proposed a new measure for the accuracy of the neural code, the Stimulus Specific Information (SSI). The SSI, a stimulus-specific decomposition of the MI, raises some interesting questions and offers new insights as to how information is encoded by a neuron's tuning curve. In [10] the authors showed that the regions of high encoding accuracy will shift from slope to peak encoding depending on the level of noise in the neuron's response, and on task specificity. However, the neural contexts to which the SSI was applied in the study, and which went on to provide the foundations for these conclusions, were limited.

The founding motivations of this project are:

i) the growing prevalence of the Fisher measure as the sole means for assessing the accuracy of the neural code in the neuroscience community, despite its limitations;

ii) the different implications of the SSI and the FI measure in small neuronal populations;

iii) the previously limited application of the SSI measure;

iv) the claimed similarity between the SSI and FI measures in larger populations of neurons.

This project implements the SSI and the FI measures and applies them to a simple `tuning curve plus noise' stimulus-response model. The SSI is applied to larger neural populations and new noise regimes, and the predictions of the SSI are tested against those of the Fisher information.

1.2 Aims

In their 2006 paper Butts and Goldman stated that the SSI "...is computationally constrained to the analysis of small populations" ([10] p.0644). The largest population to

which the SSI has previously ever been applied consisted of only four neurons. This computational issue highly constricts the practicability of the SSI, since the number of neurons used in population codes is commonly thought to be much greater than this.

The primary aim of this project is to assess the worth of the SSI by extending the contexts to which it has been applied. Firstly, this project will reproduce the results of Butts and Goldman. Secondly, the results of their study will be extended: the SSI will be applied to new noise regimes, varying integration time windows and, critically, larger neural populations. Thirdly, this project will compare the SSI and FI measures, and attempt to discern the neural contexts for which the FI is an informative measure of accuracy. Finally, this project will provide a qualitative analysis of the results presented, with the aim of answering two questions:

1. A neural coding question: is the slope-to-peak transition relevant for larger populations of neurons? Moreover, does the peak encoding regime even exist in larger populations of neurons?

2. A methodological question: are the SSI and FI measures similar for larger populations of neurons? Furthermore, is the Fisher information relevant in the range of contexts in which the two measures differ?

1.3 Summary of Achievements

The achievements of this project can be summarised by the results obtained and the methods used to obtain them.

1. The results of Butts and Goldman [10] were fully reproduced (their paper is included in full at the end of this thesis).

2. Monte Carlo integration techniques were successfully implemented and allowed the SSI to be applied to larger populations of neurons than ever before (up to 200 neurons). Details of this implementation are provided.

3. Estimates of the slope-to-peak transition values for the SSI are provided as a function of the population size, the level of noise, and the integration time window.

4. Examples of the marginal SSI and FI measures' qualitative convergence as a function of the population size are provided.

5. Estimates are provided of the population sizes for which the maximum likelihood decoder saturates the Cramer-Rao bound. These estimates are obtained under a range of noise levels and integration time windows. Thus, bounds on the population size for which the FI is informative are obtained.

1.4 Overview of Thesis

The remainder of this thesis is organised as follows. In Chapter 2, a review of the relevant literature is presented, including a brief introduction to some of the key concepts used throughout this project. Chapter 3 describes the neural encoding model used, and provides an account of the numerical methods that were implemented to evaluate the measures. In Chapter 4, we present reproduced results of Butts and Goldman's paper as a foundation for further analysis. In Chapter 5, we apply the SSI to new neural contexts and compare the results of the marginal SSI and the FI. In Chapter 6, we present results pertaining to the threshold behaviour of the maximum likelihood decoder, and ascertain the domain of applicability of the FI. Chapter 7 presents an analysis and discussion of the results obtained, ideas for future work, and conclusions.

Chapter 2
Background

The `neural code' refers to the neural representation of information, and its study can be divided into three inter-connected questions. First, what is being encoded? Second, how is it being encoded? Third, with what precision? ([6] p.947). This dissertation addresses the third question. However, implicit in the approach taken are assumptions about how the neural code operates. Firstly, this chapter reviews these assumptions (section 2.1). Secondly, an account of the core measures that are used in this project is provided: the Fisher information, the mutual information and the SSI (section 2.2).

2.1 Initial Assumptions

Action potentials, or spikes, are brief (roughly 1 ms in duration), comparatively large fluctuations (roughly 100 mV in amplitude) in the potential difference across a neuron's membrane. Simply referred to as spikes, action potentials have the ability, unlike other observed electrical activity in nervous systems, to propagate signals over large distances along axons and across synapses. It is this property of spikes that forms the principal dogma of neural coding: that "neurons represent and transmit information by firing sequences of spikes in various temporal patterns" ([22] p.3). The core neural coding experimental paradigm probes the nervous system by investigating the relationship between stimuli, neuronal activity and behaviour (although it is rare that data is obtained from all three simultaneously [3]). Neurons are highly stochastic in their response to stimuli, so stimulus-response relationships are not causal but statistical. The following sections detail how this stochastic stimulus-response relationship is recorded and modelled in this project.

2.1.1 Rate Codes

Spikes convey information through their temporal patterns. Although action potentials differ slightly in their amplitude and duration, they are mostly considered as stereotypical discrete events. The precise timing of spikes, however, is not a discrete quantity. To limit the possibly infinite precision with which information could be embodied in the temporal patterns of spikes, both the experimentalist and the theorist must place constraints on the accuracy with which spike timing is recorded. We know that there is a lower limit on the possible precision of the neural code. This is given by the absolute refractory period, the minimum time following a spike for which it is impossible to evoke another (in the order of a few milliseconds). Beyond this refractory period, however, there are no firm ideas as to what constitutes the relevant time scale at which to measure neural responses. (Since populations of neurons jointly encode information, the refractory period does not even provide an absolute lower bound on temporal precision [25].)

Traditionally two approaches have been applied to the problem of recording and representing neural spike data: the temporal coding approach, in which time is discretised and spikes binned (note that the width of time discretisation used here is critical and non-linearly affects the amount of information that can be encoded); and the rate coding approach, in which the spiking response of a neuron is represented as the rate at which spikes are fired over some integration time window. Temporal coding and rate coding approaches take different interpretations of how information is encoded in spike trains. Temporal codes embody the intuition that the inter-spike intervals carry information (such codes are sometimes referred to as correlation codes), whereas rate codes regard this interval as superfluous. Studies analysing the additional information content that inter-spike correlations can convey have found that it is rarely more than 10% larger than if spikes are considered as independently distributed (i.e. a rate code) [22]. Due to the intrinsically stochastic nature of neural firing patterns, and the increased precision required of temporal codes, very large amounts of data are needed to make reliable claims and analyses of temporal codes. The experimental difficulty of holding neural cells for long enough to amass this data means that the research into temporal coding in nervous systems is much weaker than that for the rate coding approach. The evidence suggests that both schemes operate to some extent, and that the schemes are not necessarily mutually exclusive [6, 22, 21, 26, 24].

The fact that the majority of data currently collected is rate based, and that there is strong evidence for the rate coding hypothesis, provides enough motivation for the rate coding approach that is taken in this project. However, as the size of the time integration window used to calculate the firing rates is one of the aspects that

Figure 2.1: (A) Recordings from a neuron in the primary visual cortex of a monkey. A bar of light was moved across the receptive field of the cell at different angles. The diagrams to the left of each trace show the receptive field as a dashed square, and the light source as a black bar. The bidirectional motion of the light bar is indicated by the arrows. The angle of the bar indicates the orientation of the light bar for the corresponding trace. (B) Average firing rate of a cat V1 neuron plotted as a function of the orientation angle of a light bar stimulus. (Image and text taken from p.15 of [22].)

will be analysed in this project, some of the more temporal aspects of neural coding will be considered.

2.1.2 Tuning Curves

One of the primary tools for analysing the stimulus-response properties of neurons is the tuning curve. Tuning curves provide a parametric relationship between the stimulus value and the mean neural response rate. Tuning curves are thus most suited to cases where the stimulus is a continuous variable (tuning curves are also applied to neurons found in motor regions; in these cases the stimulus refers to parameters of the motor action). Tuning curves are thus characterised by the range of stimuli to which they respond, and by the shape the mean rate traces across this stimulus set (see figure 2.1). A neuron's response distribution to the stimuli lying within its receptive field typically depends upon a number of high-dimensional properties of that stimulus, and on other neurophysiological factors, including feedback activity from higher cortical areas [11]. A simple one-dimensional function relating stimulus and mean response rate does not capture all these higher-order statistics. Tuning curves are used, however, because of their intuitive appeal, and they serve as a base point for understanding more complex mechanisms such as adaptation, learning, and attention.

A wide variety of tuning curve shapes can be found in the nervous system. Neurons found in the auditory processing regions of the brain exhibit band-pass characteristics,

i.e. they are selective to a certain range of frequencies. Sigmoid tuning curves are often found in motor control domains. However, to obtain results that are easy to interpret, this project will only examine the stereotypical uni-modal orientation-selective tuning curves (as can be characterised by half-wave rectified cosine functions, or a Gaussian).

Recording and Fitting

The theoretically `ideal' tuning curve plots the mean firing rate averaged from spike counts recorded from a cell over an infinite number of trials. However, there exist experimental constraints that limit the number of trials a physiologist can conduct to characterise the neuron's tuning curve (and finding the best fitting procedure is, with such limited data, not straightforward [16]). Although methods differ, once stimulus-response data is collected, obtaining the tuning curve is a problem of statistical regression. The basic tuning curve fitting model is similar to any form of regression, y = f(x) + noise. In the context of a neuron's response to a stimulus parametrised by s, this model can be expressed as

response(s) = τ f(s) + ε

where ε is the residual noise term to be minimised by the appropriate fitting of the rate tuning curve function f(s). The integration time window, τ, refers to the duration of time for which spikes are counted. Since there are energy constraints on a neuron's capacity for sustained spiking, how this time window, τ, affects both the shape of the tuning curve and the noise distribution is not necessarily linear. Figure 2.2 (B) and (C) plot the multiplier, A, and the exponent, B, of the mean spike count-variance relationship, σ² = A n̄^B. In this figure, both A and B are non-constant as the integration window is varied.

Tuning Curve Interpretation

"Despite their ubiquitous application and straightforward formulation, the interpretation of tuning curves remains an issue of debate" ([10] p.0639). Do tuning curves best encode those stimuli for which the neuron's firing rate is maximal and thus most pronounced against background noise, or do neurons `best' encode those stimuli at which the tuning curve achieves its maximal gradient, where small changes in the stimulus effect the greatest change in response? (See figure 1.1 for a graphical account of these competing intuitions.) Furthermore, tuning curves do not represent the variability in response for a given stimulus. The deviation of a neuron's response from the mean value specified by the tuning curve is termed the noise, or neuronal variability.

2.1.3 Neuronal Variability

In the context of tuning curves, the neuronal variability can be wholly characterised by the response distribution p(r|s), where r is the spike count of the neuron, and s the stimulus value.

Generative Spike Train Models

One simplified generative model for spike train data is the homogeneous Poisson process. For a constant firing rate value, a Poisson process generates a sequence of spikes over a set period of time by sampling a probability distribution. The probability that this process generates any particular sequence of spikes depends only upon the number of spikes in that time interval, and not on the inter-spike intervals [22, 28]. For a set firing rate, r, and time interval τ, a homogeneous Poisson process defines the probability of firing n spikes as:

p_τ(spike count = n) = (rτ)^n exp(−rτ) / n!   (2.1)

In Poisson spike trains the mean, n̄, and variance, σ², of the distribution are equal. However, in many cases the response variability recorded in neurons does not follow this mean = variance relationship. Such an example is found in the MT region of the macaque monkey. The variability in this case can be better summarised by a more general mean-variance relationship: σ² = A n̄^B (see figure 2.2 A). When B = 1 the multiplier A is commonly referred to as the Fano factor. Fano factors commonly range between F = 0.2 and F = 1.5 for the levels of variability in cortex [22, 17].
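As a concrete illustration of equation 2.1, the following Matlab sketch (not part of the thesis; the rate, window length and trial count are arbitrary choices) draws homogeneous Poisson spike counts using Knuth's multiplication method and checks that the empirical mean and variance are approximately equal:

    % Sample homogeneous Poisson spike counts: rate r (Hz), window tau (s).
    % Parameter values are arbitrary, chosen only for illustration.
    r = 40; tau = 0.256; nTrials = 10000;
    L = exp(-r*tau);              % Knuth's method: multiply uniform draws
    counts = zeros(nTrials, 1);   % until the running product drops below L
    for t = 1:nTrials
        k = 0; p = rand;
        while p > L
            k = k + 1;
            p = p*rand;
        end
        counts(t) = k;
    end
    % For a Poisson process, mean and variance should both be near r*tau = 10.24
    fprintf('mean = %.2f, variance = %.2f\n', mean(counts), var(counts));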

Additive and Multiplicative Noise Models

Levels of variability found in lower sensory processing regions of the brain commonly exhibit a linear relationship between the mean spike count, n̄, and the standard deviation, σ [20, 12]. In such cases the neuronal variability can be better characterised as σ = α + β n̄. Here, if β = 0 the noise is said to be additive, and if α = 0 multiplicative.

Figure 2.2: Variability of MT neurons in alert macaque monkeys responding to moving visual images. (A) Variance of the spike counts for a 256 ms counting period plotted against the mean spike count. The straight line is the prediction of the Poisson model. Data are from 94 cells recorded under a variety of stimulus conditions. (B) The multiplier A in the relationship between spike-count variance and the mean, as a function of the counting interval. (C) The exponent B in this relation as a function of the counting interval. (Figure and text from p.32 of [22], adapted from O'Keefe et al., 1997.)

2.1.4 Population Codes

In many regions of the brain, information is encoded by the joint activity of populations of neurons. Examples of such `population codes' exist in many sensory and motor control domains. Two such examples are the encoding of wind current direction in the cricket cercal system [20], and the encoding of orientations and directions in the primate visual cortex [19]. In population codes, sensory information is encoded by the collective activity of a group of neurons. The population is characterised by the range of stimuli to which it responds, and by the tuning curves of the neurons that constitute that population. However, individual neurons in the population typically respond to a subset of the population's stimulus range. Figure 2.3 displays the four tuning curves that characterise the cricket cercal system, a portion of the cricket's nervous system that encodes the direction of surrounding air currents. In this case, four separate neurons, each with receptive fields that respond to a 200° range of wind directions, tile the entire 360° stimulus space to jointly encode all wind directions.

Figure 2.3: The tuning curves of the four interneurons of the cricket cercal system, characterised by Miller et al. (1991) [20]. Each neuron responds with a firing rate that is closely approximated by a half-wave rectified cosine function. The preferred orientations of the neurons are located roughly 90° from each other, and the firing rate at these values is roughly 40 Hz. The error bars show ±1 standard deviation of the noise. (Figure taken from [22] p.98.)

The most immediate advantage that can be seen is that the population is more robust to the effects of noise and individual neuron failure. In the cricket cercal system (see figure 2.3), a stimulus oriented at 225° elicits a response from three of the four neurons in the population. Decoding a stimulus from the joint activity of all four neurons in this case is easier, as the noise in the response of individual neurons can be averaged out by the joint population response. This in turn reduces the time needed to process and accurately estimate the stimulus information. To achieve an accurate estimate of the firing rate from a single neuron would require longer integration time windows to average out the effects of temporal variation. Populations of neurons, with their receptive field overlap, thus reduce the time taken to decode stimuli and improve accuracy.

Noise Correlation

Since the advent of multi-electrode recordings of neurons in vivo, it has been observed that there is a large degree of correlation in the neural variability of neurons in a population [2]. The structure this correlation assumes, its magnitude and the effect it has on the nervous system's ability to compute using population activity are, however, not fully understood. Noise correlation can be explained by considering two neurons with a positive neuronal variability correlation. In this case, for one particular stimulus, if one neuron

is seen to fire above its mean rate then it is more likely that the other neuron will fire above its mean rate too (similarly, if neuron 1 fires less than its mean then neuron 2 is more likely to do the same). Negative noise correlations are analogous: if neuron 1 fires above its mean then neuron 2 is likely to fire below its mean. Thus, if all neurons were positively correlated, a population's ability to average out the destructive effects of noise through the joint activity of the population would be severely limited. Since the correlation structure of populations is not known, the extent to which this damages encoding accuracy cannot be completely determined, and it has been proved that in certain cases there may be no resultant loss of accuracy [1].

2.2 Measuring the Neural Code

Populations of neurons, and the tuning curves of the neurons that constitute them, are not static. The shape and distribution of the individual neurons' tuning curves can change with the effects of adaptation, learning and attention. In order to interpret these changes, tools are needed for assessing the population's ability to encode information. Traditionally two approaches have been taken: information theory, which measures the statistical dependence between stimulus and response; and statistical estimation theory, which attempts to quantify the accuracy with which the stimulus can be reconstructed from the response.

2.2.1 Information Theory

Information theory, originally conceived and presented in Claude Shannon's 1948 paper `A Mathematical Theory of Communication' [30], offers a method by which to quantify information transmission over noisy channels. In the context of neural coding, the transmission is via the nervous system, from stimulus to a neuron's spiking response. Entropy, the fundamental measure of information content in a signal, is the expectation of the negative log probability:

H[X] = E[−log₂ p(x)] = Σ_x p(x) log₂ (1/p(x))

(note the logarithm is taken to base 2; the unit of entropy is thus the `bit'). In this form entropy can be considered a measure of uncertainty (a variable that can assume one of two states, where either is equally likely, will have 1 bit of entropy), or similarly of the capacity of a code to assume a variety of states. One important property of entropy is that it is additive: the entropy of two variables, if they are independent, is equal to the sum of their respective marginal entropies. To measure how much information can be transmitted over a noisy channel, a related measure, the mutual information (MI), can be used. If the set of all possible responses of a neuron is denoted R, and the set of all possible stimuli S, the mutual information is the difference between the entropy of the response, H[R], and the entropy of the noise, H[R|S]:

I_M[R; S] ≡ H[R] − H[R|S]   (2.2)

In this form the MI can be interpreted as the reduction in the entropy of the response, H[R], due to the entropy of the noise, H[R|S], where H[R|S] = −Σ_s p(s) Σ_r p(r|s) log₂ p(r|s). The MI in the form of equation 2.2 is equivalent to the Kullback-Leibler divergence between the joint stimulus-response distribution, p(s, r), and the product of the marginals, p(s)p(r):

I_M[R; S] ≡ Σ_r Σ_s p(r, s) log₂ [p(r, s) / (p(r)p(s))]   (2.3)

The Kullback-Leibler divergence reflects the difference between two probability distributions. Thus the mutual information measures the distance between the joint stimulus-response distribution and the distribution these two quantities would have if they were independent [6, 24, 8]. Note that the MI does not say either how information is encoded or how much of this information is utilised by the nervous system. The MI does, however, place an upper limit on this information. What information theory primarily offers to the study of the neural code is an `assumption free' method for assessing which attributes of a stimulus have the strongest relationship (i.e. MI) with a neuron's response [6]; `assumption free' because the mutual information can be calculated between stimulus attributes and the statistics of raw spike data [24]. However, applying the MI to parametrised population tuning curves is not so informative. The MI returns a single number for the entire population's encoding scheme, but the question of how and where stimuli are best encoded is left unanswered.
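To make equations 2.2 and 2.3 concrete, here is a short Matlab sketch (not from the thesis; the channel probabilities below are invented purely for illustration) that computes H[R], H[R|S] and the mutual information for a toy discrete stimulus-response channel:

    % Mutual information of a toy two-stimulus, two-response channel.
    % The probability table is an arbitrary illustrative choice.
    pS   = [0.5 0.5];              % p(s): two equiprobable stimuli
    pRgS = [0.8 0.2;               % p(r|s): row i is the response
            0.3 0.7];              % distribution given stimulus i
    pR   = pS * pRgS;              % marginal response distribution p(r)
    HR   = -sum(pR .* log2(pR));                     % response entropy H[R]
    HRgS = -sum(pS .* sum(pRgS .* log2(pRgS), 2)');  % noise entropy H[R|S]
    MI   = HR - HRgS;                                % I_M[R;S] = H[R] - H[R|S]
    fprintf('H[R] = %.3f, H[R|S] = %.3f, MI = %.3f bits\n', HR, HRgS, MI);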

2.2.2 The SSI

The strength of information theory is that, with relatively few assumptions, estimates can be made of the interdependence of stimuli and response, and thus relevant features of the coding scheme can be determined. This strength is weakened, however, by the fact that the MI, as proposed by Shannon, offers no means by which to see how individual responses or stimuli contribute to the total MI. To this end, a lot of work has been done to decompose the MI as a function of either the response or the stimulus [17, 18, 20, 14, 5, 15]. The basic form of such decompositions is

I_M[R; S] = Σ_s p(s) i(s) = Σ_r p(r) i(r)   (2.4)

where i(s) and i(r) are measures of the specific information associated with the stimulus or the response respectively. Such decompositions are not unique; there are in fact arbitrarily many ways to decompose the MI [9].

One such decomposition of the MI is the specific information, proposed by DeWeese and Meister (1999) [15]. The specific information is a measure of the reduction in uncertainty of the stimulus gained from an observation of a particular response (cf. the MI, which is the average reduction in stimulus uncertainty having observed a response), parametrised in the following form:

i_sp(r) = H[S] − H[S|r]   (2.5)

The specific information is thus a measure of the informativeness of a response. However, this formulation does not reflect the causal relationship between stimulus and response. Butts (2003) [9] reformulated the specific information to account for this causality. The Stimulus Specific Information (SSI) is thus defined as

i_SSI(θ) = Σ_r p(r|θ) i_sp(r)   (2.6)

Since the specific information is defined as the reduction in uncertainty of the stimulus given one observation of the response r ∈ R, "...the SSI i_SSI(s) is the average reduction of uncertainty gained from one measurement given the stimulus s ∈ S" ([9] p.180). This formulation of the SSI thus maintains the relationship of being the weighted average of the MI (i.e. I_M[S; R] = Σ_s p(s) i_SSI(s)).

Butts and Goldman (2006) applied the SSI to the tuning curves of a single neuron and a small population of four neurons, to investigate the predictions it made under a variety of different tuning curve and noise models. This paper (included in full in the appendix) presented three core results from the application of the SSI:

1. Whether stimuli are best encoded in the high-slope or peaked regions of the tuning curve was dependent upon the level of noise in the response. In particular, in low noise regimes the SSI placed high encoding accuracy on the high-slope regions of the tuning curve; in high noise regimes the SSI placed high accuracy on the peaked regions of the tuning curve.

2. The application of the SSI to a population of four neurons similarly exhibited this slope-to-peak transition as a function of the noise level. The noise level needed to facilitate this transition, however, was significantly higher.

3. By modifying the SSI slightly and changing the prior on the stimulus, the authors applied the SSI to discrimination tasks. It was found that, regardless of the level of noise, fine discrimination was always `better' encoded in the high-slope regions of the tuning curve, whilst coarse discrimination tasks were `better' encoded in the peaked regions of the tuning curve.

These results showed that both the peak-encoding and slope-encoding interpretations of tuning curves could be correct, but that this depends upon certain factors: the noise level, the population size, and task specificity.

2.2.3 Decoding

Another approach to assessing the accuracy of neural coding schemes is to consider the accuracy with which stimuli can be reconstructed, or decoded, from neurons' spiking responses. By employing optimal decoding techniques, the error of these stimulus estimates compared to the true causal stimulus value provides a means to assess the accuracy with which stimuli are encoded, regardless of whether the implementation of these methods is biologically plausible. Furthermore, by comparing the limit of optimal decoding accuracy from the recorded activity of neurons with the behavioural accuracy of an animal, conclusions can be made about the properties of the neural populations by which stimuli are encoded (such as the correlation structure of the noise [35] or the population's size [7]).

There are many possible ways in which to decode neural responses. A common method is the population vector. Population vector decoding takes the weighted average of the preferred orientations of the neurons in the population, where the vector of weights is given by the spike count of each neuron. The accuracy of this decoding scheme is dependent upon the uniformity of the distribution of preferred orientations and the number of neurons in the population [22, 33]. This method, nonetheless, is neither a general nor an optimal decoder.

The quality of a decoding procedure can be assessed by considering two properties of the reconstructed stimulus statistics: the bias and the variance. The bias of an estimator is a measure of the systematic shift of the predicted stimuli against the true stimulus value, i.e. b(s_est) = ⟨s_est⟩ − s_true. The variance of an estimator is a measure of the spread of predicted values about their mean, i.e. var(s_est) = ⟨(s_est − ⟨s_est⟩)²⟩ (decoded over infinitely many trials for one particular antecedent stimulus). Estimators with zero bias are referred to as unbiased estimators. Estimators that also achieve the minimum estimator variance are called optimal.

Optimal estimation can be realised by utilising tools from probability theory. As has previously been discussed, neural responses to stimuli are stochastic. Similarly, stimuli themselves can also be considered as stochastic quantities, since both the presence (or absence) and the value that a stimulus assumes in the world can be described by probability distributions. Bayes' rule (equation 2.7), a basic identity in probability theory, provides a means to infer the value of one quantity having observed the other, if the two quantities' joint statistics are known.

p(s|r) = p(r|s)p(s) / p(r) = p(r|s)p(s) / Σ_s′ p(r|s′)p(s′)   (2.7)

2.2.3.1 MAP and ML Inference

One important example of an optimal estimator is the maximum a posteriori (MAP) method [13, 22]. MAP inference decodes the stimulus as that having the largest probability conditioned on the observed response, i.e. s_MAP = argmax_s p(s|r). Using Bayes' rule this formula can be put in terms of two probability density functions (pdfs) that are known:

s_MAP = argmax_s p(s|r) = argmax_s [p(r|s)p(s) / Σ_s′ p(r|s′)p(s′)]   (2.8)

If the stimulus distribution, or prior, is considered uniform, MAP inference is equivalent to maximum likelihood (ML) decoding, as the prior has no effect on the argument which is to be maximised:

s_ML = argmax_s p(r|s)   (2.9)

Because of its optimality, ML decoding is frequently used as a reference against which to compare the behavioural performance of animals, and thus to make conclusions about the neural populations by which stimuli are encoded.
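A minimal Matlab sketch of MAP and ML decoding on a discretised stimulus grid follows (not from the thesis; the tuning curve, noise level, prior and observed response are all invented for illustration). With a uniform prior the two estimates would coincide, as noted above:

    % MAP (eq. 2.8) and ML (eq. 2.9) decoding for one neuron with additive
    % Gaussian noise. All parameter values are illustrative assumptions.
    theta = -180:180;                          % stimulus grid (deg)
    f     = 50*exp(-theta.^2/(2*30^2));        % Gaussian tuning curve
    sigma = 5;                                 % noise std (spikes/s)
    r     = 35;                                % observed single-trial rate
    lik   = exp(-(r - f).^2/(2*sigma^2));      % p(r|theta), up to a constant;
                                               % note it peaks on both flanks
    prior = exp(-(theta - 20).^2/(2*60^2));    % a non-uniform prior
    prior = prior/sum(prior);
    post  = lik.*prior/sum(lik.*prior);        % Bayes' rule, eq. 2.7
    [~, iML]  = max(lik);                      % eq. 2.9 (first peak if tied)
    [~, iMAP] = max(post);                     % eq. 2.8
    fprintf('s_ML = %d deg, s_MAP = %d deg\n', theta(iML), theta(iMAP));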

2.2.4 Fisher Information

The Cramer-Rao inequality (equation 2.10) places a theoretical lower bound on the variance of any decoding scheme. This bound is specified in terms of two quantities, the bias of the estimator and the Fisher information:

var(s_est) ≥ (1 + b′(s_est))² / I_F(s_true)   (2.10)

where b′(s_est) is the derivative of the bias. Thus for unbiased estimators

var(s_est) ≥ 1 / I_F(s_true)   (2.11)

If the stimulus is a continuous variable, and if the distribution of the neuronal response can be described by a conditional pdf p(r|s), then I_F(s_true) is the Fisher information (FI). The Fisher information is defined as the expectation of the (negative) curvature of the log likelihood of the response conditioned on the stimulus:

I_F(s) = ⟨−∂² ln p(r|s) / ∂s²⟩ = ∫ p(r|s) (−∂² ln p(r|s) / ∂s²) dr   (2.12)

The Fisher information is not an entropy-based information quantity. However, as seen in equation 2.11, the larger the Fisher information, the smaller the lower bound on the variance. It is in this sense that I_F is a measure of accuracy, and is thus termed an `information' quantity.

Figure 2.4: The Fisher information for a single neuron with a Gaussian tuning curve with preferred orientation at s = 0 and unit variance. The Fisher information (solid curve) has been divided by the maximal firing rate of the tuning curve. The tuning curve, scaled by the maximal firing rate, is plotted as the dashed curve. Note how the Fisher information is maximal in the high-slope regions of the tuning curve, and zero at the value for which the tuning curve achieves its maximal rate. (Figure and text from p.111 of [22].)

Applying the FI to the case of a single neuron's tuning curve illustrates some of its important properties (see figure 2.4). In this case, the FI is maximal in the high-slope portions of the tuning curve; this reflects the fact that small differences in orientation correspond to the largest differences in response in these regions. On the contrary, at the peak of the tuning curve the FI is zero, since here small changes in the stimulus effect no change in response.

The shape of the Fisher information can appear counter-intuitive, though. An observer asked to decode the stimulus from the firing rate of the neuron would find it especially hard in the regions of maximal FI, as there are no clues as to which side of the neuron's tuning curve the stimulus lies on. Furthermore, the regions for which the FI is zero, and thus the variance of an estimator is unbounded, elicit the most pronounced and intuitively distinctive firing rates. This reflects an important property of the FI: it should be understood as a local measure. The FI is stimulus specific, as the expectation is taken over the pdf p(r|s), and thus it does not consider the decoding ambiguity of stimulus values far from s_true that could evoke similar firing rates. The FI is a local measure of the effect that small changes in the stimulus value have on the response.
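The qualitative shape shown in figure 2.4 is easy to reproduce. For a Gaussian tuning curve with constant additive Gaussian noise, equation 2.12 reduces to I_F(θ) = f′(θ)²/σ². A short Matlab sketch (not from the thesis; all parameter values are arbitrary illustrative choices):

    % Fisher information of a single Gaussian tuning curve with additive
    % Gaussian noise: I_F(theta) = f'(theta)^2 / sigma^2.
    % Parameter values are arbitrary illustrative choices.
    theta = -180:0.5:180;                    % stimulus grid (deg)
    fmax = 50; width = 30; sigma = 5;
    f  = fmax*exp(-theta.^2/(2*width^2));    % tuning curve, peak at 0 deg
    fp = -(theta/width^2).*f;                % analytic derivative f'(theta)
    IF = fp.^2/sigma^2;                      % Fisher information
    % I_F vanishes at the peak and is largest on the flanks (theta = +/- width)
    plot(theta, IF/max(IF), theta, f/fmax, '--');
    xlabel('stimulus (deg)');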

Validity of the FI

The Cramer-Rao bound is what defines the FI as an informative quantity. However, if the variance of optimal decoding schemes frequently does not saturate this bound, the FI is of limited use. The question then, if the FI is to be used by the neuroscience community, is: under what neural population contexts can the Cramer-Rao bound be saturated, and thus when is the FI informative? It turns out that the answer depends upon the dynamic range of neural responses and the population size. Bethge et al. (2002) [4] showed that only if the dynamic range of the population is sufficiently large can the FI be considered an accurate measure of encoding precision, where the dynamic range is defined as the range of spike counts a neuron can fire in a defined time window. Xie (2002) [33] also looked into this issue, by applying the ML decoding method to populations with increasing numbers of neurons under varying levels of additive noise. The ML method is asymptotically unbiased and efficient; by determining the population size threshold needed to saturate the Cramer-Rao bound, the domain of applicability of the FI can be ascertained. However, Xie only applied this analysis in the case of additive noise. A complete specification of the conditions under which the FI is informative is missing.

2.3 The Mutual Information-Fisher Information Link

As the Fisher information and the mutual information are two quantities that will be used extensively in this project, it is important to clarify their interrelationship. Although these two measures encode different intuitions, one a local measure of the log likelihood's `peakiness', the other a global measure of statistical interdependence, they are related in several ways, two of which are noteworthy for the purposes of this project.

Firstly, the mutual information between one pdf and the same pdf shifted slightly is proportional to the Fisher information [34]. This is of particular consequence, as we can thus interpret the Fisher information as a mutual information measure for fine discrimination tasks.

Another MI-FI relationship was presented by Brunel and Nadal (1998) [8], extending the work of Clarke and Barron (1990) and Rissanen (1996). The authors proved a number of results relating the two quantities. Firstly, in the limit of large populations of neurons, "...the mutual information between the activities of the neuronal population and the stimulus becomes equal to the mutual information between the stimulus and an efficient Gaussian estimator" ([8] p.1732). The `efficient Gaussian estimator' refers to an estimator that has a variance equal to the Cramer-Rao bound, and is thus related to the Fisher information. Secondly, the authors show that this equality holds for a single cell in the case of Gaussian noise with vanishing variance (equivalent to obtaining a neuron's response rate by averaging over an infinitely long time window). These results rest upon asymptotic properties of the Gaussian distribution, and the exact relationship between the two measures is unclear in the case of finite-size (in the case of populations of neurons) and finite-time (in the single neuron case) neural codes [3]. One of the fundamental questions that Brunel and Nadal's paper raises is: what circumstances are sufficient to reach this asymptote? Although this question remains unanswered, this theoretical result has been frequently used as justification for using the FI as the sole means for assessing coding accuracy.

Chapter 3
Models and Methods

This chapter provides an overview of the basic neural encoding model used, and of how the measures were implemented within this framework. Firstly, the basic stimulus-to-response model is described. Secondly, an account of the methods and measures that are implemented is provided. Finally, a technical account of the SSI's implementation is given.

3.1 The Encoding Model

Throughout this project, the firing rate response properties of neurons are represented by a generic `tuning curve plus noise' model. Within all the experimental contexts considered, a neuron is characterised by its response to some uni-dimensional stimulus feature, for example the orientation of a bar, θ ∈ [−180°, 180°]. The response of a neuron, measured as spikes for a particular stimulus value θ recorded over a period of time τ (s), can thus be fit using the following model:

response(θ) = τ f(θ) + η   (3.1)

where f(θ) defines the tuning curve shape: the mean response rate for this neuron at a particular stimulus value θ (units Hz, or spikes per second). The neuronal variability is characterised by η in equation 3.1, and is considered to be a Gaussian random variable with mean zero and standard deviation (std) σ. The integration time window, τ (seconds), is the number of seconds for which the spikes were counted. Generating sample responses from a neuron characterised by the model above with known parameters then simply translates to the problem of sampling from a Gaussian probability density function (pdf), where the mean response is the value specified by the tuning curve for that stimulus value, τ f(θ), and the std, σ(θ), is defined by the noise model.
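A minimal Matlab sketch of this sampling step (not part of the thesis; the tuning curve shape, noise model and parameter values are arbitrary assumptions for illustration):

    % Draw single-trial responses from the 'tuning curve plus noise' model
    % of eq. 3.1: response(theta) = tau*f(theta) + eta, eta ~ N(0, sigma^2).
    % All parameter values below are illustrative placeholders.
    tau = 1.0;  fmax = 50;  theta0 = 0;  width = 30;  b = 2;
    f     = @(th) fmax*exp(-(th - theta0).^2/(2*width^2)) + b;  % cf. eq. 3.2
    sigma = @(th) 1 + 0.1*tau*f(th);     % a linear noise model (cf. eq. 3.5)
    theta = 15;                          % the presented stimulus (deg)
    resp  = tau*f(theta) + sigma(theta)*randn(5, 1);  % five sampled responses
    disp(resp')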

In the case of populations of neurons, the tuning curves' preferred orientations are considered to be uniformly arrayed across the stimulus set, and all other parameters that define the individual neurons' tuning curves are identical across the population.

3.1.1 Tuning Curve Functions

Three tuning curve functions were implemented: Gaussian, circular normal, and cosine tuning curves.¹

Gaussian:
f(θ) = f_max exp(−(θ − θ₀)² / 2f_var²) + b   (3.2)

Circular normal:
f(θ) = f_max exp[−f_var (1 − cos(θ − θ₀))] + b   (3.3)

Cosine tuning:
f(θ) = (1/0.86) [cos(θ − θ₀) − 0.14]₊ + b   (3.4)

where [x]₊ = max(x, 0) denotes half-wave rectification. In all three equations above (equations 3.2, 3.3 and 3.4), b refers to the baseline firing rate (Hz), θ₀ is the preferred orientation of the tuning curve, and f_max the value the tuning curve achieves at that preferred orientation. For equation 3.4 the parameters were taken as in [10].

3.1.2 The Noise Model

In all the experimental contexts considered, one generic noise model is used to characterise the neuronal variability. The noise model is a function that defines the std of the Gaussian variability in terms of the neuron's tuning curve value at that particular stimulus:

σ(θ) = A [α + β (τ f(θ))^φ]   (3.5)

This general model allows a broad range of noise regimes to be realised: multiplicative (α = 0), additive (β = 0), and a close approximation to Poisson variability too (where A = 1, α = 0, β = 1, and φ = 0.5). Noise regimes that are characterised by their Fano factor, F, are implemented under this model as A = 1, α = 0, β = F and φ = 0.5. All noise is considered to be uncorrelated, i.e. in a population of two neurons p(r₁, r₂|θ) = p(r₁|θ)p(r₂|θ).

¹ The Matlab algorithms for these tuning curves are provided in the Appendix.
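As a sketch of how these definitions might look in Matlab (this is not the thesis's Appendix code, which is not reproduced here; the parameter values are placeholders):

    % Tuning curve shapes (eqs. 3.2-3.4) and the generic noise model (eq. 3.5)
    % as anonymous functions. Parameter values are illustrative placeholders.
    fmax = 50; theta0 = 0; fvar = 30; b = 2;
    gauss = @(th) fmax*exp(-(th - theta0).^2/(2*fvar^2)) + b;     % eq. 3.2
    circ  = @(th) fmax*exp(-2*(1 - cosd(th - theta0))) + b;       % eq. 3.3, f_var = 2
    cosr  = @(th) (1/0.86)*max(cosd(th - theta0) - 0.14, 0) + b;  % eq. 3.4
    % Generic noise std (eq. 3.5); these values give Poisson-like variability.
    A = 1; alpha = 0; beta = 1; phi = 0.5; tau = 1;
    noiseStd = @(fth) A*(alpha + beta*(tau*fth).^phi);
    fprintf('f(0) = %.1f Hz, sigma(0) = %.2f spikes\n', gauss(0), noiseStd(gauss(0)));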

3.2 Measures

3.2.1 Decoding: Maximum Likelihood

We denote by r = [r_1, r_2, r_3, ..., r_N] a single-trial response of the population, where N is the number of neurons in the population. The maximum likelihood (ML) decoder predicts the stimulus value, θ̂, as that with the maximum log likelihood given the statistics of the population and the response r:

θ̂ = argmax_θ L(θ) = argmax_θ ∏_{i=1}^{N} p(r_i|θ) = argmax_θ Σ_{i=1}^{N} log p(r_i|θ)    (3.6)

By sampling responses from the population many times and decoding each with the ML method of equation 3.6, an estimate of the variance of the ML method can be determined as

var(θ̂) = (1/n_trials) Σ_{i=1}^{n_trials} (θ̂_i − ⟨θ̂⟩)²    (3.7)

where ⟨θ̂⟩ is the mean of the estimates. The variance of this estimator will be compared to the Cramer-Rao bound (equation 2.10),

var(θ_est) ≥ (1 + b′(θ_est))² / I_F(θ)    (3.8)

where b′(θ_est) is the derivative of the bias of the estimator and I_F(θ) is the Fisher information (FI).
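A minimal Matlab sketch of this decoder for independent Gaussian noise, maximising the log likelihood over a discrete stimulus grid (the matrix layout and the helper name ml_decode are illustrative assumptions):

    % Grid-search ML decoder (equation 3.6) for independent Gaussian noise.
    % F(i,k) = tau*f_i(theta_k), S(i,k) = sigma_i(theta_k), r = N x 1 response.
    function theta_hat = ml_decode(r, theta_grid, F, S)
        % Gaussian log likelihood summed over neurons, up to an additive constant;
        % (r - F) uses implicit expansion of the N x 1 trial against the N x K means.
        loglik = -sum((r - F).^2 ./ (2*S.^2) + log(S), 1);
        [~, k] = max(loglik);
        theta_hat = theta_grid(k);
    end

Decoding many sampled trials at a fixed θ and applying equation 3.7 to the resulting estimates θ̂_i then gives the empirical variance to compare against the bound of equation 3.8.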

3.2.2 Evaluating the FI

The Fisher information is defined as

I_F(θ) = −∫ p(r|θ) (∂² log p(r|θ) / ∂θ²) dr = ∫ p(r|θ) (∂ log p(r|θ) / ∂θ)² dr    (3.9)

The neuronal variability, η, of each neuron in the population is a random variable sampled from a uni-variate Gaussian with zero mean and std. σ, i.e. η ~ N(0, σ²). Since the noise is considered independent amongst all neurons in the population, the population response can be considered to be sampled from a multivariate Gaussian with diagonal covariance, i.e. p(r|θ) ~ N(f(θ), Σ(θ)), where f(θ) = [f_1(θ), f_2(θ), ...f_N(θ)] (with the integration time τ absorbed into f for notational convenience) and Σ(θ) = diag(σ_1²(θ), σ_2²(θ), ...σ_N²(θ)). The Fisher information in this case is

I_F(θ) = Σ_{i=1}^{N} [ f_i′(θ)² / σ_i(θ)² + 2.σ_i′(θ)² / σ_i(θ)² ]    (3.10)

3.2.3 Evaluating the SSI

To evaluate the SSI, recall equation 2.6:

i_SSI(θ) = ∫_{V_r} p(r|θ).i_sp(r) dr    (3.11)
         = ∫_{V_r} p(r|θ).{ H[θ] − H[θ|r] } dr    (3.12)
         = ∫_{V_r} p(r|θ).{ [ −Σ_{θ′} p(θ′) log₂ p(θ′) ] − [ −Σ_{θ′} p(θ′|r) log₂ p(θ′|r) ] } dr    (3.13)

where the braced integrand will be referred to as part A. In this equation V_r is the volume of all possible responses of the neural population; p(θ) refers to the pdf defining the stimulus distribution, considered uniform (θ ~ U[−180°, 180°]); and p(r|θ) is the probability of observing response r = [r_1, r_2, r_3, ..., r_N] from the collective activity of all neurons in the population for a particular stimulus value θ. As the neuronal variability is considered independent, i.e. there are no noise correlations, the response pdf of the population factorises into the product of the individual neurons' conditional response pdfs, p(r|θ) = ∏_{i=1}^{N} p(r_i|θ). The posterior p(θ|r) is calculated using Bayes' rule,

p(θ|r) = p(r|θ)p(θ) / Σ_{θ′} p(r|θ′)p(θ′)

3.2.3.1 The marginal SSI

The marginal SSI of a neuron in a population is the information contribution that this one neuron makes to the population as a whole. It is calculated by subtracting the SSI of the population with that neuron removed from the SSI of the whole population. Thus the marginal SSI calculated for neuron j is defined as

i^m_SSI(θ) = ∫_{V_r} p(r|θ).i_sp(r) dr − ∫_{V_r′} p(r′|θ).i_sp(r′) dr′    (3.14)

where r′ = [r_1, r_2, ..., r_{j−1}, r_{j+1}, ...r_N].

3.2.3.2 The discrimination SSI

The SSI applied to discrimination tasks involves a slight modification of equation 3.13. To calculate the SSI for the coarse and fine discrimination tasks, the prior on the stimulus set is altered to include only two possible stimuli (reflecting the two-alternative forced-choice experimental paradigm). For the fine discrimination task, the SSI at a stimulus value θ is defined as the average of the SSI at θ + 3° and θ − 3°, with the prior on the stimulus set restricted to those two values, i.e. θ′ ∈ {θ + 3°, θ − 3°} with p(θ′ = θ + 3°) = p(θ′ = θ − 3°) = 1/2. The fine discrimination SSI for two stimuli around θ is therefore

i^d_SSI(θ) = ∫_{V_r} p(r|θ).{ [ −Σ_{θ′ ∈ {θ+3°, θ−3°}} p(θ′) log p(θ′) ] − [ −Σ_{θ′ ∈ {θ+3°, θ−3°}} p(θ′|r) log p(θ′|r) ] } dr    (3.15)

The coarse discrimination task is analogous, but with the 3° offset changed such that the two stimuli are diametrically opposite in the stimulus space (i.e. θ′ ∈ {θ, θ + 180°}). This formulation of the SSI is no longer a stimulus decomposition of the mutual information, and thus i^d_SSI is normalised to unity.

3.3 Numerical Methods

3.3.1 Quadrature

To reproduce the results of Butts and Goldman, I first used the same method as the authors to evaluate the SSI: the method of quadrature (recall equation 3.13; evaluating the SSI requires computing three quantities: H[θ], H[θ|r] and the integral i_SSI(θ)). To evaluate the SSI for the case of a single neuron, discretisations of 1° and 0.01 (spikes) were used for the stimulus and response respectively. In the case of the four-neuron population, coarser discretisations were used: Δr = 0.05 (spikes) and Δθ = 5° (as in [10]).

One-dimensional quadrature approximates an integral as the sum of the function evaluated at a number of uniformly arrayed points in the domain of integration, x_i = a + ((b − a)/n).i for i = 1, 2, ..., n:

∫_a^b f(x) dx ≈ ((b − a)/n) Σ_{i=1}^{n} f(x_i)    (3.16)

To evaluate the SSI for a single neuron, the upper and lower limits of integration (for a single neuron the domain of integration has one dimension, V_r = [r_min, r_max]) were taken to be r_max = max_θ [τ.(f(θ) + 4.σ(θ))] and r_min = min_θ [τ.(f(θ) − 4.σ(θ))], thus accounting for over 99.99% of the variation in the response.
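For concreteness, a minimal Matlab sketch of the single-neuron quadrature computation described above (a direct transcription of equations 3.11-3.13 and 3.16 with these integration limits; the grids and parameter values are illustrative):

    % Single-neuron SSI by quadrature (equations 3.11-3.13).
    theta = -180:1:179;                             % stimulus grid, 1 degree steps
    tau = 1; fmax = 75; fvar = 30; b = 5;
    f   = fmax * exp(-theta.^2 / (2*fvar^2)) + b;   % Gaussian tuning curve (Hz)
    sig = sqrt(tau * f);                            % Poisson-like noise std

    dr = 0.01;                                      % response discretisation
    r  = (min(tau*f - 4*sig) : dr : max(tau*f + 4*sig))';   % column grid over V_r

    % p(r|theta): rows index responses, columns index stimuli (implicit expansion).
    P = exp(-(r - tau*f).^2 ./ (2*sig.^2)) ./ (sqrt(2*pi)*sig);

    ptheta = ones(size(theta)) / numel(theta);      % uniform stimulus prior
    post   = (P .* ptheta) ./ sum(P .* ptheta, 2);  % Bayes: p(theta|r), one row per r
    Hprior = -sum(ptheta .* log2(ptheta));                  % H[theta]
    Hpost  = -sum(post .* log2(post + eps), 2);             % H[theta|r] for each r
    isp    = Hprior - Hpost;                                % i_sp(r)
    iSSI   = (P' * isp) * dr;                       % equation 3.11 by quadrature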

For a discretisation coarseness of Δr (spikes), this results in evaluating and summing the integrand n = (r_max − r_min)/Δr times.

The multidimensional case with hypercube domains is a simple extension of this idea. Calculating the SSI for a population of four neurons using quadrature consists of integrating over a four-dimensional hypercube. This in turn translates to evaluating and summing the SSI integrand (part A of equation 3.13) n⁴ times, where n is the number of points at which each dimension is evaluated.

To implement this procedure in Matlab, two methods can be used: for loops, which are time intensive, or arrays, which are space intensive. Since Matlab is optimised for matrices, the second method was initially considered. However, to evaluate the integrand over a D-dimensional domain V_r = [r_min, r_max]^D at n points along each dimension (where n = (r_max − r_min)/Δr), under a fixed maximum array size (a hard memory constraint), the size of n must decrease exponentially with D (see figure 3.1).

Figure 3.1: As the dimensionality of the domain of integration increases, the number of samples that can be taken along each dimension, n, decreases exponentially. These numbers are calculated for an array with at most 3.2 × 10⁸ elements.

The alternative is to use for loops to evaluate the sum, but this too is impracticable. For a four-neuron population with the discretisation sizes mentioned above, evaluating the SSI took approximately 45 minutes². Further embedded for loops would mean that adding just two more neurons to the population would increase the run-time of the algorithm to roughly 38 days.

² Using an Intel Pentium 4 2.8 GHz processor with 500 Mb of RAM.

3.3.2 Monte Carlo Approximation

To overcome the limitations of quadrature, I investigated the possibility of using Monte Carlo integration methods. Monte Carlo approximation techniques approximate the expectation of a function of a random variable not by integrating over the probability space, but by sampling the pdf and averaging the value of the function at the sampled points. From equation 3.13 it can be seen that the SSI has exactly this form: an expectation of a function of a random variable, i_sp(r), with respect to the pdf p(r|θ),

i_SSI(θ) = ∫_{V_r} p(r|θ).i_sp(r) dr = E_{p(r|θ)}[ i_sp(r) ]    (3.17)

Thus the SSI can be approximated by sampling the multidimensional Gaussian p(r|θ) (with diagonal covariance) n times, giving r_1, r_2, r_3, ..., r_n for each value of θ, and then averaging i_sp(r_j) over these samples. Using this approximation technique we get

i_SSI(θ) ≈ (1/n) Σ_{j=1}^{n} { [ −Σ_{θ′} p(θ′) log₂ p(θ′) ] − [ −Σ_{θ′} p(θ′|r_j) log₂ p(θ′|r_j) ] }    (3.18)

The accuracy of Monte Carlo (MC) techniques, however, depends critically on the curvature of the function being integrated. If the function is well behaved, good approximations to the integral can be made with relatively few samples from the domain, resulting in a faster run-time. To investigate empirically how appropriate MC techniques are for this problem, the SSI was evaluated using both the quadrature and MC methods and the results compared. By calculating the mean squared error (MSE) of the MC approximation against the number of MC iterations used (n in equation 3.18), the convergence of the MC procedure can be ascertained. This analysis was done in the case of a single neuron (figure 3.2). These results showed that, irrespective of the level of neuronal variability, the MC approximation converges to the true SSI value (evaluated using quadrature) exponentially, and with no bias. To see how the MC approximation method converged in the case of a population of four neurons, the variance of the estimator was plotted as a function of the number of MC iterations for Poisson-like variability. In this case convergence is also exponential (see figure 3.3).
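A minimal Matlab sketch of the estimator of equation 3.18 for an N-neuron population (the helper name mc_ssi and the matrix layout follow the ML-decoding sketch above and are illustrative assumptions):

    % Monte Carlo estimate of the SSI at stimulus index k0 (equation 3.18).
    % F(i,k) = tau*f_i(theta_k), S(i,k) = sigma_i(theta_k), ptheta = 1 x K prior.
    function issi = mc_ssi(F, S, ptheta, k0, nsamp)
        [N, ~] = size(F);
        Hprior = -sum(ptheta .* log2(ptheta));
        isp = zeros(nsamp, 1);
        for j = 1:nsamp
            r = F(:,k0) + S(:,k0) .* randn(N, 1);       % sample r_j ~ p(r|theta_k0)
            loglik = -sum((r - F).^2 ./ (2*S.^2) + log(S), 1);   % over all theta
            post = exp(loglik - max(loglik)) .* ptheta; % Bayes, numerically stable
            post = post / sum(post);
            isp(j) = Hprior + sum(post .* log2(post + eps));     % i_sp(r_j)
        end
        issi = mean(isp);                               % average over MC samples
    end

Sweeping k0 over the stimulus grid traces out the SSI curve; the number of samples nsamp plays the role of n in equation 3.18.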

Figure 3.2: Log-scaled plot of the Monte Carlo convergence in the case of a single neuron: MSE of the MC approximation vs. the number of MC iterations used. The four noise cases plotted are: low noise (A = 1, α = 0.024, β = 0.026 and φ = 1), medium noise where A = 6, high noise where A = 10, and Poisson variability where A = 1, α = 0, β = 1 and φ = 0.5.

Figure 3.3: Log-scaled plot of the variance of the MC integration procedure as a function of the number of MC iterations used. The variance of the MC integration procedure decreases exponentially. The noise model is Poisson-like: A = 1, α = 0, β = 1 and φ = 0.5. Tuning curve properties: f_max = 75 Hz, b = 5 Hz and f_var = 30°.

Chapter 4

Results I: the SSI

This chapter presents the SSI results of Butts and Goldman (2001) [10]. This serves two purposes: firstly, it provides a foundation for further analyses; secondly, it allows for a validation of their results. The paper of Butts and Goldman (2001) and all supporting materials are included at the end of this thesis for reference. For the sake of clarity, further references to this paper will not be made in this chapter. The results that follow are presented in the same order as they appear in [10]. The key results of Butts and Goldman that will be reproduced here are: the transition of the best-encoded stimulus from high-slope to high-rate regions of the tuning curve for a single neuron as the level of neuronal variability is increased (section 1.1.1); the application of the SSI to experimentally characterised neurons (section 1.1.2); the SSI applied to a population consisting of four neurons (section 1.2); and the SSI's application to two-choice discrimination tasks (section 1.3).

4.1 Slope-to-Peak Transition for a Single Neuron

Butts and Goldman found that the region of best-encoded stimulus transitioned from high-slope to high-rate regions of a neuron's tuning curve as the level of neuronal variability increased (effectively the spread of the distribution p(r|θ)). This transition is shown for a number of prototypical noise models that reflect those recorded by physiologists: additive noise, where σ = const., as found in retinal ganglion cells (for example reported by Croner et al. 1993 [12]); multiplicative noise, where σ ∝ τf(θ) (as reported by Miller et al. 1991 [20]); a scaling of the two; and Poisson-like variability, where σ² ∝ τf(θ), as frequently encountered in cortex (reported by Kang et al. 2004 [17]). The slope-to-peak transition is found in all four cases. Recall the general noise encoding model of equation 3.5. We define the standard deviation σ(θ) of the neuronal variability, so that response(θ) ~ N(τf(θ), σ(θ)²), by

σ(θ) = A[ α + β.(τf(θ))^φ ]    (4.1)

In equation 4.1, α is the additive noise term, β the multiplicative term, and A a noise scaling term. For a linear noise model in which only the level of additive noise is varied, β is fixed and φ = 1. Similarly, in a linear noise model where the multiplicative noise is varied, α is fixed and β varied (again φ = 1). For a linear scaling of the noise, the A term is varied; for Poisson-like noise, A = 1, α = 0, β = 1 and φ = 1/2. The general tuning curve model used in this section is Gaussian with parameters f_max = 1 Hz, f_var = 30°, θ_0 = 0°; the length of the time integration window, τ, is set to 1 s.

Varying α

Increasing α facilitates the slope-to-peak transition by increasing the level of additive noise. The SSI is calculated (see figure 4.1) for three noise levels: low noise where α_L = 0.024 (blue dashed), high noise where α_H = 0.144 (red), and the slope-to-peak transitional level α_Tr = 0.07 (black).

Figure 4.1: The effect of varying the additive component of the noise model, α, on the SSI. (Left) The standard deviation of the noise model as a function of the stimulus. (Right) The SSI for the neuron plotted as a function of the stimulus. The slope-to-peak transition can be seen as the noise level increases: slope-encoding for α_L = 0.024 (blue), the transitional value α_Tr = 0.07 (black), and peak-encoding for α_H = 0.144 (red). Noise parameters: A = 1, β = 0.026, and φ = 1. (Reproduction of figure S1-1 from [10]).

A further observation that can be made from this figure is that the region over which a significant amount of information is conveyed shrinks (i.e. the width of the high-noise SSI is smaller than that of the low-noise SSI). This makes sense: low firing-rate regions of the neuron's tuning curve are effectively having their signal drowned out by the uniform application of noise across the stimulus set. The reduction in the signal-to-noise ratio is

proportionally much greater for those low-rate regions of the tuning curve.

Varying β

Increasing the level of multiplicative noise by varying β can also facilitate the slope-to-peak transition. The SSI calculated for three separate levels of multiplicative noise can be seen in figure 4.2. Here the low-noise case is plotted in blue, where β_L = 0.026; the slope-to-peak transition occurs at β_Tr = 0.21 (black); and an example of high-noise peak-encoding is plotted in red, where β_H = 0.42.

Figure 4.2: The effect of varying the multiplicative component of the noise model, β, on the SSI. (Left) The standard deviation of the noise model as a function of the stimulus. (Right) The SSI for the neuron as a function of the stimulus. The slope-to-peak transition can be seen as the noise level increases: slope-encoding for β_L = 0.026 (blue), the transitional value β_Tr = 0.21 (black), and peak-encoding for β_H = 0.42 (red). Noise parameters: A = 1, α = 0.024, and φ = 1. (Reproduction of figure S1-2).

In this case the width of the SSI curve does not decrease as the level of neuronal variability is increased. This too makes sense: as the noise is multiplicative, the signal-to-noise ratio is proportionally constant across the stimulus set. Furthermore, the level of noise needed to facilitate the slope-to-peak transition is much greater than in the additive case. This is a consequence of the peak receiving more noise than the slope (since σ(θ) scales with τf(θ)).

Varying A

Frequently, additive and multiplicative noise vary simultaneously. To replicate these conditions, A is varied. Again the slope-to-peak transition can be seen, occurring at A_Tr = 2.42 (see figure 4.3). The low-noise case is plotted for A_L = 1, and the high-noise case for A_H = 4.

Figure 4.3: The effect of simultaneously scaling the additive and multiplicative components of the noise model, A, on the SSI. (Left) The standard deviation of the noise model as a function of the stimulus. (Right) The SSI for the neuron as a function of the stimulus. The slope-to-peak transition can be seen as the noise level increases: slope-encoding for A_L = 1 (blue), the transitional value A_Tr = 2.42 (black), and peak-encoding for A_H = 4 (red). Noise parameters: α = 0.024, β = 0.026, and φ = 1. (Reproduction of figure S1-3).

Poisson Variability

To replicate Poisson variability, the standard deviation of the noise model is defined as σ(θ) = √(f(θ)), so that the variance of the noise equals the mean firing rate at that stimulus value. This gives a good approximation to Poisson variability. In figure 4.4 the slope-to-peak transition is driven by varying the firing rate of the tuning curve (f_max in equation 3.2): decreasing f_max increases the effective noise level. In this case the low-noise SSI is calculated for f_max = 133 Hz, the transitional case for f_max = 81 Hz, and the high-noise case for f_max = 39 Hz.

Figure 4.4: The effect of varying the firing rate, f_max, on the SSI under a Poisson-like noise model. In this case decreasing the firing rate increases the effective level of noise, due to the Poisson mean = variance relationship. (Left) The standard deviation as a function of the stimulus. (Right) The SSI as a function of the stimulus. The slope-to-peak transition can again be seen as the level of noise increases: low-noise slope-encoding (high firing rate) where f_max = 133 (red); the slope-to-peak transition for f_max = 81 (black); and peak-encoding where f_max = 39 (blue dashed). Noise parameters: A = 1, α = 0, β = 1 and φ = 0.5. (Reproduction of figure S1-3).

Increasing the maximal firing rate of the neuron effectively increases the signal-to-noise ratio of the response. If the signal-to-noise ratio is defined as the ratio of the mean spike count to the standard deviation of the neuronal variability, then for Poisson-like noise SNR(θ) = f(θ)/σ(θ) = f(θ)/√(f(θ)) = √(f(θ)). So as f_max increases, and with it f(θ), the signal-to-noise ratio increases too.

4.1.1 Experimentally Characterised Neurons

In this section we apply the SSI to single neurons with experimentally characterised response models, where 'response model' refers to the tuning curve and the noise model. Two response models are used: those found by Kang et al. [17] for orientation-tuned neurons in macaque primary visual cortex, and those of wind-detecting neurons in the cricket cercal system, found by Miller et al. [20].

For the V1 neuron (figure 4.5 A), the neuronal variability has a Poisson-like distribution. The stimulus set is restricted to [−90°, 90°]; the parameters of the neuron's tuning curve in the low firing rate case are: f_max = 34 Hz, f_var = 22.2°, and the base rate b = 0.16 f_max. For the response statistics that were recorded, it can be seen that this V1 neuron is in the high-noise peak-encoding regime (figure 4.5 A, bottom: average rate), but if the response rate is increased four-fold the encoding transitions to the high-slope regions of the tuning curve (figure 4.5 A, top: high rate).

For the cricket cercal neuron (figure 4.5 B), the level of neuronal variability was found to vary linearly with the tuning curve value. In particular, for the low-noise case the noise model parameters were A = 1, α = 0.048 and β = 0.052; for the high-noise case A = 3 (the noise is linear, so φ = 1). This neuron's tuning curve has a cosine shape (the function is defined in equation 3.4, with f_max = 1 and a time integration window τ = 1 s). Here too, both slope and peak encoding can be seen, depending upon the noise level. In the low-noise regime the maximum SSI is found at θ = ±34°, close but not equal to the stimuli with maximal tuning curve gradient (located at the outside edges of the cosine bump).

Figure 4.5: The SSI as a function of the stimulus for the two experimentally characterised neurons. (A) The V1 neuron, with a Gaussian-shaped tuning curve and Poisson-like variability: (top) the high firing rate case (4 × f_max), equivalent to low-noise conditions; and (bottom) the average firing rate case, equivalent to high noise. (B) The cricket wind-direction-sensitive cercal interneurons, with cosine-shaped tuning curves and a linear noise model: (top) low noise, and (bottom) high noise. Fine black lines plot the normalised tuning curve ±σ. (Reproduction of figure 2).

4.2 SSI for a Four-Neuron Population

In this section the SSI is applied to a small population consisting of four neurons. The SSI is evaluated for the population as a whole, and an estimate of the contribution that one neuron makes to the whole population's SSI is made, to see again whether there is a slope-to-peak transition as the level of neuronal variability is increased.

In a population of neurons, the information that each individual neuron contributes need not be the same as the information that this neuron would convey in isolation; i.e. information does not scale linearly with the population size, as information can be encoded either redundantly or synergistically between neurons. The SSI as applied to a single neuron can thus be interpreted as assigning a primary role to this neuron, so that the role of the remaining neurons in the population is to provide the missing information not provided by this first neuron. In this section, we consider the opposite limit, and calculate the marginal SSI (mssi) (see section 3.2.3.1 for how the mssi is calculated). Since the mssi is the SSI for the population minus the SSI for the population less one neuron (the neuron to be marginalised), it is a lower bound on the contribution one neuron makes to the population as a whole.

Figure 4.6: The effect of changing the level of neuronal variability on the SSI and mssi in the context of the four-neuron wind-direction-sensitive cricket cercal system population [20]. (A) The four neurons' tuning curves (bold: the neuron for which the mssi is calculated). (B, C & D) The SSI for the population (fine) and the mssi (bold). The slope-to-peak transition in the mssi as the level of noise is increased can be seen. (B) A = 1, (C) A = 3 and (D) A = 5. Noise parameters: α = 0.048, β = 0.052 and φ = 1. (Reproduction of figure 3).

The SSI is applied to a four-neuron population as found in the cricket cercal system and reported by Miller et al. (1991) [20]. The population's tuning curves have a cosine shape, and can be seen in figure 4.6 (A), where the neuron for which the marginal SSI is calculated is highlighted in bold. The population SSI (thin line) and marginal SSI (bold line) are plotted in (B) low noise (A = 1), (C) medium noise (A = 3) and (D) high noise (A = 5). Again a slope-to-peak transition can be seen in the marginal SSI (the transitional level here is A_Tr > 3).

As the level of noise is increased, the region of maximal population SSI shifts toward the high-slope regions of the population (point 2 in figure 4.6), where there is the largest overlap in the neurons' tuning curves, and thus the effects of noise are reduced by the cooperative encoding of neighbouring neurons. However, the maximum of the mssi, as the noise level increases, shifts from high-slope to high-firing-rate regions of the tuning curve being marginalised. In the low-noise case (B of figure 4.6), the mssi is zero at the location of peak firing rate, since information in this region is already accounted for by other neurons in the population. This situation changes as the noise level increases: in the high-noise case (D of figure 4.6) the mssi is maximal at the preferred orientation of the marginalised neuron's tuning curve. Thus again a slope-to-peak transition can be seen as the level of neuronal variability increases.

The level of noise needed to facilitate this transition, however, is higher in the context of a population of neurons than in the case of a single neuron (5 versus 1.21, cf. figure 4.3). The increase in the slope-to-peak threshold noise level as a function of the population size is a consequence of the ability of populations of neurons to average out neuronal variability through their joint encoding of stimuli. The SSI in the case of a single neuron, and the mssi in the context of a population, reflect two limits on the amount of information a neuron can convey about a stimulus. What is particularly interesting is that both these quantities transition from slope to peak.

4.3 SSI Applied to Discrimination Tasks

The previous section investigated how well stimuli are encoded by a neuron. In this section a different but related question is investigated: how well can two stimuli be discriminated by a particular neuron? In particular, we investigate the ability to perform fine discrimination, discerning small orientation differences, Δθ = 6°, and coarse discrimination, discerning two opposing stimuli, Δθ = 180°.

In behavioural studies, the ability of an animal to perform such tasks can be isolated in the form of two-choice discrimination tasks. In these tasks the ability to discriminate between two different stimuli, say s_1 and s_2, depends upon the separation of their two respective response distributions p(r|s_1) and p(r|s_2). In coarse discrimination tasks the two

stimuli s_1 and s_2 will be distant in the stimulus set, while for fine discrimination tasks the two stimuli will be near, and thus the separation of p(r|s_1) and p(r|s_2) will bear some relation to the gradient of the tuning curve at that location.

To see how the SSI behaves in this experimental context, the discrimination SSI (see section 3.2.3.2 for details of the implementation) was applied to a generic Gaussian tuning curve. For the fine discrimination tasks the stimuli were taken to be at θ + 3° and θ − 3°, and for the coarse discrimination tasks the two stimuli were taken as opposites in the stimulus set, e.g. θ = 0° and θ = 180°. The results can be seen in figure 4.7, where the fine discrimination SSI is plotted on the left and the coarse discrimination SSI on the right.

For the fine discrimination task, high-slope regions always convey the most information, with little variation under increasing noise levels (the 1, 4 and 16 noise cases plotted are roughly equivalent). For the coarse discrimination tasks, it can be seen in figure 4.7 (right) that the high-rate regions of the neuron's tuning curve carry the most information. This makes sense: if the two stimuli are diametrically opposite, the distance between the two response distributions p(r|s_1) and p(r|s_2) will be greatest when one of those stimuli is at the preferred orientation of the tuning curve (with maximal firing rate) and the other stimulus is eliciting the minimal response. The noise levels are varied as for the fine discrimination task. Again, increasing the neuronal variability in this context does not facilitate a slope-to-peak transition, as neurons with their preferred orientation aligned with that of one of the stimuli will always convey the most task-relevant information. Within the horizontal regions of the SSI in figure 4.7 (right), perfect discrimination is occurring.

Figure 4.7: The discrimination SSI for fine discrimination tasks (left) and coarse discrimination tasks (right). In both contexts the location of maximal discrimination SSI is independent of the noise level. The tuning curve is normalised to 1/2 (fine black). (Left) The fine discrimination SSI is maximal in high-slope regions of the tuning curve. (Right) The coarse discrimination SSI is maximal in the regions for which θ and θ + 180° have the greatest difference in mean firing rate. The noise varies: A = 1 (red), A = 4 (black) and A = 16 (blue). (Reproduction of figure 4).

Applying the SSI to discrimination experiments demonstrates how the experimental context can determine whether the slope or the peak of the tuning curve is most informative. In previous applications of the SSI, slope or peak encoding was seen to depend upon the level of neuronal variability. This occurs when all stimuli are equally likely; in the discrimination tasks above, by contrast, the transition does not occur and the best-encoded region appears to be independent of the noise. This highlights an important point: the experimentalist's prior on the stimulus is relevant in defining the region of 'best' encoding.

Chapter 5

Results II: Extensions to the SSI

The key results of Butts and Goldman [10] that were reproduced in the previous chapter showed that the stimuli best encoded by a neuron's tuning curve depend on a number of different factors: the level of neuronal variability, the population size, and the task specificity. However, the neural contexts to which the SSI was applied, and which provide the foundations for these conclusions, were limited to small populations of neurons and a fixed time window. This chapter presents two types of extension to the analysis of the SSI: firstly, the effect of the time integration window used to construct tuning curves is investigated (section 5.1); secondly, the effect of the population size is investigated (section 5.2).

5.1 Integration Time Window

Butts and Goldman chose to examine time windows of 1 second. Here we systematically explore the consequences of time windows varying from 15 ms to 1000 ms. This analysis is applied to three types of tuning curve model: i) a Gaussian tuning curve with linear variability, ii) the experimentally characterised neuron of Miller et al., and iii) a Gaussian tuning curve with Poisson-like variability.

5.1.1 Gaussian Tuning Curve with Linear Variability

The SSI was applied to a generic Gaussian tuning curve with a firing rate of f_max = 80 Hz and a linear noise model, i.e. σ(θ) = A[α + βτf(θ)]. The time integration window τ was varied from τ = 15 to τ = 500 ms (see figure 5.1).

Figure 5.1: The peak-to-slope transition as a function of the integration time window τ under a linear noise model. For small τ, the neuron is in the peak-encoding regime. The transition occurs at τ_Tr = 80 ms. For τ > 80 ms, the neuron functions in the slope regime. Tuning curve properties: f_max = 80 Hz, f_var = 5°, b = 5 Hz (fine blue: normalised tuning curve). The noise model is linear: A = 6, α = 0.024, β = 0.026 and φ = 1.

The SSI's region of best encoding in this case can be seen to transition from the high-rate regions (peak encoding) to the high-slope regions (slope encoding) of the neuron's tuning curve as the time integration window increases. The peak-to-slope transition occurs at τ_Tr = 80 ms. For τ = 20 ms, the tuning curve is in the peak-encoding regime, and for τ = 500 ms the high-slope regions of the tuning curve are most accurately encoded.

The effect of increasing the integration time window in this case is qualitatively very similar to the effect of decreasing the level of neuronal variability, as seen in the previous chapter. This is not surprising: the standard deviation of the noise model is defined as σ(θ) = A[α + βτf(θ)], so increasing τ effectively scales up the multiplicative component of the noise model too, while the additive component, α, stays the same. This relationship is more clearly understood by invoking a naive but intuitive measure of encoding accuracy, the signal-to-noise ratio (SNR). The SNR can be defined as the ratio of the mean response value to the standard deviation of the neuronal variability, i.e.

SNR(θ) = τf(θ)/σ(θ) = τf(θ) / (A[α + βτf(θ)]) = f(θ) / (A[α/τ + βf(θ)])

As τ → 0, SNR(θ) → 0, and thus the effective level of noise in the system is much higher for smaller time windows. Conversely, in the opposite limit, as τ → ∞, SNR(θ) → 1/(Aβ), an upper limit on the SNR. For purely additive noise (i.e. β = 0), the SNR would be directly proportional to the time integration window τ. Thus, in the linear noise case, the effective level of noise in the system is inversely related to the time integration window.

5.1.2 Experimentally Characterised Neuron with Linear Variability

Here the SSI is applied to the cricket cercal neuron, using parameters reported by Miller et al. [20] (see figure 5.2). The peak-to-slope transition can be seen to occur at τ_Tr = 130 ms.

Figure 5.2: The peak-to-slope transition as a function of the integration time window τ for the cricket cercal interneuron (characterised by Miller et al. [20]). For small τ the neuron is in the peak-encoding regime. The transition occurs at τ_Tr = 130 ms. For τ > 130 ms, the neuron functions in the slope-encoding regime. Tuning curve properties: f_max = 84 Hz, base rate = 0 Hz (fine black: half-normalised tuning curve). The noise model is linear: A = 1, α = 1.1, β = 0.058, and φ = 1.

5.1.3 Gaussian Tuning Curve with Poisson Variability

In figure 5.3, the SSI is plotted as τ is varied from 10 to 1000 ms, for a Fano factor F = 1 and a circular normal tuning curve. The peak-to-slope transition can be seen as τ increases. However, the time integration window at which this transition occurs, τ_Tr = 750 ms, is larger than in the linear case above (where τ_Tr = 80 ms, see figure 5.1). Continuing the signal-to-noise ratio analysis applied in the linear noise case, here we see that as τ → 0,

SNR(θ) = τf(θ) / (β√(τf(θ))) = √(τf(θ))/β → 0

but this convergence is slower than in the linear noise case.

Figure 5.3: The peak-to-slope transition as a function of the integration time window τ under Poisson-like variability (F = 1); τ_Tr = 750 ms. Tuning curve properties: f_max = 80 Hz, f_var = 5°, baseline = 5 Hz. Noise model: A = 1, α = 0, β = 1, and φ = 1/2.

To assess quantitatively the parameter values at which the slope-to-peak transition occurs, we measured the ratio of the marginal SSI at the peak of the tuning curve to that at the 'maximal slope', the latter being defined as the region of the tuning curve corresponding to maximal Fisher information. This quantity is referred to as the SSI Peak over Slope Ratio (abbreviated to SSI PoSR):

PoSR = (marg. SSI at peak) / (marg. SSI at slope)    (5.1)

Figure 5.4: Schema showing the SSI (red) and the rescaled Fisher information (black). The values of the SSI at the location of maximum Fisher information and at maximal firing rate, used to calculate the SSI PoSR, are displayed.

When the PoSR is less than one, the neuron is said to operate in the high-slope regime; when it is greater than one, in the peak regime. PoSR = 1 marks the transitional peak-to-slope value.
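A minimal Matlab sketch of this measurement, given a marginal SSI curve and a Fisher information curve evaluated on the same stimulus grid (the variable names are illustrative assumptions):

    % SSI Peak over Slope Ratio (equation 5.1). mssi, fi and f are vectors
    % over the stimulus grid (marginal SSI, Fisher information, tuning curve).
    [~, k_peak]  = max(f);                  % index of the tuning curve peak
    [~, k_slope] = max(fi);                 % index of maximal Fisher information
    posr = mssi(k_peak) / mssi(k_slope);    % > 1: peak regime; < 1: slope regime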

To investigate further the relationship between the Fano factor and τ_Tr, the PoSR is plotted for three different Fano factors, F = 0.2, 1, and 1.5 (see figure 5.5). τ_Tr is directly related to F: when the Fano factor is large, the transition occurs at longer time windows.

Figure 5.5: The SSI PoSR as a function of τ, plotted for three different Fano factors, F = 0.2, F = 1, and F = 1.5. The smaller the Fano factor, the smaller the integration time window at which the slope-to-peak transition occurs.

5.2 The SSI in Larger Neural Populations

This section investigates how the SSI behaves in larger neural populations as the level of neuronal variability and the integration time window are varied. First, we ask: "can the slope-to-peak transition still be observed in larger populations of neurons with biologically plausible noise models and time integration windows?". Secondly, results relating to the convergence of the SSI and FI measures are also presented.

To obtain results that are thorough enough to support conclusions yet remain intelligible, the parameters of variation, i.e. the neuronal variability, the time integration window and the population size, have to be constrained. Here the SSI is applied to: i) larger populations of neurons (from 4 to 50 neurons), ii) three different time windows, τ = 15, 100, and 1000 ms, and iii) three different Fano factor noise values, F = 0.2, 1, and 1.5.

For the remainder of this section, unless otherwise stated, the tuning curve function is taken to be circular normal, with a maximal firing rate at its preferred orientation of

80 Hz, a baseline activity b = 5 Hz, and circular variance f_var = 5°. All neurons' preferred orientations are uniformly arrayed across the stimulus set.

The results are shown in figures 5.6 (τ = 15 ms), 5.7 (τ = 100 ms) and 5.8 (τ = 1000 ms). In these figures the population size is varied along the columns (in ascending order), and the three Fano factors along the rows (in ascending order). Slope-to-peak transitions can be observed as a function of the population size, the neuronal variability and the integration time window. These results are further discussed in two subsections: the slope-to-peak transition as a function of i) the neuronal variability (section 5.2.1) and ii) the population size (section 5.2.2).

5.2.1 The Slope-to-Peak Transition as a Function of the Neuronal Variability Level

All three figures (5.6, 5.7 and 5.8) show examples of slope-to-peak transitions as the level of neuronal variability is increased (highlighted in A1 and A2). However, whether the transition occurs is also dependent upon the population size and the length of the temporal window τ.

Figure 5.6: The marginal SSI (red) and rescaled FI (black) for an integration time window of 15 ms. The population size is varied from 4 to 50 neurons (columns); the Fano factor F = 0.2, 1, and 1.5 (rows). Peak-to-slope transitions occur as a function of population size and noise level. For the low-noise case, F = 0.2, N_Tr = 8 neurons. For Poisson-like variability, where F = 1, N_Tr ≈ 20 neurons. For the high-noise case, where F = 1.5, 20 < N_Tr < 30 neurons. Shaded plots are enlarged below; selection A1 is enlarged in figure 5.5.

Figure 5.7: The marginal SSI (red) and rescaled FI (black) for τ = 100 ms. The Fano factor varies along the rows, and the population size along the columns of the arrayed plots. Peak-to-slope transitions can be observed as a function of population size and noise level. For low noise, F = 0.2, even populations of four neurons are in the slope regime. For Poisson-like variability, F = 1, N_Tr ≈ 8 neurons. For the high-noise case, F = 1.5, 10 < N_Tr < 15. Shaded plots are enlarged below; selection A2 is enlarged in figure 5.5.

Figure 5.8: The marginal SSI (red) and rescaled FI (black) for τ = 1000 ms. The Fano factor varies along the rows and the population size along the columns. The temporal window in this case is sufficiently large to reduce the effective level of neuronal variability such that all neurons lie in the slope-encoding regime. Shaded plots are enlarged below; selection A3 is enlarged in figure 5.5.