A Method of Reducing Model Space for Dynamic Causal Modelling


Joseph Whittaker
School of Medicine

A thesis submitted to the University of Manchester for the degree of Doctor of Philosophy in the Faculty of Medical and Human Sciences.

2013

Table of Contents

List of figures
Abstract
Declaration
Copyright statement
Acknowledgments
Abbreviations

1 Introduction
General overview and motivations
Structure of the thesis
Background
Main findings

2 Principles of fMRI
Nuclear Magnetic Resonance
Spin angular momentum
External magnetic field
Excitation and relaxation
Echoes
Forming an image
Echo-planar imaging
Image contrast
Functional Magnetic Resonance Imaging
Neurovascular Coupling
Blood-oxygen-level-dependent contrast

3 Brain connectivity
Functional Specialisation and Integration
Structural connectivity
Functional and effective connectivity
Seed-Voxel Correlation Maps
Matrix decomposition based methods
Psychophysiological interactions
Structural Equation Modelling
Multivariate autoregressive modelling
Dynamic Causal Modelling

4 Dynamic Causal Modelling
Introduction
Neuronal state equations
Bilinear model
Non-linear model
Two-state model
Stochastic model
Haemodynamic model
Parameter estimation
Model priors
Inference
Bayesian Model Selection (BMS)
Model space
Inference on parameter space

5 Neuroimaging in Psychiatry
Introduction
Connectivity
Depression
Emotional processing bias
Emotional face processing in depression
Effects of antidepressants and fMRI
Summary

6 Paper 1
Abstract
Introduction
Methods
Subjects
N-back task
Implicit emotional face processing task
Image analysis
Dynamic Causal Modelling
Results
fMRI group activations
N-back task
Implicit emotional face processing task
Discussion
Supplementary material

7 Paper 2
Abstract
Introduction
Methods
Subjects
Task
Dynamic Causal Modelling
Results
fMRI group activations
BMS results
Free energy correlation results
BMA results
Discussion
Supplementary material

8 Paper 3
Abstract
Introduction
DCM
Methods
Subjects
Antidepressant treatment
Implicit emotional processing faces task
fMRI data acquisition
DCM analysis
Results
BMS
BMA
Effect of Citalopram treatment
Discussion

9 Discussion
Summary of main findings
Paper 1
Paper 2
Paper 3
Implications of work
Two-node method
Inference on network structure
Inference on parameter space
Inference on larger networks
Limitations and future directions
Simulation study
Post-hoc BMS
Modulatory parameters
Conclusion

References

Word Count: 45,582

List of figures

Chapter 2
Figure 2.1: Orientation of spins in the external magnetic field $B_0$
Figure 2.2: Application of the RF pulse
Figure 2.3: Formation of an MR image from k-space
Figure 2.4: SE pulse sequence
Figure 2.5: EPI pulse sequence
Figure 2.6: Effect of time constants on image weighting
Figure 2.7: Physiological variables that result in the BOLD response
Figure 2.8: BOLD measured haemodynamic response

Chapter 4
Figure 4.1: Schematic of bilinear DCM system
Figure 4.2: Schematic of non-linear DCM system
Figure 4.3: Schematic of two-state DCM system
Figure 4.4: Schematic of the haemodynamic model used in DCM
Figure 4.5: Number of possible models for a given number of nodes

Chapter 5
Figure 5.1: The affective go/no go task and the emotional Stroop task
Figure 5.2: Schematic of brain regions involved in face processing [178]
Figure 5.3: The most likely models in HC and rMDD groups during emotional face processing found by Goulden et al [95]

Chapter 6
Figure 6.1: Formation of the two-node model space
Figure 6.2: Schematic of n-back task
Figure 6.3: Schematic of implicit emotional face processing task
Figure 6.4: Formation of intrinsic connectivity families
Figure 6.5: BMS results for n-back and implicit emotional face processing task
Figure 6.6: Implicit emotional face processing BMS results after splitting subjects
Figure 6.7: Models inferred from the two-node and three-node approaches in the implicit emotional face processing task

Chapter 7
Figure 7.1: Formation of the two-node model space from the whole network three-node model space
Figure 7.2: The two- vs three-node input family group RFX BMS results
Figure 7.3: Correlation of free energies across session and centre in the three-node approach
Figure 7.4: Correlation of free energies across session and centre in the two-node approach
Figure 7.5: Input parameter estimates for the two- and three-node approaches
Figure 7.6: Correlation of connectivity parameter estimates and standard deviations between the two- and three-node approaches
Figure 7.7: The two- vs three-node intrinsic family group RFX BMS results
Figure 7.8: Intrinsic connectivity parameter estimates for the two- and three-node approaches

Chapter 8
Figure 8.1: Schematic of the two- and three-node model spaces and their respective sizes
Figure 8.2: Input family and intrinsic connectivity family RFX BMS results
Figure 8.3: Parameter values for significantly different modulations of intrinsic connections by happy and sad emotions between MDD and HC
Figure 8.4: Model structure as determined by BMS and modulation by facial emotion as determined by BMA parameter values for MDD and HC group

Chapter 9
Figure 9.1: Number of models for a given number of nodes in the two-node method

Abstract

The University of Manchester
Joseph Whittaker
Doctor of Philosophy (PhD)
Thesis title: A Method of Reducing Model Space for Dynamic Causal Modelling
September 2013

An increasingly important concept in psychiatric neuroimaging is that of brain connectivity. Dynamic Causal Modelling (DCM) has been successfully used to infer how spatially remote areas of the brain integrate to form functional networks. A potential disadvantage of DCM is the need to predefine a model based on a hypothesis about the underlying connectivity. This requirement means the results are dependent on the assumptions about model structure, and important features of the underlying network may be ignored. Here we present a method for identifying the model structure in a way that discards the a priori knowledge that is typically used to constrain model space. This allows DCM to be used in a more data-driven way, and allows the optimal model within a network of nodes to be identified.

The thesis consists of 3 studies that together provide a generic framework for a novel approach to DCM, validation that it works, and evidence that it offers a significant computational advantage over traditional DCM. The first study demonstrates that the connectivity within a system of brain regions can be ascertained by inferring the connectivity within smaller systems, which consist of regions taken from the entire system. By analysing the data in this fashion, we can effectively explore the entire network structure space, but estimate a much smaller number of models than would be typical. The second study applies the method to a multicentre dataset and shows that Bayesian Model Selection (BMS) results are reproducible at different centres and across different sessions. The findings show that DCM is robust enough to be used in multicentre studies and that our exploratory approach is just as effective as traditional approaches to DCM.

The third study applies the method to a standard psychiatric imaging dataset: an implicit emotional processing face recognition task performed by patients with major depressive disorder (MDD) vs healthy controls (HC). The MDD patients perform a follow-up scan having been treated with the antidepressant citalopram. The study shows that the developed method can be used to identify the optimal model structure in order to make inferences on effective connectivity parameters, and to identify differences between patient and control groups, and before and after treatment.

Declaration

No portion of work referred to in the thesis has been submitted in support of an application for another degree or qualification of this or any other university or institute of learning.

COPYRIGHT STATEMENT

The following four notes on copyright and the ownership of intellectual property rights must be included as written below:

i. The author of this thesis (including any appendices and/or schedules to this thesis) owns certain copyright or related rights in it (the "Copyright") and s/he has given The University of Manchester certain rights to use such Copyright, including for administrative purposes.

ii. Copies of this thesis, either in full or in extracts and whether in hard or electronic copy, may be made only in accordance with the Copyright, Designs and Patents Act 1988 (as amended) and regulations issued under it or, where appropriate, in accordance with licensing agreements which the University has from time to time. This page must form part of any such copies made.

iii. The ownership of certain Copyright, patents, designs, trade marks and other intellectual property (the "Intellectual Property") and any reproductions of copyright works in the thesis, for example graphs and tables ("Reproductions"), which may be described in this thesis, may not be owned by the author and may be owned by third parties. Such Intellectual Property and Reproductions cannot and must not be made available for use without the prior written permission of the owner(s) of the relevant Intellectual Property and/or Reproductions.

iv. Further information on the conditions under which disclosure, publication and commercialisation of this thesis, the Copyright and any Intellectual Property and/or Reproductions described in it may take place is available in the University IP Policy (see in any relevant Thesis restriction declarations deposited in the University Library, The University Library's regulations (see and in The University's policy on Presentation of Theses

Acknowledgments

First and foremost, I would like to thank my supervisors Dr. Rebecca Elliott and Dr. Shane McKie, and my advisor Prof. Steve Williams for all their support, advice and guidance during my PhD. I owe a big thank you to everyone in the NPU for their kindness and encouragement, and particularly Martyn McFarquhar for his invaluable knowledge of statistics. I would also like to thank my family and Emma for their love and support throughout. Finally, I would like to thank the BII and MRC for their role in funding my studentship.

Abbreviations

5-HT - 5-hydroxytryptamine (serotonin)
ACC - anterior cingulate cortex
AIC - Akaike information criterion
AMY - amygdala
ARMS - at risk mental state
ATD - acute tryptophan depletion
BD - bipolar disorder
BIC - Bayesian information criterion
BMA - Bayesian model averaging
BMS - Bayesian model selection
CBF - cerebral blood flow
CBV - cerebral blood volume
BOLD - blood-oxygen-level-dependent
DCM - Dynamic Causal Modelling
dHb - deoxyhaemoglobin
DPC - dorsolateral prefrontal cortex
DSM - Diagnostic and Statistical Manual of Mental Disorders
DTI - diffusion tensor imaging
EEG - electroencephalography
EPI - echo planar imaging
FE - first episode
FFX - fixed effects
FG - fusiform gyrus
GE - gradient echo
GLM - general linear model
Hb - oxyhaemoglobin
HC - healthy control
HDR - haemodynamic response
ICA - independent components analysis
IOG - inferior occipital cortex
KL - Kullback-Leibler divergence
MADRS - Montgomery Åsberg Depression Rating Scale
MDD - Major Depressive Disorder
OFC - orbitofrontal cortex
PCA - principal components analysis
PET - positron emission tomography
PFC - prefrontal cortex
PMC - premotor cortex
PPC - posterior parietal cortex
PPI - psychophysiological interaction
rMDD - remitted depressed
RF - radio frequency
RFX - random effects
ROI - region of interest
SEM - structural equation modelling
SMA - supplementary motor area
SNR - signal-to-noise ratio
SE - spin echo
SSRI - selective serotonin re-uptake inhibitors
TE - time to echo
TR - time to repeat
VB - Variational Bayesian

1 Introduction

1.1 General overview and motivations

The ability to image the functioning human brain is one of the most remarkable scientific advancements of the last few decades. Functional magnetic resonance imaging (fMRI) has been widely adopted as it provides a non-invasive way of visualising brain activation in conscious humans during cognition. As such it has proved to be an invaluable tool in the mapping of specific cognitive functions to specialised areas of the brain [1] and has had a significant impact on our understanding of the neural basis of psychiatric disorders [2]. Despite neuroimaging's role in progressing our understanding of the neurobiological basis of psychiatric illness, it has had little bearing on the way patients are diagnosed or treated [3]. However, a shifting emphasis towards understanding mental illness in terms of abnormalities in distributed networks of different brain areas, as opposed to single regions, is showing great promise [4].

Brain connectivity methods for the analysis of fMRI data broadly fall into two categories [5]: data-driven methods and model-driven methods. The use of a model of how data are generated, in analysis methods like Dynamic Causal Modelling (DCM) [6], allows for the explicit inference of causal interactions between brain regions, which allows one to draw more detailed conclusions about network structure [5-8]. Conversely, model-free methods can infer the presence of connections between brain regions, but without knowledge of the direction of causation. As a result less information can be obtained about functional brain networks, but model-free methods have the advantage of being data-driven [5, 7, 9]. Data-driven (or exploratory) methods make no a priori assumptions about how data are generated, in contrast with model-driven methods, in which models are postulated with the purpose of testing a hypothesis.

Both data-driven and model-based methods have seen increasing popularity in the neuroimaging literature, in line with a general shift towards connectivity-based studies. In model-based approaches like DCM, the total number of network structure variations forms the complete model space. As model estimation is computationally expensive, an exhaustive model search is intractable for all but the simplest systems of nodes. As a comparison of all models is not possible, model-based methods usually rely on carefully motivated constraints being placed on the model space. In the case of DCM these constraints represent hypothetical a priori knowledge about network structure.

The work presented in this thesis aims to present a method in which all a priori restrictions to model space are discarded. The specific hypothesis being tested can be stated as follows: information about DCM network structure can be inferred without the need to estimate all possible models, thus implying a degree of redundancy in the complete model space. In other words, is it possible to test all conceivable network structures without the need to exhaustively search the complete model space? If this hypothesis is correct, then it allows DCM to be used in a more exploratory way, when strong a priori information about network structure is not available. It is hoped that this would provide a tool for inferring network structure in patient and control groups with the purpose of identifying connectivity-based abnormalities that characterise psychiatric illness.

1.2 Structure of the thesis

The work presented in this thesis describes the development of a novel approach to network inference in DCM. It has been presented in the alternative format, whereby the main findings are presented as separate papers that are suitable for submission to peer reviewed journals. It is therefore separated into a section covering the background theory, followed by the main findings presented as separate papers. The author of this thesis is the sole author of all the main findings, which are presented in the form of journal submissions. Named authors in chapters 6, 7, and 8 were involved only in data collection, and all the data analysis was carried out by the author of the thesis.

1.2.1 Background

Chapter 2 provides the theoretical background to functional magnetic resonance imaging based on the underlying principles of nuclear magnetic resonance, the localised change in blood flow that accompanies neural activity, and the corresponding signal that is measured.

Chapter 3 outlines the principles of brain connectivity and reviews the methods used to measure it.

Chapter 4 gives a comprehensive overview of the theory behind Dynamic Causal Modelling (DCM), as it is the technique exclusively used in this thesis.

Chapter 5 is a brief literature review of neuroimaging in psychiatry with a narrow focus on the themes relevant to the work presented in the papers.

1.2.2 Main findings

At the time of writing, all the main findings are in preparation for submission to peer reviewed journals in the format in which they are presented here.

Paper 1 proposes a new method for DCM that allows the entire model space to be explored in a very efficient manner that makes it computationally feasible. Empirical validation for the method is sought using two different paradigms.

Paper 2 expands upon the work in paper 1 by testing the method with another dataset that is part of a multicentre study, with the aim of assessing the reproducibility of the method and of DCM across neuroimaging centres.

Paper 3 provides a proof of concept for the method by demonstrating how it can be used to infer connectivity-based abnormalities in a group of depressed patients.

2 Principles of fMRI

This chapter outlines the principles of functional magnetic resonance imaging, and of nuclear magnetic resonance, on which it is based. The following texts were used as reference material [10-13].

2.1 Nuclear Magnetic Resonance

Magnetic resonance imaging (MRI) is a medical imaging technique that relies on the principle of nuclear magnetic resonance (NMR) to visualise the internal structure of the human body. The signal is primarily due to the hydrogen nuclei in water, which comprises 50-70% of an adult's total body weight. Somewhat confusingly, the term spin can refer to the spin angular momentum, a property of nuclei and their constituent particles, but also, in the MRI literature, to the vector of a nucleus's magnetic field. As the latter is proportional to the former it is a trivial point, but the reader should be aware of the distinction to avoid confusion.

2.1.1 Spin angular momentum

The fundamental concept behind NMR is a quantum mechanical property of atomic nuclei known as spin angular momentum, which is dependent on the number of protons and neutrons in the nucleus. To exhibit the phenomenon of NMR, a nucleus must have a spin quantum number greater than zero. The MRI signal comes from the hydrogen nucleus ($^1$H), which as a single proton has a spin quantum number of ½. The spin angular momentum $p$ of an atomic nucleus is given by equation 2.1, where $s$ is the spin quantum number and $\hbar$ is the reduced Planck constant, i.e. $h/2\pi$.

$$p = \hbar\,[s(s+1)]^{1/2}$$   (Equation 2.1)

As it is a vector, $p$ has both magnitude and direction, as does the angular momentum of a rotating body in classical physics. However, in a quantum mechanical system, the angular momentum $p$ is said to be quantised, meaning it cannot vary continuously, and can only exist in a number of discrete states. The angular momentum along the z direction, $p_z$, can have values determined by equation 2.2.

$$p_z = \hbar\,m_s, \qquad m_s = s,\ (s-1),\ (s-2),\ \ldots,\ -s$$   (Equation 2.2)

Thus for protons, which have a spin quantum number of ½, there are two possible energy states, as $m_s = \pm\tfrac{1}{2}$.

2.1.2 External magnetic field

Quantum spin is an intrinsic form of angular momentum possessed by nuclei, and has a discrete value which is given by the spin quantum number $s$. It is often described as though particles with spin angular momentum are literally spinning on their axis, which is not the case, although it is a useful way of visualising the phenomenon. In classical physics, Ampere's Law states that rotation of a positive charge will result in a magnetic field along the axis of rotation, and in the same way, particles with spin angular momentum have a magnetic moment. The magnetic moment $\mu$ of the nucleus is given in equation 2.3, with the gyromagnetic ratio $\gamma$ being a property of the nucleus, and in the case of protons equal to $2.7 \times 10^{8}$ rad s$^{-1}$ T$^{-1}$.

$$\mu = \gamma\,p$$   (Equation 2.3)

For a system of protons with spins, when an external magnetic field $B_0$ is applied the nuclei experience a torque, which causes them to precess around the direction of the field as they try to align themselves with it. The protons precess with a frequency proportional to the external magnetic field strength, known as the Larmor frequency and given in equation 2.4, where $\omega_0$ is the angular frequency of precession.

$$\omega_0 = \gamma\,B_0$$   (Equation 2.4)

As the magnetic moment is proportional to the angular momentum, it too can only exist in two discrete states relative to the direction of the external field. Under normal conditions, in the absence of an external magnetic field, although the magnitudes of the magnetic moments in a system are equal, their orientations are random, so they cancel one another out and there is no resultant net magnetism. However, when a large external magnetic field $B_0$ is applied, the spins can either align themselves parallel to the field (spin-up direction) or in the opposite direction, antiparallel to the field (spin-down direction). The majority of spins align themselves in the lower energy parallel direction, but due to thermal energy some spins move to the higher energy antiparallel direction, as shown in figure 2.1. The convention in the MRI literature is to define an (x, y, z) coordinate system with the z axis aligned with the external magnetic field $B_0$, thus making the (x, y) plane (transverse plane) perpendicular to the external field. As the majority of spins are in the parallel direction, this creates a net magnetization vector $M_0$.

Figure 2.1: A diagram showing how the majority of spins are in the lower energy state, with net magnetization aligned with the external magnetic field.

The ratio of spins in the parallel to antiparallel direction can be determined by the Boltzmann equation (equation 2.5), where $N_p$ and $N_a$ are the spins in the parallel and antiparallel directions respectively, $k$ is the Boltzmann constant, $T$ is the temperature in Kelvin, and $\Delta E = \hbar\gamma B_0$.
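As a rough numerical illustration of the Larmor relation (equation 2.4) and the Boltzmann ratio (equation 2.5), the following minimal Python sketch evaluates both for typical scanner field strengths. The field strengths (1.5 T and 3 T), the body temperature of 310 K, the function names, and the more precise proton gyromagnetic ratio of 2.675 × 10^8 rad s^-1 T^-1 (which rounds to the value quoted in the text) are assumptions made for illustration and are not taken from the thesis.

```python
import math

# Physical constants (rounded standard values; assumptions for this sketch)
HBAR = 1.054571817e-34   # reduced Planck constant, J s
K_B = 1.380649e-23       # Boltzmann constant, J / K
GAMMA_PROTON = 2.675e8   # proton gyromagnetic ratio, rad s^-1 T^-1


def larmor_frequency_hz(b0_tesla, gamma=GAMMA_PROTON):
    """Precession frequency in Hz: omega_0 / (2*pi) with omega_0 = gamma * B0."""
    return gamma * b0_tesla / (2.0 * math.pi)


def parallel_to_antiparallel_ratio(b0_tesla, temperature_k, gamma=GAMMA_PROTON):
    """Boltzmann ratio N_p / N_a = exp(dE / (k T)), with dE = hbar * gamma * B0."""
    delta_e = HBAR * gamma * b0_tesla
    return math.exp(delta_e / (K_B * temperature_k))


if __name__ == "__main__":
    for b0 in (1.5, 3.0):  # illustrative field strengths in tesla
        f_mhz = larmor_frequency_hz(b0) / 1e6
        excess = parallel_to_antiparallel_ratio(b0, 310.0) - 1.0
        print(f"B0 = {b0} T: Larmor frequency = {f_mhz:.1f} MHz, "
              f"fractional excess of parallel spins = {excess:.2e}")
```

Under these assumptions the excess of parallel spins is only a few parts per million, which is why the net magnetization is small even in a strong field.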

$$\frac{N_p}{N_a} = e^{\Delta E / kT} \approx 1 + \frac{\Delta E}{kT}$$   (Equation 2.5)

As is evident in equation 2.5, if the temperature could be brought down to absolute zero, all the protons would occupy the lower energy state parallel to the external field.

2.1.3 Excitation and relaxation

It is the precession of the nuclear spins within different tissue types, and its dependence on the tissue type, that forms the basis of the MR image contrast. An MR image is composed of voxels, which are the 3D equivalent of pixels in a 2D image. As previously explained, when the external magnetic field is applied there will be an excess of spins in the parallel direction, i.e. more spins in the positive z-direction than the negative z-direction. There are therefore $N_p - N_a$ spins which sum in the z direction, but cancel each other out in the transverse plane, thus resulting in the net magnetisation $M_0$ in each voxel, which lies on the z-axis. The magnitude of $M_0$ is given by equation 2.6, where $PD$ is the proton density and $V_{voxel}$ is the volume of a voxel.

$$M_0 = \frac{(PD)\,V_{voxel}\,\gamma^2 \hbar^2\, s(s+1)\, B_0}{3kT}$$   (Equation 2.6)

The system of spins can then be excited by applying an alternating magnetic field tuned to the Larmor frequency. This alternating magnetic field oscillates in the radio frequency band, and so is called a radio frequency (RF) pulse and is denoted by $B_1$. As $B_1$ has a frequency equal to the frequency of spin precession, i.e. the Larmor frequency, an efficient energy transfer takes place, a phenomenon known as resonance. This causes the spins to move from their preferred low energy states into a new high energy state. Typically the strength and duration of the first pulse applied in a more complicated pulse sequence, known as a 90° RF pulse, are such that the spins change their orientation by 90°, leaving them with no longitudinal magnetisation $M_z$, but giving them a new transverse magnetisation $M_{xy}$, as shown in figure 2.2a. Once the RF pulse is turned off, the spins return to their previous low energy orientations aligned with the external magnetic field at a rate described by equation 2.7, where $T_1$ is a time constant known as the spin-lattice relaxation time.

$$M_z = M_0\left[1 - \exp\left(-\frac{t}{T_1}\right)\right]$$   (Equation 2.7)

As well as the longitudinal magnetisation $M_z$ recovering to its previous value $M_0$, the cessation of the RF pulse results in the decay of the transverse magnetisation $M_{xy}$. However, $M_{xy}$ decays at a much faster rate than the rate at which $M_z$ returns to $M_0$. This is due to spin-dephasing, which can be visualised as the spins fanning out, as depicted in figure 2.2b. $M_{xy}$ decays at a rate described by equation 2.8, where $T_2$ is a time constant known as the spin-spin relaxation time. This is shown in figure
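The two relaxation processes described by equations 2.7 and 2.8 can be compared numerically with a short sketch. The time constants used here ($T_1$ = 1000 ms, $T_2$ = 100 ms) are round illustrative numbers chosen only to show that transverse decay is much faster than longitudinal recovery; they, and the function names, are assumptions rather than values from the thesis.

```python
import math


def longitudinal_recovery(t, t1, m0=1.0):
    """M_z(t) = M_0 * (1 - exp(-t / T1)) following a 90 degree pulse (equation 2.7)."""
    return m0 * (1.0 - math.exp(-t / t1))


def transverse_decay(t, t2, mxy0=1.0):
    """M_xy(t) = M_xy0 * exp(-t / T2) (equation 2.8)."""
    return mxy0 * math.exp(-t / t2)


if __name__ == "__main__":
    t1_ms, t2_ms = 1000.0, 100.0  # illustrative time constants in milliseconds
    for t_ms in (0.0, 50.0, 100.0, 500.0, 1000.0, 3000.0):
        mz = longitudinal_recovery(t_ms, t1_ms)
        mxy = transverse_decay(t_ms, t2_ms)
        print(f"t = {t_ms:6.0f} ms   M_z / M_0 = {mz:.3f}   M_xy / M_xy0 = {mxy:.3f}")
```

At t = $T_1$ the longitudinal magnetisation has recovered to about 63% of $M_0$, while at t = $T_2$ the transverse magnetisation has fallen to about 37% of its initial value.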

$$M_{xy}(t) = M_{xy0}\,\exp\left(-\frac{t}{T_2}\right)$$   (Equation 2.8)

Figure 2.2: a) The application of the RF pulse $B_1$ flips the spins into the transverse plane, resulting in a net transverse magnetization $M_{xy}$. b) Once the RF pulse is turned off the spins dephase and return to their previous orientation. This results in a net magnetization in the z direction, $M_z$, which slowly returns to $M_0$, while the net transverse magnetization decays to zero.

As described above there are three values, the $T_1$ and $T_2$ relaxation times and the proton density, which affect the contrast of the image. As the spins relax into their preferred orientations they release their energy into the surrounding lattice of atoms, which is why $T_1$ is called the spin-lattice relaxation time. The rate at which the energy can be released into the lattice depends on the material, and this is why different body tissues have different $T_1$ values. The same principle applies to the $T_2$ relaxation time, in that the way the different spins interact is dependent on the material. As $M_0$ is directly dependent on the proton density, this value is also a determining factor in the image contrast. Also, in practice $B_0$ is never completely uniform, and the inhomogeneity causes neighbouring protons to precess at different frequencies and thus move out of phase with one another. A time constant $T_2^*$ determines the rate at which phase coherence is lost, and is given by equation 2.9, where $\Delta B_0$ is the field inhomogeneity.

$$\frac{1}{T_2^*} = \frac{1}{T_2} + \frac{\gamma\,\Delta B_0}{2}$$   (Equation 2.9)

As $M_0$ is directly dependent on the external magnetic field $B_0$, the strength of the magnet in the MR scanner also determines the image contrast.

2.1.4 Echoes

Free-induction decay (FID) is the signal that can be measured with the receiver coil immediately after the RF pulse is applied. In practice, though, the spins are brought back into phase, and it is an echo signal that is measured. There are two types of echo that are prominently used in MRI, spin echo (SE) and gradient echo (GE). A basic SE sequence proceeds as follows:

1. An initial 90° RF pulse flips the magnetization into the transverse plane, and the spins immediately begin to de-phase.
2. A second, 180° pulse is applied at time TE/2, which flips the spins around the y axis by 180° and reverses their de-phasing angles.
3. As the spins continue to precess at the same frequency, they re-phase at time TE.

Therefore the time between the initial 90° RF pulse and the time at which the transverse magnetization is re-phased, and the echo signal is generated, is known as the Time to Echo (TE). GE sequences differ from SE sequences in that the spins are not brought back into phase by another RF pulse, but by a gradient pulse. Both SE and GE are capable of producing $T_1$, $T_2$, and PD weighted images. SE sequences take much longer, but generally produce better quality images.

2.1.5 Forming an image

In an MRI scanner the receiver coil detects signal from the entire object inside it. This means the signal needs to be localised to ensure that, at any one time, the signal being measured originates from a specific point in space. This is achieved using magnetic field gradients. In the context of MRI, a magnetic field gradient is simply a linear spatial variation of the magnitude of $B_0$ in the x, y, or z direction.

Slice selection

The first step in forming an image is known as slice selection, and involves selecting an image slice in the x,y plane by applying a gradient in the z direction, thus making the spins precess at different Larmor frequencies depending on their z coordinate. Only the slice that receives the resonant frequency will be excited, and thus the signal will be localised to that slice. This is known as selective excitation, and the thickness of the slice is determined by the bandwidth of the RF pulse and the magnitude of the gradient.

K-space

MR images are acquired through the selective use of gradients to form an image in k-space. The MR signal for each slice is a 2D image in space, and Fourier theory states that any signal can be expressed as a summation of sinusoids, which in the case of MRI are 2D sinusoidal brightness variations known as gratings. Gratings are fully described with three parameters: magnitude, spatial frequency in the x direction $k_x$, and spatial frequency in the y direction $k_y$. In k-space, which has the same dimensions as the image, each of the gratings can be represented, with its spatial frequencies determined by the coordinates and its magnitude by the pixel intensity. A Fourier transform can then be used to move from k-space into image space, as shown in figure 2.3.

Figure 2.3: Illustration showing how each pixel in k-space represents a grating, and that the image can be represented by all the gratings in k-space.

Spatial encoding

Selective excitation is used to localise the signal to a specific slice. The selected slice can be further broken down into pixels, with each pixel having its own net magnetisation. Generating an image from this selected slice involves applying gradients in the x and y directions in a sequential manner to move through the whole x,y plane. The timing of these gradients is known as a pulse sequence, and it is usually visualised using a pulse sequence diagram. A typical pulse sequence, as shown in figure 2.4, proceeds as follows:

1. An RF pulse is applied at the same time as a slice-selection gradient $G_{SS}$. This selectively excites spins within a slice.
2. A phase-encoding gradient $G_{PE}$ is applied in one of the directions orthogonal to the slice-selection gradient.
3. A frequency-encoding gradient $G_{FE}$ is applied in the direction orthogonal to the previous two gradients. This is also known as the readout gradient, as it is during its application that the signal is acquired and a line of k-space is sampled.
4. The sequence is repeated with only $G_{PE}$ changing, and in this way each line of k-space can be read.

Figure 2.4: Depiction of a simple SE pulse sequence. Each repetition (TR) of the above sequence encodes a line of k-space.
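The relationship between k-space and image space described above can be sketched with a small discrete Fourier transform written in plain Python. This is a toy illustration on a hypothetical 4 × 4 "image", not a model of a real acquisition: the function names and the test image are assumptions, and the helper `acquire_lines` simply zero-fills the lines of k-space that have not yet been read, to show that the image is only fully recovered once every phase-encoding line has been sampled.

```python
import cmath


def dft2(image):
    """Forward 2D discrete Fourier transform: image space -> k-space."""
    ny, nx = len(image), len(image[0])
    kspace = [[0j] * nx for _ in range(ny)]
    for ky in range(ny):
        for kx in range(nx):
            total = 0j
            for y in range(ny):
                for x in range(nx):
                    phase = -2j * cmath.pi * (kx * x / nx + ky * y / ny)
                    total += image[y][x] * cmath.exp(phase)
            kspace[ky][kx] = total
    return kspace


def idft2(kspace):
    """Inverse 2D DFT: sums one sinusoidal grating per k-space sample."""
    ny, nx = len(kspace), len(kspace[0])
    image = [[0j] * nx for _ in range(ny)]
    for y in range(ny):
        for x in range(nx):
            total = 0j
            for ky in range(ny):
                for kx in range(nx):
                    phase = 2j * cmath.pi * (kx * x / nx + ky * y / ny)
                    total += kspace[ky][kx] * cmath.exp(phase)
            image[y][x] = total / (nx * ny)
    return image


def acquire_lines(kspace, n_lines):
    """Keep the first n_lines phase-encoding lines and zero-fill the rest."""
    nx = len(kspace[0])
    return [row[:] if i < n_lines else [0j] * nx for i, row in enumerate(kspace)]


if __name__ == "__main__":
    # Hypothetical 4 x 4 object: a bright square on a dark background
    image = [[0, 0, 0, 0],
             [0, 1, 1, 0],
             [0, 1, 1, 0],
             [0, 0, 0, 0]]
    kspace = dft2(image)
    for n_lines in range(1, len(kspace) + 1):
        recon = idft2(acquire_lines(kspace, n_lines))
        error = max(abs(recon[y][x] - image[y][x])
                    for y in range(4) for x in range(4))
        print(f"lines of k-space read: {n_lines}, max reconstruction error: {error:.3f}")
```

The nested loops make the transform slow for realistic image sizes; in practice a fast Fourier transform would be used, but the explicit sums make the "one grating per k-space point" interpretation visible.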

2.1.6 Echo-planar imaging

An important advancement in the field of MR was echo-planar imaging (EPI). Compared with the techniques previously discussed, in which multiple RF pulses ("shots") are required to read each line of k-space, EPI allows all of k-space to be sampled with one shot. The result is a vast reduction of the time in which a slice can be sampled. As such, movement artefacts are significantly reduced, and functional neuroimaging, in which whole brain volumes are collected in a few seconds, becomes feasible. In conventional pulse sequences each line of k-space is acquired separately with its own RF excitation. In EPI the whole plane is acquired with one RF pulse by rapidly oscillating $G_{FE}$, causing a series of gradient echoes, with each oscillation corresponding to the readout of one line of k-space. At the same time a train of blips in the $G_{PE}$ direction uniquely phase encodes each of the echoes, effectively moving up a line in k-space. This principle is illustrated in figure 2.5.

Figure 2.5: Depiction of how an EPI pulse sequence moves through k-space.
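The difference between conventional line-by-line acquisition and single-shot EPI can be sketched as an ordering of k-space samples. The sketch below assumes that the oscillating readout gradient reverses the direction in which successive lines are traversed, giving a zigzag path, and it represents k-space positions as integer indices; both are simplifying assumptions for illustration, and the function names are hypothetical. Gradient timing and signal decay during the readout are not modelled.

```python
def conventional_shots(n_lines, n_points):
    """Conventional sequence: one RF excitation (shot) per line of k-space."""
    return [[(ky, kx) for kx in range(n_points)] for ky in range(n_lines)]


def epi_trajectory(n_lines, n_points):
    """Single-shot EPI: the readout direction alternates on each line,
    and a phase-encoding blip moves the trajectory up one line."""
    trajectory = []
    for ky in range(n_lines):
        if ky % 2 == 0:
            columns = range(n_points)               # left to right
        else:
            columns = range(n_points - 1, -1, -1)   # right to left
        trajectory.extend((ky, kx) for kx in columns)
    return trajectory


if __name__ == "__main__":
    shots = conventional_shots(4, 4)
    path = epi_trajectory(4, 4)
    print(f"conventional: {len(shots)} RF excitations for {len(shots) * 4} samples")
    print(f"EPI: 1 RF excitation for {len(path)} samples")
    print("EPI sampling order:", path)
```

Both schemes visit every k-space position exactly once; the saving in EPI comes from doing so after a single excitation rather than waiting one repetition time per line.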

2.1.7 Image contrast
The MR image contrast is largely determined by how the image is acquired, i.e. the parameters of the pulse sequence, and is dependent on PD and the different relaxation time constants T1, T2 and T2*. In PD weighted images the signal intensity is proportional to the number of protons in the voxel. In T2 weighted images the signal intensity is proportional to the T2 value, i.e. tissues with long T2 values appear brightest. In T1 weighted images the signal intensity is inversely proportional to the T1 value, i.e. tissues with long T1 values appear dimmest. TR and TE are selected to determine the relative contributions of T1 and T2 values to the image contrast, as shown in figure 2.6.
Figure 2.6: Depiction of how TR and TE times can be chosen to maximize the influence of either the T1 or T2 time constants, and thus create images weighted by them.
T1 weighted images
The receiver coil can only measure the transverse magnetization M_xy. Given that the spin-lattice relaxation time T1 describes the rate of recovery of the longitudinal magnetization M_z, it might seem counterintuitive that images can be T1 weighted. However, as explained, an MR image is formed via multiple RF pulses, and so subsequent transverse magnetizations depend on the

amount of longitudinal magnetization that has recovered. For example, the longitudinal magnetization in a tissue with a long T1 value will not have recovered as much as that of a tissue with a short T1, and therefore the magnetization available to be flipped into the transverse plane will be lower. T1 weighted images provide good separation between different tissue types and so are most commonly used for structural (anatomical) scans.
T2* weighted images
Transverse magnetization relaxation is dependent on the spin-spin relaxation T2 and the spin dephasing caused by inhomogeneities in the magnetic field. To obtain T2 weighted images a longer TE value is used than for T1 weighted images. If the TE value is too short the spins won't have de-phased enough for there to be a notable contrast between tissues, and if it is too long, the signal will have decayed too much. The TR must also be much longer than for T1 weighted images to allow maximum recovery of the longitudinal magnetization and minimise the effect of T1. Similar pulse sequences with long TR times and intermediate TE times are used to generate T2* weighted images. The signal measured in functional MRI (fMRI) is based on field inhomogeneities related to the level of blood oxygenation; T2* weighted images are therefore preferred.
2.2 Functional Magnetic Resonance Imaging
The ability to acquire whole brain volumes in a matter of seconds, made possible by developments to MRI like EPI, has made it feasible to image the functioning human brain. This has led to the development of functional MRI (fMRI), in which highly localised, neural-activity-dependent changes in blood flow can be measured. Providing excellent spatial resolution, it has become one of the most widely utilised methods for inferring brain activity in humans [1], both at rest and whilst cognitive systems are engaged during the

performance of tasks. The measured signal is known as the blood-oxygen-level-dependent (BOLD) signal, and it is an indirect approximation to neural activity. It is based on the principle of neurovascular coupling.
Neurovascular Coupling
Neurovascular coupling is the name given to the relationship between neural activity and its effect on localised cerebral blood flow (CBF). It is an important concept in functional neuroimaging as it is the basis of the measured BOLD signal in fMRI. The basic principle is that when an area of the brain has an increased level of activity, there is a localised increase in blood flow, known as the haemodynamic response (HDR), so that its increased glucose metabolism can be met. Despite the fact that knowledge of its existence predates the advent of fMRI by almost a hundred years [14], it is still a contentious issue, and exactly how local neural activity mediates blood flow is the focus of debate in the literature [15]. One of the primary questions of the debate concerns the mechanism via which the haemodynamic response occurs. It was initially thought that the energy demands of the neural tissue, i.e. its increased need of oxygen for glucose metabolism, are directly responsible for the increase in local blood flow that accompanies neural activity [16]. Despite once being a widely held assumption, this interpretation was eventually superseded once PET data conclusively demonstrated that functionally mediated increases in blood flow are not matched by increases in oxygen consumption of the same order of magnitude [17, 18]. It can now be stated that local cerebral blood flow is induced by vasoactive substances in multicellular signalling pathways, in a way that is independent of neuronal energy demand. Research has also focussed on determining exactly what aspect of neural activity best corresponds to the BOLD signal, which itself is a measure of the haemodynamic response.
The evidence needed to answer this question invariably comes from animal studies, by virtue of the fact it is impossible to

obtain a direct measure of human brain activity outside of a clinical context. In animal studies it is possible to combine measures of CBF with electrophysiological measures. Indeed this was the method employed by Logothetis et al [19] in a ground-breaking study, in which they demonstrated that in anaesthetised macaque visual cortex, the BOLD response can be better predicted by the local field potential (LFP) than by the multi-unit activity (MUA). Since then other studies have verified that the BOLD signal is a representation of synaptic input rather than spiking neurons, in awake monkeys [20] and in humans [21, 22]. However, exactly what the BOLD signal represents is still far from being satisfactorily resolved, with the relative contributions of feedforward and feedback processes, and discrimination between excitatory and inhibitory responses, being challenges that remain to be tackled [23, 24]. Whilst the exact relationship between the BOLD signal and the underlying neural activity that precedes it is unknown, it can be confidently stated that fMRI does provide an indirect measure of neural activity in the brain.
Blood-oxygen-level-dependent contrast
As already stated, MRI relies on the paramagnetism of hydrogen atoms. However, any paramagnetic substance found naturally in the body, or added as a contrast agent, can be used to form a measurable signal. Neurovascular coupling facilitates the indirect measure of brain activity; all that is required is a paramagnetic substance in the blood. Initial attempts at imaging the functioning brain relied on the administration of contrast agents. A pioneering study by Belliveau et al [25] demonstrated that contrast agents could successfully be used to image the functioning human brain, alongside detailed anatomical images, thus establishing MRI as an important tool for cognitive neuroscience.
The use of an exogenous contrast agent was a serious limiting factor, thus efforts were focussed on identifying naturally occurring contrast in vivo. In

1990 Ogawa et al [26] published a paper on the potential for using deoxyhaemoglobin (dHb) as a contrast agent for MRI, dHb being a substance long known to be paramagnetic [27]. Using this technique the measured signal is dependent on the level of blood oxygenation and so is known as the blood-oxygenation-level-dependent (BOLD) signal. In 1992, Kwong et al [28] published the first successful use of the BOLD signal in studying human brain function. Since then it has gone on to dominate functional brain imaging due to its non-invasiveness, lack of exposure to radiation, relatively low cost and wide availability. A susceptibility difference between the blood vessel and the surrounding tissue is created by dHb, meaning protons experience slightly different field strengths and thus precess at different frequencies. Consequently the transverse magnetization decays at a faster rate, i.e. with a shorter T2*, meaning the presence of dHb causes a decrease in the MR signal. The BOLD contrast is dependent on the concentration of dHb, which itself is determined by the balance between oxygen demand and oxygen supply. Intuitively, considering that neural activity increases oxygen consumption, one might expect the BOLD signal to decrease in response to neural activity, by assuming that the concentration of dHb increases. The reality is more complex, and in fact the signal increases. Following neural activity, the haemodynamic response results in a regional increase in cerebral blood flow (CBF), which in turn affects the cerebral blood volume (CBV) as shown in figure 2.7. Although oxygen is extracted from the blood, oxygen consumption is ultimately diffusion limited, so overall there is a net decrease in the dHb concentration as shown in figure

Figure 2.7: Figures showing the relative changes of physiological variables that result in the BOLD response.
Temporal and Spatial resolution of BOLD
The time course of the BOLD response is complex and multifaceted, and different parts of it may encode distinct information. The change in BOLD signal that occurs following stimulated neural activity is a measure of the HDR and is shown in figure 2.8. The shape varies with the nature of the stimulus and the underlying neuronal activity.

Figure 2.8: Illustration of the BOLD measured HDR.
Following a stimulus, cortical neuron responses occur within tens of milliseconds, but the first observable BOLD response lags behind by about 1-2 seconds. As the response to a single short duration neuronal event, the HDR has a stereotypical waveform characterised by the following sequential features:
1. Some studies have reported an initial dip in the BOLD signal around 1 second after the stimulus onset. This is thought to be the result of the immediate increase in metabolic activity that causes a transient increase in dHb as oxygen is extracted from the blood, as shown in figure
2. After this initial dip, the increased metabolic demands cause blood flow to increase and there is an influx of oxygenated blood. There is a net increase in oxyhaemoglobin (HbO2) as more is supplied to the area than can be extracted. This causes the BOLD signal to rise.

3. The signal reaches a peak about 5 seconds after the onset of activity. If the stimulus is extended in time, then the peak is extended into a plateau at a slightly lower value than the peak.
4. At the cessation of neural activity the BOLD signal falls to a level below baseline, forming a signal characteristic known as the post-stimulus undershoot. This effect is hypothesised to be caused by the CBF decreasing more quickly than the CBV, causing a temporary increase in deoxyhaemoglobin as compared to baseline.
The temporal resolution of fMRI is determined by the TR, i.e. the time for a whole brain volume to be imaged, although it is not the only factor. The limiting factor on the rate at which changes in neural activity can be inferred is the haemodynamic response. Neural activity which happens over a very short time scale is inferred from vascular changes which happen over much longer time frames. The TR sets the interval at which the HDR is sampled, and so decreasing it indefinitely will not yield continuing increases in temporal resolution. The spatial resolution depends on several factors, the most obvious being voxel size. Smaller voxels increase the spatial resolution, but come at the cost of acquisition time and signal-to-noise ratio (SNR). Even the smallest voxel will contain multiple tissue types, meaning the signal is a summation of the effect of each tissue. As the voxels increase in size, however, this effect becomes more pronounced and the signal of interest may become more diluted. Just as temporal resolution in fMRI is limited by the vascular origin of the signal, so is spatial resolution. A study comparing electrode recordings and fMRI for monkey somatosensory cortex found that areas of activation overlapped, but that crucially fMRI had larger activations [29].
The fMRI signal is not as precise due to the filtering effects of its vascular nature, meaning that in a worst case scenario, the signal may come from blood vessels that are significantly removed from the site of neural activation.

3 Brain connectivity
The aim of this chapter is to give a brief overview of the study of brain connectivity. It is a concept that can be defined at different spatial scales, but the review presented here focusses on the macro scale that is studied in neuroimaging, as opposed to micro scale connections between individual neurons. Connectivity, when discussing the whole brain, can refer to three different concepts regarding brain organisation and function: anatomical, functional, and effective connectivity. These three concepts are products of the two fundamental features of brain organisation: functional specialisation and functional integration. These different principles are all complementary components of a full understanding of how the brain is organised and how it functions. The purpose of all brain imaging, functional or structural, is to make inferences about one or more of these fundamental principles, and thus a whole assortment of different methodologies have been developed with that end in mind.
3.1 Functional Specialisation and Integration
There are two complementary principal theories of brain organisation and function, which evidence suggests the brain adheres to: functional specialisation and functional integration. The theory of functional specialisation is that the brain is segregated into distinct units that are functionally specialised for some aspect of motor or perceptual processing. When a cognitive task is performed, according to the theory of functional integration, it is due to the activity of multiple functionally segregated brain areas that form a network. Functional specialisation is a deep rooted idea in neuroscience, and has its origins in the outdated concept of functional localisation, i.e. that a function can be localised to a specific region in the brain. That the brain can be

segregated into different areas and that these areas may be identified with specific cognitive functions has always been central to neuroscience. Early electrical stimulation studies, which were developed with the aim of localising specific functions to brain regions in animals [30], as well as observations of patients with focal brain lesions showing specific impairment, such as those documented by Broca and Wernicke, cemented functional specialisation as a fundamental tenet of brain organisation. However, despite these early findings, it was still problematic trying to localise specific functions to specific regions of cortex [31], and it soon became apparent that localisation was insufficient to fully explain brain function. Thus the idea of functional specialisation and integration has arisen. They form a complementary picture of how the brain is organised and functions, and both are necessary for a complete understanding. From these fundamental precepts of organisation and function, the concepts of anatomical, functional, and effective connectivity arise.
3.2 Structural connectivity
Structural connectivity refers to networks in the brain formed by physical connections between neurons, neural populations or anatomically segregated brain regions. The physical connections can be formed by synapses between neurons, or white matter fibre pathways between neural populations. Physical pathways are relatively stable over short time periods, but due to neural plasticity significant morphological changes can occur over longer time periods [32, 33]. Structural connectivity can be gleaned indirectly from standard anatomical MR images by segmenting whole brain images into their constituent tissue types: white matter, grey matter, and cerebrospinal fluid [34].
Volumetric measures of white matter can serve as estimates of structural connectivity when, for example, the morphology of white matter is associated with cognitive decline [35], as reduced structural connectivity is implied by reduced white matter volume.

Direct axonal connections can only be inferred with complete certainty by using invasive tracing techniques. Non-invasive methods based on diffusion weighted (DW) imaging, such as diffusion tensor imaging (DTI), from which probabilistic measures of in vivo structural connectivity are obtained, are now widely used in clinical and research settings. DW-MR is based on the principle of anisotropic diffusion of water in the brain [36], i.e. the rate of diffusion is not equal in all directions, particularly in white matter. DTI involves the acquisition of diffusion measurements in multiple directions, and then, using tensor decomposition, calculates the diffusivities parallel and perpendicular to the white matter tracts [37]. This information can then be used to make virtual 3D reconstructions of the trajectories of white matter fibre bundles. Fractional anisotropy (FA), the normalised standard deviation of the diffusivities, is the most widely used DTI based index [37], and is used to make 2D grey-scale maps showing relative anisotropy values across the brain. DTI has established itself as a useful tool for characterising connectivity abnormalities within different patient groups; for example, studies measuring anisotropy in the brains of patients with schizophrenia have shown widespread FA reductions in multiple brain regions [38, 39].
3.3 Functional and effective connectivity
The rise in functional and effective connectivity as concepts represents a move by the neuroimaging community to try and characterise functional integration. In the field of neuroimaging, functional connectivity is defined as temporal correlations between spatially removed neurological events [7]. Effective connectivity methods attempt to make inferences about causation and directed influences between regions.
As well as characterising different connectivity methods as measuring either functional or effective connectivity, one can also describe them as either data-driven or hypothesis-driven. Generally speaking, methods that can be classed as measuring functional connectivity are data-driven, whereas to make inferences about causality and directionality one usually requires a model. There are exceptions to this rule such as psychophysiological interactions (PPI), which is a data-driven

method that attempts to measure directional influences and is thus classified as effective connectivity [40]. Functional connectivity is an observable phenomenon, i.e. correlations in BOLD signal between spatially removed brain regions, and so measuring it does not require a model. Effective connectivity attempts to explain these correlations by way of some model explaining how they arise, and the parameters of the model are said to be the effective connectivity. For this reason model-based methods always measure effective connectivity, and the model attempts to describe the causal influence between regions. These differences between functional and effective connectivity can also be thought of as reflecting different scientific approaches. Specifying and comparing different candidate models to explain the data, as is the approach with effective connectivity, follows the traditional hypothesis-testing approach to science. Conversely, functional connectivity, which essentially just describes data, represents an exploratory approach consistent with discovery-based science. This distinction means the two different approaches have different applications. One such application, specific to functional connectivity, is the analysis of resting-state data, i.e. that which is acquired during undirected mentation. In the last decade there has been a rise in the number of studies looking at functional connectivity in the human brain at rest [41], and this approach has been useful as a way of classifying particular groups of subjects, and could potentially be used as an imaging based biomarker for disease [42].
Seed-Voxel Correlation Maps
The simplest of the data-driven approaches, introduced in 1992 by Horwitz et al [43] for PET, and in 1995 by Biswal et al [44] for fMRI, are seed-voxel correlation maps.
Biswal et al performed standard fMRI with patients during rest, and observed that low frequency (< 0.1 Hz) spontaneous fluctuations in the motor cortex showed a high degree of temporal correlation with other parts of the brain. Given that it is computationally unfeasible to measure the

correlation between all voxel pairs in the whole brain, the practice involves specifying a seed voxel or region, which is then used as a regressor in a linear correlation analysis. Functional connectivity maps depicting the correlation between the seed time series and that of every other voxel in the brain are produced. The results of this technique are obviously largely dependent on the initial choice of seed, and so this is of critical importance, but it also appeals to researchers as it allows them to test specific hypotheses of how focal areas of the brain are connected to others.
Matrix decomposition based methods
There is a particular class of methods used to infer functional connectivity based on the observed data being composed of multiple underlying components. These are known as matrix decomposition methods as they involve decomposing the image matrix into separate components. Principal components analysis (PCA) is one such technique that was first applied to PET data to identify functional connectivity [45], and has since been applied to a variety of fMRI datasets. Imaging data are reformatted into a two-dimensional matrix, and then singular value decomposition (SVD) is used to decompose the data into a set of orthogonal eigenimages or principal components. Thus, the first principal component is an image which embodies the largest source of variance in the data, and each subsequent component has the highest variance possible under the condition of being orthogonal to the components preceding it [46]. PCA is a very simple technique, but is very limited in that components must be normally distributed, which is a prerequisite for ensuring orthogonality between components. This results in potential features of interest being distributed amongst multiple components.
Also, given that fMRI has a relatively low signal-to-noise ratio (SNR), one cannot be certain that the components considered correspond to meaningful brain connectivity, as opposed to physiological or scanner related noise. In fact, whilst it was initially used to identify patterns of activation in the data [47], it has since been used more as a tool for the removal of noise [48].
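The eigenimage decomposition described above for PCA can be sketched with NumPy. This is a minimal sketch under stated assumptions: the function name `pca_eigenimages`, the scans-by-voxels layout, and the random data standing in for an imaging time series are all illustrative choices, not the procedure of any cited study.

```python
import numpy as np


def pca_eigenimages(data):
    """Decompose a (n_scans, n_voxels) matrix into principal components via SVD.

    Returns the component time courses (u), singular values (s),
    eigenimages (rows of vt) and the fraction of variance each explains.
    """
    # Remove each voxel's mean so that the SVD captures variance about the mean.
    centred = data - data.mean(axis=0)
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return u, s, vt, explained


# Random numbers stand in for imaging data purely for illustration.
rng = np.random.default_rng(0)
data = rng.standard_normal((50, 200))
u, s, vt, explained = pca_eigenimages(data)
```

The rows of `vt` are mutually orthogonal and ordered by decreasing explained variance, which mirrors the description of each component capturing the largest remaining variance subject to orthogonality.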

Due to its limitations, PCA has largely lost ground to Independent Components Analysis (ICA) [49], a similar technique, except that components are separated by statistical independence rather than orthogonality. It was first introduced by McKeown et al [50] and used to identify task based activity, under the assumption that it should be statistically independent from sources of noise. There are two versions of ICA that vary in the dimension in which independence is maximised. Spatial ICA (sICA) decomposes the data into spatially independent components, whereas temporal ICA (tICA) decomposes the data into temporally independent components. Which approach is used is a consideration for the researcher, as the different versions seem to be suited to different paradigms; however, given that the spatial domain is much larger than the temporal one in fMRI, sICA is the method that dominates in the literature [51]. ICA has proved to be a popular technique, particularly with resting state data [49], after first being applied by Kiviniemi et al [52]. There are drawbacks to ICA that are generally the same as those for PCA, given their similar nature. Firstly, ICA is based on the assumption that the different signal sources being extracted are statistically independent, and if this is not the case it becomes very ineffective. Secondly, as with PCA, the problems of deciding which components are relevant, what they represent, and the appropriate threshold for maps are all open questions [9].
Psychophysiological interactions
Often one would like to know how connectivity is modulated by task, and the psychophysiological interactions (PPI) method proposed by Friston et al [40] provides an elegant way to do this. PPIs are identified using linear regression models, in which the activity in one region is regressed onto the activity in another.
If one repeats this regression analysis under a different experimental condition, then the change in the regression slope between the two conditions represents a PPI. Thus put simply, a PPI is a significant interaction between activity in a seed region and an experiment-related

signal change within a regression analysis. If the interaction regressor is significant, that implies that a stimulus related change in activity within a region is mediated by the activity within the seed region [5]. Whole brain maps of the interaction term can then be created, and voxels that exhibit a stimulus dependent response to the seed region can be identified. One can then simply perform t-tests on the regression coefficients to identify group differences in effective connectivity [31].
Structural Equation Modelling
Structural Equation Modelling (SEM), including path analysis, which is a special case of SEM in which there are no latent variables, was the first method used to infer effective connectivity in neuroimaging data [8, 53]. SEM takes the form of a general linear model (GLM), i.e. a generalisation of a linear regression model with multiple dependent variables, represented as interacting brain regions. Thus it can be thought of as an extension of a PPI, which contains only one dependent variable. The model consists of a set of regions and a set of connections between the regions, which, unlike those obtained via functional connectivity methods, are directed, representing causal influences between regions, and are assumed a priori. A connectivity matrix is specified that represents a set of correlations between regions, and then parameters are estimated by minimising the difference between the predicted covariance between variables and the actual covariance in the data. Typically one may divide data according to two different experimental factors, and then any difference in connectivity between these groups can be attributed to the effect of that experimental factor [54]. Given a set of parameters, one can use an optimisation procedure to find the connectivity matrix that maximises the likelihood [55].
The main disadvantage of using SEM is that, given that it relies on correlation matrices, complex matrices of connectivity that would render the system underdetermined, i.e. with more unknown paths than known correlations,

cannot be solved [8]. Such a limitation rules out the kind of complex networks that are the most biologically plausible, such as those with multiple feedback loops.
Multivariate autoregressive modelling
Another issue with SEM is that it does not consider temporal information in the fMRI signal, and so one could randomly permute the time series and the results would not change. Thus multivariate autoregressive (MAR) models for fMRI have been proposed by Harrison et al [56]. The use of autocorrelation models stems from the important discovery by Bullmore et al [57] that fMRI data analysed by GLM produced residuals that are autocorrelated. In an autoregressive approach, the present value of a time series can be modelled as a weighted sum of past values, with the number of past values incorporated dictated by the order p. MAR models extend this principle to multiple time series, so that a vector of present values for every regional time series is modelled as a linear sum of past vectors. Goebel et al proposed a method of using MAR models in fMRI to infer directed influences in the context of Granger Causality [58]. Granger Causality was originally developed for analysis of economics data, and it infers causality based on the principle of temporal precedence, i.e. cause precedes effect. Given two time series x and y, if the present value of y can be better predicted by knowing past values of x, then x is said to Granger cause y, and if the reverse holds true then y can be said to Granger cause x [59]. By calculating the Granger causality between all voxel time courses and that of a seed voxel, whole brain effective connectivity can be deduced in a technique known as Granger Causal Mapping (GCM) [59]. GCM requires no model of interacting brain regions as is typical of effective connectivity methods, and thus represents the first attempt at an exploratory effective connectivity method in fMRI.
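The idea of temporal precedence can be illustrated with a small NumPy sketch. It is not the method of the cited studies: the helper names (`lagged`, `residual_variance`, `granger_index`), the ordinary least squares fit, and the log variance ratio used as the index are assumptions chosen for illustration, and the toy data ignore the haemodynamic response entirely.

```python
import numpy as np


def lagged(x, p):
    """Return a matrix whose columns are x[t-1], ..., x[t-p] for t = p..T-1."""
    T = len(x)
    return np.column_stack([x[p - k:T - k] for k in range(1, p + 1)])


def residual_variance(target, predictors):
    """Variance of the residuals of a least squares fit with an intercept."""
    X = np.column_stack([np.ones(len(target)), predictors])
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    return np.var(target - X @ beta)


def granger_index(x, y, p=1):
    """Log ratio of residual variances when predicting y from its own past,
    without and with the past of x. Larger values suggest x Granger causes y."""
    target = y[p:]
    own_past = lagged(y, p)
    both_past = np.column_stack([own_past, lagged(x, p)])
    return np.log(residual_variance(target, own_past)
                  / residual_variance(target, both_past))


# Toy data (an assumption for illustration): y follows x with a one-sample delay.
rng = np.random.default_rng(0)
T = 500
x = rng.standard_normal(T)
y = np.zeros(T)
y[1:] = 0.8 * x[:-1] + 0.1 * rng.standard_normal(T - 1)

gi_xy = granger_index(x, y)  # influence of x on y
gi_yx = granger_index(y, x)  # influence of y on x
```

In this toy case the index from x to y is large while the index from y to x is close to zero, matching the definition given above. A mapping approach would repeat the calculation between a seed time course and every other voxel.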

The temporal features of the fMRI signal, however, render the application of Granger causality to fMRI highly problematic. Most applications of Granger causality do not account for the haemodynamic response, and as the use of linear autoregressive models in fMRI is based on the assumption that the haemodynamic response is uniform across the brain, which is known to be false [60], spurious Granger causations can occur. It has been demonstrated, via the use of simultaneous EEG and fMRI, that Granger causality in fMRI cannot adequately infer causality as the requirement of temporal precedence is violated [61]. An additional issue is that fMRI data are sampled at a much slower rate than that at which the causal interactions occur, i.e. in the order of seconds as compared to milliseconds for neural activity. This is a further violation of the assumptions on which Granger causality is based [31].
3.3.6 Dynamic Causal Modelling
As Dynamic Causal Modelling (DCM) is the effective connectivity method used in this thesis, it will be detailed in isolation in the next chapter.

4 Dynamic Causal Modelling
4.1 Introduction
Dynamic Causal Modelling (DCM) was first introduced for fMRI data by Friston et al in 2003 [6], and integrated into the open-source Statistical Parametric Mapping (SPM) software. DCM was introduced as a way of inferring effective connectivity, and is fundamentally different from previously employed methods in that it was invented specifically for analysing functional brain imaging data. DCM uses an input-state-output model, a concept defined well before the advent of neuroimaging, but which in this case has been adapted for this specialised purpose. DCM is used for EEG and MEG as well as fMRI, but this thesis focuses on DCM for fMRI. The purpose of DCM is to make inferences about the coupling between distinct brain regions, and to examine how this coupling is dependent upon the experimental context. This means it requires a biologically plausible model of measured brain responses, which is both dynamic and non-linear in nature. DCM is used to infer hidden neuronal states from measured brain activity, in this case from the BOLD signal, within a Bayesian framework. Numerous types of DCM have been developed, but all of them are based on the following characteristics [62, 63]:
1. The idea of DCM is to construct a realistic model of interacting cortical regions, with a system of differential equations.
2. This neural model is then supplemented with a forward model of how the synaptic activity within these cortical regions translates to the measured response (BOLD in the case of fMRI).
3. Inversion of the model, based on Bayesian statistics, allows the parameters of the neuronal model of interacting cortical regions to be estimated from the data to give a measure of the effective connectivity.

Traditional DCM treats the brain as a deterministic dynamical system of interacting brain regions which can have several inputs, and treats an experiment as a designed perturbation of the system's dynamics. The inputs to the system are the usual stimulus functions that reflect the experimental design, which are used in basic general linear model (GLM) methods. In this original format, bilinear differential equations are used to model the system, with the bilinear term representing context dependent modulation of effective connectivity. Since then DCM has been extended to allow for neurophysiological phenomena that are considered important. Three major extensions to DCM are listed below:
1. Non-linear DCM [64] attempts to model how connectivity between two regions may be dependent on connectivity in another region, a process that is caused by synaptic interactions and that has been established through invasive electrophysiological experiments.
2. Two-state model DCM [65] allows regions to have more than one state, i.e. modelling within region connectivity between excitatory and inhibitory neuronal populations.
3. Stochastic DCM (sDCM) [66-68] allows stochastic inputs or error terms, and thus can be applied to data in the absence of experimental manipulation, such as resting-state data.
The different types of DCM will be discussed further in the theoretical section of this chapter, but the type of DCM used in this thesis is the original bilinear variety, and so all references to DCM should be assumed to refer to this unless otherwise stated. Bilinear DCM requires direct inputs as it treats the brain as a dynamical system of coupled neuronal regions, in which the experiment is a designed perturbation of this system. In this respect it is different from established methods of connectivity such as SEM and other multivariate autoregressive processes, in which there is no designed perturbation and the inputs are treated as stochastic and unknown.
The requirement of an input, and the need to specify the brain regions that the system is composed of, mean that DCM is traditionally used to test a specific hypothesis that motivated a 48

50 particular experimental design, and therefore is not used as an exploratory technique as are other analyses of effective connectivity. The experimental inputs to a DCM and how they enter the model are an important aspect of the technique and form the basis of its ability to infer direct causal interactions between regions.i.e. effective connectivity. Since its inception DCM has been widely adopted by the fmri neuroimaging community and has been used to probe a variety of cognitive and neurophysiological questions [63]. 4.2 Neuronal state equations Neuronal state equations are the basis of all variants of DCM, and are known as generative models, in that they provide a model of how interacting neuronal regions from which the observed data were generated [69] Bilinear model The original variant of DCM is based on a bilinear model of neural activity and it is the one used exclusively in this thesis. Given any number of brain regions with neuronal states z = [z 1,...,z N ], one can posit any arbitrary model of the effective connectivity between these regions: z f z, u, Equation 4.1 A simple two-dimensional Taylor expansion around the system s resting state (z 0 =0,u 0 =0) provides an approximation to the function that is the bilinear state equation: 49

$$
\begin{aligned}
\dot{z} &\approx f(0,0) + \frac{\partial f}{\partial z} z + \frac{\partial f}{\partial u} u + \frac{\partial^2 f}{\partial z \, \partial u} z u \\
\dot{z} &= \Big( A + \sum_i u_i B^i \Big) z + C u \\
A &= \frac{\partial f}{\partial z}, \qquad B^i = \frac{\partial^2 f}{\partial z \, \partial u_i}, \qquad C = \frac{\partial f}{\partial u}
\end{aligned}
\tag{4.2}
$$

This gives a system of differential equations that describes how activity in any region can be driven by activity in any other region (matrix A), directly by external inputs u (matrix C), and by activity in other regions in a way that is context dependent on the ith input (matrix B). A response is defined as the change in activity over time, and so the units of connections are per unit time; a strong connection is thus one that exerts its effect over a short time. Figure 4.1 provides a useful visualisation of an arbitrary model of interacting brain regions z, with inputs u.

Figure 4.1: A schematic showing a system of connected brain regions, with inputs that influence the system directly and one input that modulates connections between regions.

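To understand the model in terms of the bilinear state equation, it can be written as a system of ordinary differential equations:

$$
\begin{aligned}
\dot{z}_1 &= a_{11} z_1 + c_{11} u_1 \\
\dot{z}_2 &= a_{21} z_1 + a_{22} z_2 + a_{23} z_3 + a_{24} z_4 \\
\dot{z}_3 &= a_{33} z_3 + (a_{32} + b_{32} u_2) z_2 \\
\dot{z}_4 &= a_{44} z_4 + (a_{42} + b_{42} u_2) z_2
\end{aligned}
\tag{4.3}
$$

These state equations are usually presented in the more succinct matrix form:

$$
\begin{bmatrix} \dot{z}_1 \\ \dot{z}_2 \\ \dot{z}_3 \\ \dot{z}_4 \end{bmatrix}
=
\left(
\begin{bmatrix}
a_{11} & 0 & 0 & 0 \\
a_{21} & a_{22} & a_{23} & a_{24} \\
0 & a_{32} & a_{33} & 0 \\
0 & a_{42} & 0 & a_{44}
\end{bmatrix}
+ u_2
\begin{bmatrix}
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & b_{32} & 0 & 0 \\
0 & b_{42} & 0 & 0
\end{bmatrix}
\right)
\begin{bmatrix} z_1 \\ z_2 \\ z_3 \\ z_4 \end{bmatrix}
+
\begin{bmatrix}
c_{11} & 0 \\
0 & 0 \\
0 & 0 \\
0 & 0
\end{bmatrix}
\begin{bmatrix} u_1 \\ u_2 \end{bmatrix}
\tag{4.4}
$$

This form has the advantage of allowing one to see how the parameters are arranged into matrices. Matrix A represents the intrinsic connectivity, which contains forward, backward and self-connections between regions. Matrix B embodies changes in connectivity that are context dependent with regard to the experimental design. Matrix C represents the connectivity induced by the experimental stimulus.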
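As an illustration of how the A, B and C matrices combine, the following is a minimal Python sketch that integrates the bilinear state equation with a forward-Euler scheme. The two-region network, the parameter values, the step size and the function names are all hypothetical choices made for illustration, and the Euler scheme is chosen only for brevity; it is not intended to reproduce the integration used in SPM.

```python
def bilinear_derivative(z, u, A, B, C):
    """Return dz/dt = (A + sum_i u_i * B[i]) z + C u, using plain nested lists."""
    n = len(z)
    dz = []
    for r in range(n):
        total = 0.0
        for c in range(n):
            # Effective coupling from region c to region r:
            # fixed part plus input-dependent modulation.
            coupling = A[r][c] + sum(u[i] * B[i][r][c] for i in range(len(u)))
            total += coupling * z[c]
        # Direct (driving) influence of the inputs on region r.
        total += sum(C[r][i] * u[i] for i in range(len(u)))
        dz.append(total)
    return dz


def integrate(A, B, C, inputs, dt=0.01):
    """Forward-Euler integration from rest; `inputs` holds one input vector per step."""
    z = [0.0] * len(A)
    trajectory = []
    for u in inputs:
        dz = bilinear_derivative(z, u, A, B, C)
        z = [zi + dt * dzi for zi, dzi in zip(z, dz)]
        trajectory.append(z)
    return trajectory


# Hypothetical two-region example: u[0] drives region 1, u[1] modulates 1 -> 2.
A = [[-1.0, 0.0],
     [0.4, -1.0]]
B = [[[0.0, 0.0], [0.0, 0.0]],   # u[0] modulates nothing
     [[0.0, 0.0], [0.6, 0.0]]]   # u[1] strengthens the 1 -> 2 connection
C = [[1.0, 0.0],
     [0.0, 0.0]]

steps = 1000
without_context = integrate(A, B, C, [[1.0, 0.0]] * steps)
with_context = integrate(A, B, C, [[1.0, 1.0]] * steps)
```

With the modulatory input switched on, the effective strength of the connection from region 1 to region 2 becomes A[1][0] + B[1][1][0], so the response in region 2 settles at a higher level than it does without the modulation.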


4.2.2 Non-linear model

The non-linear DCM was introduced by Stephan et al [64] to allow for more complex interactions between brain regions and the connections between them. Examples of such interactions include any of the mechanisms that fall under the term short-term synaptic plasticity, and that rely on the history of prior activity [70, 71]. One such process is neuronal gain, where the influence one region exerts on another is determined by the input to that region from a third region [64, 72]. The modulation of the connectivity between the two regions is then said to be gated by the third region, and this process has been shown to exist through a wide range of experiments [73].

Figure 4.2: A schematic showing connectivity within a system of connected brain regions that is modulated in a non-linear way.

To model these non-linear interactions it is simply a case of extending the Taylor expansion to include the second-order term in the states z.

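$$
\begin{aligned}
\dot{z} &\approx f(0,0) + \frac{\partial f}{\partial z} z + \frac{\partial f}{\partial u} u + \frac{\partial^2 f}{\partial z \, \partial u} z u + \frac{\partial^2 f}{\partial z^2} \frac{z^2}{2} \\
\dot{z} &= \Big( A + \sum_i u_i B^i + \sum_n z_n D^n \Big) z + C u \\
A &= \frac{\partial f}{\partial z}, \qquad B^i = \frac{\partial^2 f}{\partial z \, \partial u_i}, \qquad C = \frac{\partial f}{\partial u}, \qquad D^n = \frac{1}{2} \frac{\partial^2 f}{\partial z_n^2}
\end{aligned}
\tag{4.5}
$$

This is the same as the bilinear model, but with an additional D matrix, for which non-zero values indicate how the connectivity between two regions depends on activity in the nth region.

4.2.3 Two-state model

Two-state DCM is an extension of the original bilinear model of DCM, in which each region can have two states, representing the excitatory and inhibitory neuronal populations within it [65].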


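Figure 4.3: A schematic showing how each node within a system may be represented by two states, which are excitatory and inhibitory respectively.

This allows for richer system dynamics, conforms to a more plausible model of cortical organisation, and allows inferences to be made on connectivity within regions as well as between them. The expanded form can be written in the same format as the bilinear model, as given in equation 4.6.

$$
\begin{aligned}
\dot{z} &= J z + C u, \qquad J = A + \sum_i u_i B^i \\
J &=
\begin{bmatrix}
J_{11}^{EE} & J_{11}^{EI} & \cdots & J_{1N}^{EE} & 0 \\
J_{11}^{IE} & J_{11}^{II} & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
J_{N1}^{EE} & 0 & \cdots & J_{NN}^{EE} & J_{NN}^{EI} \\
0 & 0 & \cdots & J_{NN}^{IE} & J_{NN}^{II}
\end{bmatrix},
\qquad
z =
\begin{bmatrix} z_1^{E} \\ z_1^{I} \\ \vdots \\ z_N^{E} \\ z_N^{I} \end{bmatrix}
\end{aligned}
\tag{4.6}
$$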

The Jacobian matrix J represents the effective connectivity, and within each region $z_n$ there are four entries, $J_{nn}^{EE}$, $J_{nn}^{II}$, $J_{nn}^{EI}$ and $J_{nn}^{IE}$, that correspond to all combinations of within-region connectivity between excitatory and inhibitory populations. The priors enforce positivity or negativity constraints on the system to ensure stability. Specifically, all within-region connections are negative, except the connection from the excitatory to the inhibitory population, which is positive; between-region connections can only occur between excitatory populations and are positive.

4.2.4 Stochastic model

Stochastic DCM is a fundamental departure from the original deterministic DCM in that it models random endogenous fluctuations in the neuronal states [66, 67]. Stochastic DCMs are implemented by extending the bilinear model to include random fluctuation terms for both states and inputs:

$$
\dot{z} = \Big( A + \sum_i u_i B^i \Big) z + C v + \omega^{(z)}, \qquad v = u + \omega^{(v)} \tag{4.7}
$$

The state equations use the same variables as bilinear DCM, but with the addition of a noise term representing state noise, $\omega^{(z)}$, and a noise term representing random fluctuations in the experimental input, $\omega^{(v)}$. Stochastic DCMs require novel inversion techniques to account for their added complexity [63], which have been compared by Li et al [67]. Daunizeau et al [74] compared the relative merits of stochastic and deterministic DCM for identifying network structure. They showed that data obtained from an epileptic subject during an absence seizure were best modelled as a transient change in network connectivity that could only be achieved by including noise in the neuronal model.

4.3 Haemodynamic model

The models of neuronal states form the basis of DCM, in which the aim is to estimate the parameters given in the matrices in order to make inferences about the effective connectivity. However, in order to make inferences about hidden neuronal states one needs a phenomenologically accurate forward model that can translate synaptic activity into the BOLD signal that is measured in fMRI. This is done in DCM using a haemodynamic model that is an extended version [75] of the balloon model [76]. Neural activity in each region is the cause of the BOLD response measured at that region in fMRI. The BOLD response is specific to fMRI, and so in other modalities the forward model is a different one reflecting how the data are generated; for fMRI it is a model of the haemodynamic response, which is shown in figure 4.4.


Figure 4.4: A schematic showing the organisation of the haemodynamic model used in DCM.

Regional changes in synaptic activity are known to cause changes in local blood volume and dHb concentration. This means that for each region, as well as the primary state variable z that corresponds to the regional neural activity, there are four secondary state variables that correspond to the biophysical states of the haemodynamic forward model, which was first presented by Friston et al in 2000 [75]. These four state variables of the haemodynamic model, s, f, v and q, correspond respectively to a vasodilatory signal that is a function of the neuronal activity, the change in local blood flow, the change in local blood volume, and the change in dHb concentration. As is shown in figure 4.4, their relationship to one another can be considered as the activity-dependent vasodilatory signal causing an increase in blood flow, which causes the blood volume to increase and the dHb concentration to decrease as it is diluted. Associated with this model is a set of parameters, of which there is a subset of biophysical parameters, κ, γ, τ, α and ρ, which correspond respectively to the rate of signal decay, the rate of flow-dependent elimination, the haemodynamic transit time, Grubb's exponent, and the resting oxygen extraction fraction. The model has since been modified to account for fMRI acquisition parameters and updated biophysical constants [77], and for slice timing [78].

4.4 Parameter estimation

The neuronal state equations and the haemodynamic model combined provide an explanation of how the data were generated, and are therefore referred to as a generative model.

$$
x = \{ z, s, f, v, q \}, \qquad \dot{x} = f(x, u, \theta), \qquad \theta = \{ \theta^c, \theta^h \} \tag{4.8}
$$

For given inputs u, neuronal state parameters $\theta^c$ and haemodynamic parameters $\theta^h$, a predicted response h(u) can be obtained by integrating equation 4.8. The observed data y can then be modelled as the sum of the predicted response h(u), confounding effects X (with parameters β), and an error ε.

$$
y = h(u) + X\beta + \varepsilon \tag{4.9}
$$
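The haemodynamic state equations themselves are not written out in this chapter. The sketch below shows, for a single region, how a predicted BOLD response might be obtained by integrating the four haemodynamic states. The differential equations and the output equation are the commonly quoted forms of the extended balloon model from the DCM literature, and the parameter values, resting blood volume fraction `V0`, step size and function names are assumptions for illustration rather than values taken from this thesis.

```python
# Assumed values for kappa, gamma, tau, alpha and rho (often quoted as prior means).
KAPPA, GAMMA, TAU, ALPHA, RHO = 0.65, 0.41, 0.98, 0.32, 0.34
V0 = 0.02  # assumed resting blood volume fraction


def haemodynamic_step(state, z, dt):
    """Advance (s, f, v, q) by one forward-Euler step given neural activity z."""
    s, f, v, q = state
    ds = z - KAPPA * s - GAMMA * (f - 1.0)       # vasodilatory signal
    df = s                                        # blood flow
    outflow = v ** (1.0 / ALPHA)                  # volume-dependent outflow
    extraction = 1.0 - (1.0 - RHO) ** (1.0 / f)   # oxygen extraction fraction
    dv = (f - outflow) / TAU                      # blood volume
    dq = (f * extraction / RHO - outflow * q / v) / TAU  # dHb content
    return (s + dt * ds, f + dt * df, v + dt * dv, q + dt * dq)


def bold(v, q):
    """Static output equation mapping volume and dHb content to BOLD signal."""
    k1, k2, k3 = 7.0 * RHO, 2.0, 2.0 * RHO - 0.2
    return V0 * (k1 * (1.0 - q) + k2 * (1.0 - q / v) + k3 * (1.0 - v))


def predict_bold(neural, dt=0.01):
    """Predicted BOLD time course for a list of neural activity samples."""
    state = (0.0, 1.0, 1.0, 1.0)  # resting state: no signal, unit flow/volume/dHb
    response = []
    for z in neural:
        state = haemodynamic_step(state, z, dt)
        response.append(bold(state[2], state[3]))
    return response


# Hypothetical input: one second of neural activity followed by 29 s of rest.
dt = 0.01
neural = [1.0] * 100 + [0.0] * 2900
y = predict_bold(neural, dt)
```

In a full DCM the neural activity passed to `predict_bold` would itself come from integrating the neuronal state equation, so that the complete state vector of equation 4.8 is integrated jointly.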

These high dimensional equations cannot be solved analytically, and it would be computationally very costly to use a brute force numerical method [69]. Therefore, when DCM was introduced, a variational Bayesian (VB) technique was introduced alongside it [6]. Using this Bayesian inversion scheme, the parameters of the complete model are estimated (i.e. the model is inverted), given the data and the prior distributions on the parameters. Using Bayes' theorem, the posterior probability of the parameters is expressed mathematically in equation 4.10.

$$
\begin{aligned}
p(\theta \mid y, m) &= \frac{p(y \mid \theta, m) \, p(\theta \mid m)}{p(y \mid m)} \\
\log p(\theta \mid y, m) &= \log p(y \mid \theta, m) + \log p(\theta \mid m) - \log p(y \mid m)
\end{aligned}
\tag{4.10}
$$

The posterior distribution of the parameters is then approximated using the iterative EM optimisation algorithm, details of which are given in Friston et al [6].

4.5 Model priors

Due to the complexity of DCMs, model inversion needs to be more dependent on constraints, which is why DCMs are inverted within a Bayesian scheme. Each parameter is constrained by a prior distribution which is based on empirical knowledge, and the estimation procedure produces a posterior distribution. Placing DCM within a Bayesian framework is a necessity due to its complexity, but it also has many advantages compared to inference based on classical statistics. Using classical statistics such as p-values, one is estimating the probability of observing the data given no effect, which is a problem because one can never say for certain that an effect is absent. Bayesian inference, however, produces posterior distributions that are the probability of the effect given the data observed [79].

There are self-evident properties of neuronal dynamics that can be used as priors on the parameters of the neuronal state model. Neural activity cannot increase to infinitely high values, and in the absence of an external input the dynamics are likely to return to a stable mode. These concepts are used to constrain DCMs through shrinkage priors on the coupling parameters that place a small probability on self-excitation and high values of regional activity. The priors used for the five biophysical parameters of the haemodynamic model are based on empirical values that have been obtained [80]. Priors for the remaining haemodynamic model parameters, which cannot be biophysically informed, are identified as those which minimise the sum of squared differences between the Volterra kernels they imply and the Volterra kernels derived directly from data [75].

One can also add additional constraints to optimise a particular DCM if one has information about the anatomical structure. Previous studies have used structural connectivity information obtained via invasive tract tracing in macaque monkeys to inform the structure of models for effective connectivity studies [62]. Although these data are of high resolution, they are not necessarily relevant for human studies due to inter-species differences in connectivity. This problem can be overcome by using structural information obtained via DTI. Although these data are less detailed and do not contain directional information, they have still been successfully integrated into DCM as priors by Stephan et al [81]. In that study, probabilistic tractography based on data collected via DWI was used to calculate the probability of anatomical connections existing between visual areas of the brain. Models with anatomically informed priors were then compared to those without using BMS, and proved to be superior in terms of model evidence.

4.6 Inference

There are two types of inference in DCM: inference about parameter space and inference about model space [62]. If, for example, one is interested in the specific effect of a connection, such as whether it exerts an excitatory or inhibitory effect, this requires inference about the parameters of a model. Alternatively, one may wish to make inferences about model structure, for example to determine the presence of feedback connections. Early DCM studies tended to focus on inference about parameter space [63]; however, following a proliferation of methodology papers devoted to model selection [82-86], inference about model structure has become more common. One area in which inference about parameter space is still dominant is group studies comparing patients and controls [87], where BMS is used as an intermediary step in defining model structure for comparison of parameters between groups using classical statistical methods.

4.6.1 Bayesian Model Selection (BMS)

The problem of model selection is a generic one encountered in any modelling approach, and is concerned with the question of which of a set of competing models is most likely given the data. The problem is complicated by the fact that model fit alone is not enough to infer which model is best. Model complexity also needs to be considered to ensure that the model is not overfitting the data (i.e. that it is generalisable) [88]. When model comparison for DCM was first introduced, it was proposed that DCMs should be compared using a combination of Akaike's Information Criterion (AIC) and the Bayesian Information Criterion (BIC) [85]. Log differences in model evidence (the log Bayes factor) were used to compare competing models, and a value greater than 3 was suggested as the threshold for accepting one model over another. Since then BMS has become the preferred method of comparing models, using an approximation of the free energy as model evidence [89]. The model evidence is given in equation 4.11.

$$
p(y \mid m) = \int p(y \mid \theta, m) \, p(\theta \mid m) \, d\theta \tag{4.11}
$$

This integral cannot be solved analytically but can be approximated, as detailed in Penny et al [85]. The approximation is given in equation 4.12 as the log model evidence, and consists of an accuracy term and a complexity term.

$$
\log p(y \mid m) = \text{accuracy}(m) - \text{complexity}(m) \tag{4.12}
$$

Two models, $m_1$ and $m_2$, can therefore be compared using the Bayes factor [90] given in equation 4.13.

$$
B_{12} = \frac{p(y \mid m_1)}{p(y \mid m_2)} \tag{4.13}
$$

The log of the Bayes factor is simply the difference between the log model evidences for model 1 and model 2. This means that the most likely model is the one with the greatest log evidence. The AIC and BIC provide simple approximations to the log evidence, and were used in early DCM studies; however, a free energy approximation is now preferred. As equation 4.12 shows, the complexity term penalises a model based on its complexity. In AIC and BIC the complexity term is simply a function of the number of model parameters. In the free energy approach the log model evidence is decomposed as in equation 4.14, where F is known as the free energy, and the last term is the Kullback-Leibler (KL) divergence between the approximate posterior density q(θ) and the true posterior density p(θ | y, m) [69].

$$
\log p(y \mid m) = F(m) + KL\big[ q(\theta) \, \| \, p(\theta \mid y, m) \big] \tag{4.14}
$$

By Gibbs' inequality, the KL divergence is never negative, meaning that the free energy provides a lower bound on the log model evidence. When the KL divergence is equal to zero, the true and approximate posterior densities are the same and the free energy is equal to the log model evidence. Thus the EM optimisation scheme serves to maximise the free energy, implicitly decreasing the KL divergence and making the approximate posterior distribution as close to the true one as possible, while simultaneously providing an approximation of model evidence. Unlike AIC and BIC, in which each parameter in the model is penalised equally by the complexity term [83], the free energy approach has a complexity term that is the KL divergence between the prior and the approximate posterior. Thus, parameters are not penalised equally: the more a parameter deviates from its prior, the greater the penalty. This extra sensitivity has been empirically shown to make the free energy a better approximation to model evidence [84].

For multiple subject analyses two options exist, depending on how one considers models to be distributed in the population [62]. In the fixed-effects (FFX) approach one assumes that model structure is the same for each subject in the population, and in the random-effects (RFX) approach one allows for the possibility that different subjects have different models.
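To make the behaviour of the free energy complexity term concrete, the following Python sketch computes the KL divergence between a Gaussian approximate posterior and a Gaussian prior for a single parameter. It assumes univariate, independent Gaussian densities, which is a simplification made purely for illustration; the function names and numerical values are hypothetical.

```python
import math


def gaussian_kl(post_mean, post_var, prior_mean, prior_var):
    """KL[q || p] for univariate Gaussians q = N(post_mean, post_var)
    and p = N(prior_mean, prior_var)."""
    return 0.5 * (math.log(prior_var / post_var)
                  + (post_var + (post_mean - prior_mean) ** 2) / prior_var
                  - 1.0)


def complexity(posteriors, priors):
    """Sum of per-parameter KL terms, assuming independent Gaussian parameters.
    Each entry is a (mean, variance) pair."""
    return sum(gaussian_kl(qm, qv, pm, pv)
               for (qm, qv), (pm, pv) in zip(posteriors, priors))


# Hypothetical parameter with a zero-mean, unit-variance shrinkage prior.
prior = (0.0, 1.0)
unchanged = gaussian_kl(0.0, 1.0, *prior)   # posterior identical to prior
small_shift = gaussian_kl(0.5, 1.0, *prior)
large_shift = gaussian_kl(2.0, 1.0, *prior)
```

A parameter whose posterior equals its prior contributes nothing to the penalty, whereas the contribution grows with the distance of the posterior mean from the prior mean. This is in contrast to a penalty that simply counts parameters.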

Fixed Effects Analysis

In the FFX approach, since every subject is assumed to have the same model, the log model evidence given a dataset Y composed of independent data for individual subjects $y_n$ is simply the sum of the log model evidences for each subject, as given by equation 4.15.

$$
\begin{aligned}
p(Y \mid m) &= \prod_{n=1}^{N} p(y_n \mid m) \\
\log p(Y \mid m) &= \sum_{n=1}^{N} \log p(y_n \mid m)
\end{aligned}
\tag{4.15}
$$

Random Effects Analysis

The RFX approach allows for different models to have generated the observed data in different subjects. Assuming that the data are generated by models drawn from a probability distribution, this is achieved using a Bayesian hierarchical approach that can be inverted to obtain an estimate of that distribution [86]. A prior distribution over model probabilities is given by the Dirichlet distribution in equation 4.16, where $r_m$ is the probability of model m from a set of M total models, $\alpha_m$ is the number of times model m is selected in the population, and so can be viewed as the number of subjects for whom that model generated the data [86], and Z(α) is a normalisation term.

$$
p(r \mid \alpha) = \mathrm{Dir}(r; \alpha) = \frac{1}{Z(\alpha)} \prod_{m=1}^{M} r_m^{\alpha_m - 1} \tag{4.16}
$$
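The fixed-effects pooling of evidence across subjects can be sketched in a few lines of Python. The free energy values below are invented purely for illustration, the function names are hypothetical, and a flat prior over models is assumed when converting group log evidences into posterior model probabilities.

```python
import math


def group_log_evidence(subject_log_evidences):
    """subject_log_evidences[n][m] is the log evidence of model m for subject n.
    Returns the summed (fixed-effects) log evidence for each model."""
    n_models = len(subject_log_evidences[0])
    return [sum(subject[m] for subject in subject_log_evidences)
            for m in range(n_models)]


def posterior_model_probabilities(log_evidences):
    """Normalised exponential of the log evidences (flat prior over models)."""
    peak = max(log_evidences)  # subtract the maximum for numerical stability
    weights = [math.exp(le - peak) for le in log_evidences]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical free energies for three subjects (rows) and two models (columns).
F = [[-310.2, -312.9],
     [-295.4, -296.0],
     [-402.7, -401.1]]

group = group_log_evidence(F)
log_bayes_factor = group[0] - group[1]
probabilities = posterior_model_probabilities(group)
```

Because the log evidences are simply summed, a single subject with a very large evidence difference can dominate the group result, which is one motivation for considering the random-effects approach.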

The inversion of the model produces an approximation to the posterior distribution $p(r \mid Y)$. This was previously achieved using a VB approach [86], but since then a Gibbs sampling method has been suggested [84], and is the preferred method when comparing large numbers of models, i.e. more models than subjects.

Model families

Family level inference for DCM is an innovation introduced by Penny et al [84] as a way of removing uncertainty in model structure. They showed that comparing large numbers of models in the traditional manner can be problematic, and that grouping models into families according to some characteristic, e.g. input location, is a more robust approach. A family partition is defined, and models are classified as belonging to one of the subsets, which must be non-overlapping. The partitioning of the model set into families reflects the question being asked by the researcher.

4.6.2 Model space

Clearly defining a plausible model space should be a fundamental component of any DCM study [62]. The problem is a general one, in that for any experimental data there are an infinite number of models that could explain it, which vary in both structure and parameter values. For this reason one always has to place limitations on model space to restrict it to a set of plausible alternatives. This is already an inherent part of the DCM framework, which is based on Bayesian statistics. As already noted, prior distributions on parameters aim to constrain the parameters, which describe neural activity and the haemodynamic response, to values which are biophysically realistic. However, given any number of regions, even with constraints on parameters, there are still a vast number of model structures that could explain the data.

The problem of defining a plausible model space is not a trivial one, and the main issue that has been highlighted [91, 92] is that of so-called combinatorial explosion. Given a number of brain regions n, the number of possible models m in bilinear DCM is given by equation 4.17, where j is the number of experimental manipulations and k is the number of connections between the n nodes, which is equal to n(n-1).

$$
m = \sum_{i=0}^{k} 2^{nj} \, 2^{ij} \binom{k}{i}, \qquad \binom{k}{i} = \frac{k!}{i! \, (k-i)!} \tag{4.17}
$$

As can be seen from figure 4.5, as the number of regions and experimental conditions increases, the number of possible models rises very rapidly. The problem is exacerbated in non-linear DCM by the addition of another term that adds even more degrees of freedom.
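The scale of the combinatorial explosion can be checked with a short Python sketch. The counting rule used here is an assumption about what is being enumerated: each of the k = n(n-1) between-region connections may be present or absent, each present connection may be modulated by any subset of the j inputs, and each input may drive any subset of the n regions. The function name is hypothetical.

```python
from math import comb


def count_bilinear_models(n_regions, n_inputs):
    """Number of distinct bilinear model structures under the stated assumptions."""
    k = n_regions * (n_regions - 1)          # possible between-region connections
    driving = 2 ** (n_regions * n_inputs)    # which inputs drive which regions
    structures = sum(comb(k, i) * 2 ** (i * n_inputs)  # i connections present,
                     for i in range(k + 1))            # each modulated or not per input
    return driving * structures


# Growth of model space for one or two experimental manipulations.
for n in range(2, 6):
    print(n, count_bilinear_models(n, 1), count_bilinear_models(n, 2))
```

By the binomial theorem the sum collapses to (1 + 2^j)^k, so under these assumptions the count equals 2^(nj) (1 + 2^j)^k, which grows exponentially in the number of possible connections and therefore super-exponentially in the number of regions.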

Figure 4.5: Graph showing the total number of possible models given the number of nodes.

One approach is to impose some limitations, usually based on intrinsic connectivity, and then estimate all possible models within a greatly reduced model space [93, 94]. Others have chosen to adopt a hierarchical approach [95, 96], first defining a model space of varying intrinsic connectivity and then using the winning model to define a new model space of varying modulatory effects. Pyka et al developed a genetic algorithm to search model space, and found that it was computationally more efficient than a brute force search [97]. Many studies comparing healthy controls to patient groups have omitted a model space search altogether, and have instead chosen to use classical statistics to compare parameters of a hypothesised model between groups [98-100]. This is a particularly popular approach for group studies [87], but has recently been discouraged except for cases when one has very strong a priori knowledge concerning model structure [62].

The original purpose of DCM was as a hypothesis driven approach in which a limited number of carefully selected models were compared in order to test a specific hypothesis about how the data were generated. It is still primarily used in this fashion, though there has been a recent trend for comparing ever increasing numbers of models [84, 101], and thus for using DCM as a more exploratory method. The problem with comparing large numbers of models is that it is computationally intensive, due to the need to fully invert each model. To counter this problem, Friston and Penny have recently proposed a solution [82] in which only a single model is fitted to the data. In this approach, known as post-hoc BMS, only the largest of a set of models need be inverted, and the model evidence for all the reduced models within this set is then approximated. In addition to model evidence, the connectivity parameters can also be estimated from the posterior distribution of parameters in the full model [102]. Based on the ability of post-hoc BMS to score large numbers of models, Friston et al [101] outlined a method for network discovery using DCM, along with the additional constraint that connections are bidirectional. This work has been expanded by Seghier et al [103] for a large DCM network containing twenty nodes, using the principal components of the functional connectivity network as constraints on the intrinsic connectivity.

4.6.3 Inference on parameter space

When making inferences on model parameters one is faced with the same decision as with group-level BMS, i.e. FFX or RFX. A number of FFX methods exist, such as Bayesian Parameter Averaging (BPA), in which the posterior parameter distributions for each subject are combined according to Bayes' theorem [104, 105]. A comparison of different FFX methods for Bayesian parameter inference in group studies can be found in Kasess et al [104]. However, an RFX approach, in which subject specific parameter estimates are compared in a second level analysis using classical frequentist tests such as the t-test or ANOVA, is more common [62]. Another approach is Bayesian Model Averaging (BMA) [84], in which parameter estimates are not dependent on a single model but are averaged across multiple models within a set, weighted according to the probability of each model. BMA is useful for scenarios in which there is no clear winning model, or for comparisons between groups in which model structure may not be equal, such as patients and controls [62].
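The weighting idea behind BMA can be sketched as follows. This is a deliberately simplified illustration that averages posterior means only and ignores posterior uncertainty, so it should not be read as the procedure implemented in any particular software package; the function name and the numbers are hypothetical.

```python
def bayesian_model_average(estimates, model_probabilities):
    """Average per-model parameter estimates, weighted by model probability.

    estimates[k][p] is the posterior mean of parameter p under model k.
    A connection that is absent from a model is entered as 0.0 (its prior mean).
    """
    total = sum(model_probabilities)  # normalise in case the weights do not sum to 1
    n_params = len(estimates[0])
    return [sum(prob * est[p] for est, prob in zip(estimates, model_probabilities)) / total
            for p in range(n_params)]


# Two hypothetical models: the second connection is absent from model 1.
estimates = [[0.4, 0.0],
             [0.6, 0.2]]
model_probabilities = [0.75, 0.25]
averaged = bayesian_model_average(estimates, model_probabilities)
```

Because connections that are absent from a highly probable model pull the average towards zero, the averaged estimates reflect uncertainty about model structure as well as about parameter values.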

5 Neuroimaging in Psychiatry

This section comprises a very limited literature review of neuroimaging in the field of psychiatry. Given the enormity of the literature, which spans multiple imaging modalities and psychiatric disorders, a comprehensive assessment is beyond the scope of this thesis. Instead, this chapter will focus on the areas most relevant to this thesis, i.e. fMRI in depression, primarily with regard to emotional face processing, connectivity analyses, and antidepressant treatment studies.

5.1 Introduction

Modern imaging techniques are an important part of a new era of psychiatry that attempts to provide biological foundations for disorders, leading to the possibility of new diagnostic criteria that are based on neurobiological substrates, or even a complete reclassification [106]. In 1976, Johnstone et al first identified structural abnormalities in the brains of patients with schizophrenia by using CT imaging to demonstrate that they had enlarged ventricles [107]. This is still a consistently replicated finding [108], demonstrating the value of neuroimaging for understanding the aetiology of psychiatric illness. Since then it has become generally agreed that psychiatric illness can result from environmental and genetic causes that reveal themselves in measurable changes within the brain [109].

Currently, diagnoses in psychiatry are based on descriptive criteria and largely rely on the subjective assessment of clinicians. Neuroimaging is likely to play an important role in providing more biologically grounded constructs by referring to specific abnormalities in brain structure and function [106]. Thus far, robust aetiological models and reliable biomarkers are non-existent for mood and psychotic psychiatric disorders [109], and so diagnostics continue to be based on a group classification of symptoms [110]. A potential role for neuroimaging in psychiatry, besides increasing understanding of the mechanisms that cause disorders, is the prospect that it can be used to identify biomarkers of illness. The definition of a biomarker proposed by the Biomarkers Definitions Working Group is: "A characteristic that is objectively measured and evaluated as an indicator of normal biological processes, pathogenic processes, or pharmacologic responses to a therapeutic intervention." [111]

Neuroimaging based biomarkers for mental disorders have achieved moderate success for neurodegenerative disorders [109], for example using PET to distinguish between Alzheimer's disease and other forms of dementia [112]. Imaging based biomarkers have also played a significant role in proof of concept for new drugs [113], but there has been little success with diagnostic biomarkers [3]. Despite this current limitation, neuroimaging has revealed structural and functional abnormalities in an array of psychiatric disorders, including depression [114], schizophrenia [115], bipolar disorder [116] and autism [117], and continues to be a promising technique for the development of diagnostic biomarkers. Biomarkers may be important not only for identifying current psychiatric illness, but also for predicting vulnerability to illness, or for guiding initial treatment options. For example, earlier onset of bipolar disorder is associated with a significantly worse long term prognosis [118, 119], and so early intervention is an important public health goal. Many of the symptoms of bipolar disorder are associated with more common disorders, making misdiagnosis a problem, which can lead to inappropriate treatment [120]. The onset of psychosis in schizophrenia is preceded by a period of subclinical symptoms known as the prodromal phase, which may last weeks or years [121].
Prodromal-like symptoms are associated with a high risk of developing a psychotic disorder [122], and early clinical intervention in high risk individuals is thought to improve long term clinical outcomes [123]. As such, neuroimaging biomarkers may be able to identify individuals at risk of developing psychosis, or predict if and when at risk individuals will transition to full clinical symptoms. It should be noted that it is not yet clear how imaging based diagnostic biomarkers will be clinically useful. Ideally, biomarkers should be fundamentally linked to the underlying pathophysiology of the illness. Without biologically grounded knowledge of the aetiology of psychiatric disorders, there is a danger of circularity, whereby biomarker development is based on traditional symptom based classification [124]. Additionally, the complex nature of psychiatric diseases and the heterogeneous nature of their expression mean that, at present, the development of prognostic biomarkers is a potentially more useful goal [125].

5.1.1 Connectivity

Within the field of neuroimaging, measures of connectivity in the functioning brain are increasingly common, with fMRI particularly popular due to its relatively high spatial resolution and non-invasiveness. This general trend is reflected in psychiatric neuroimaging, following a growing appreciation that characterising psychiatric disorders by functional abnormalities in specific areas of the brain in isolation is not sufficient [87, 126]. Neural networks have been shown to be of critical importance in understanding a range of psychiatric disorders, including schizophrenia [99, 127, 128] and depression [95, 96, 98, 129, 130]. New methods for quantifying the properties of large scale networks based on graph theory are becoming increasingly popular [131], and the potential for connectivity based measures to predict disease states is an exciting one [42, 132]. Schizophrenia provides a good example of a psychiatric illness for which advanced connectivity methods show great promise.
It is a disorder that has long been associated with dysfunctional brain networks and recently the disconnection hypothesis posited that the deficits present in schizophrenia are the result of abnormal modulation of synaptic plasticity that leads to 72

74 impaired functional integration [ ]. Modern connectivity techniques, such as DCM, are showing great potential for identifying abnormal connectivity in patients with schizophrenia and in groups with a high risk of developing it. Studies using DCM have shown abnormal effective connectivity in both first episode schizophrenic patients (FE) and at risk mental state (ARMS) compared with healthy controls [99, 127]. The added complexity offered by DCM in the form of the bilinear or nonlinear terms in the neuronal state equations has also proved to be advantageous in probing abnormal connectivity in Schizophrenia. This finer level of discrimination has allowed researchers to directly link severity of symptoms to altered effective connectivity [100, 136]. These studies in schizophrenia highlight the importance of sophisticated connectivity methods for inferring subtle variations between different group and disease states. The use of DCM has demonstrated that abnormalities in effective connectivity in schizophrenia are present in the prodromal state and may become more pronounced following the onset of illness. Furthermore, DCM may potentially be able to demonstrate the mechanisms via which antipsychotics work and even the neural substrates of schizophrenia symptoms. In summary the growing literature surrounding the use of DCM in schizophrenia is a good reflection of its potential for psychiatric functional neuroimaging [137], and in particular, the higher order terms in the DCM neuronal model have proved to be of critical importance in many studies. 5.2 Depression Major depressive disorder (MDD) is a debilitating mood disorder that places a major strain on society. By 2020 it is predicted to be the leading cause of disease burden in the developed world [138]. Depression is classified broadly by episodes of low mood but there is also substantial evidence of impaired cognitive function [139]. 
These cognitive impairments are not confined to one domain, and commonly include memory, attention, and executive function, as well as emotional and motivational impairments [140].

One specific form of impairment that has been widely studied, and spans multiple cognitive domains, is an emotional processing bias [ ]. There is an extensive literature demonstrating affective processing biases, particularly in memory and attention, using an array of neuropsychological tests [142]. A variety of memory tasks, all with an emotional valence component, have demonstrated significant biases in depression [143]. One of the most robust findings in depressed patients is a bias towards recollection of negative emotions and/or away from recollection of positive emotions [144]. With the advent of neuroimaging techniques it has become possible to probe the neurophysiological bases of these observed processing biases. Depression is associated with some subtle structural changes in the brain [114, ], in particular, localised reductions in frontal cortex, orbitofrontal cortex and hippocampal volumes. However, structural changes are not as clearly evident as for other disorders such as schizophrenia, which is associated with global brain volume reductions [148]. There is also confusion regarding how age, illness severity and treatment influence structural changes. Additionally, a general overlap of structural changes between psychiatric disorders means that, thus far, the development of a structural neuroimaging based biomarker for MDD has proved to be elusive [145]. Functional neuroimaging has possibly been more productive in identifying abnormalities in depression. The neural substrates of psychological processes involved in emotional processing in healthy individuals are relatively well understood [141]. For example, the amygdala is known to play an important role in emotion recognition, and amygdala responses to emotional stimuli have been demonstrated in numerous studies [149, 150]. Abnormal amygdala activation in patients with depression is a consistent finding of functional neuroimaging studies [ ].
Functional neuroimaging has implicated abnormal activation in a wide and diverse range of cortical and sub-cortical brain areas [154]. The most consistently identified areas in which functional abnormalities have been found are frontal and cingulate cortex, with limbic and sub-cortical areas such as amygdala, insula, and thalamus also being found [130, 155].

5.2.1 Emotional processing bias

To investigate the effect of emotional biases in attention, researchers have used tasks in which emotion is incorporated as a distractor from the main objective. In the Affective Go/No-Go task, subjects are required to give motor responses as quickly as possible, or inhibit responses, to a set of words with either positive or negative emotional valence (figure 5.1a). Patients with MDD have been found to respond quicker to words with negative emotional valence [156, 157]. Functional neuroimaging has been used to clarify the neural substrates of this attention bias. Elliott et al [158] found depressed patients had increased activation in response to sad words in the ACC, and decreased activation in response to happy words, with the reverse being true for controls. Similar emotional biases with regards to attention have been probed with emotional Stroop paradigms [159], in which subjects are required to name the font colour of a presented word that has either positive or negative emotional valence (figure 5.1b). Depressed patients show slower reaction times to words with negative emotional valence [160]. The negative bias associated with depression has been linked to increased ACC activity [161]. Functional neuroimaging has led to mounting evidence of the importance of ACC in depression [162], specifically hyperactivity in subgenual ACC [163], and hypoactivity in dorsal ACC (dACC) [164].

Figure 5.1: Illustration of (a) the affective go/no-go task and (b) the emotional Stroop task, two paradigms that measure emotional biases in attention.

To summarise, functional neuroimaging has made a significant contribution to explaining the impaired affective cognition present in depression. Despite the variability in the neuroimaging literature, a general consensus has emerged that depression is characterised by relative decreases in activity of frontal areas such as dorsolateral prefrontal cortex and dACC, and concurrent relative increases in activity of limbic areas such as amygdala and thalamus [165]. The ACC is hypothesised to act as a neural bridge between cognitive and emotional processing systems [166], and abnormal activity in this area has consistently been found and is also a robust indicator of treatment response [162]. Models of dysfunctional networks are increasingly being directly probed using connectivity methods. There is evidence of abnormalities in functional connectivity between the limbic and prefrontal regions, both using resting state [167] and during emotional task based fMRI [ ]. Evidence of functional connectivity that is dependent on illness severity [170] and normalised following antidepressant treatment [169] supports a theory of impaired cortical regulation of limbic activity in MDD.

5.2.2 Emotional face processing in depression

According to Ekman and Friesen there are six basic emotions that can be recognised by all humans regardless of cultural boundaries [171]: happy, sad, fearful, angry, disgust, and surprise. A standardised set of stimuli based on these emotions has become one of the most widely used for investigating affective cognition [159]. Emotional faces, when compared with neutral faces, are known to elicit activation in frontal, limbic and visual areas of the brain [ ]. Haxby et al [173] have proposed a model of face perception comprising a core system of occipitotemporal regions that encodes invariant aspects of face recognition and an extended system that includes regions from other cognitive systems that are recruited to attribute meaning to the faces (figure 5.2). In the core system, lateral fusiform gyrus has consistently been found to exhibit bilateral activation in response to faces [175], so much so that it has been dubbed the fusiform face area (FFA), and has been proposed as a specialised module for face perception [176, 177].

Figure 5.2: Schematic of proposed brain regions involved in face processing [178]. The core visual areas process invariant aspects of faces, whereas extended prefrontal and limbic areas process changeable aspects.

Identification of emotional faces is consistently found to be impaired in patients with depression, with general deficits in identifying emotion reported [ ] and specific negative biases, either as a tendency to perceive happy faces as neutral or to perceive neutral faces as sad [ ]. Group studies of depressed patients processing emotional faces have found abnormalities in a multitude of brain regions including the hippocampus, amygdala, cingulate gyrus, fusiform gyrus, insula, caudate, thalamus, ventral striatum as well as frontal, parietal and temporal regions [159]. The literature contains many inconsistencies regarding specific regions, their sensitivities to different emotions and the directionality of differences between patients and controls. The amygdala is a region in the extended system thought to be important in the perception of emotional faces. Amygdala activation seems to be particularly sensitive to the processing of fearful faces [ ] although there is evidence that the amygdala responds to emotional stimuli irrespective of valence [172, 190]. As the amygdala has an established role in mediating the perception of emotional content in face recognition, it is not

surprising that it has been implicated in numerous studies that probe neural substrates of abnormal face processing in depression. Some studies have found that depressed patients show increased levels of amygdala activation in response to emotional faces [ ].

Connectivity

Connectivity analyses allow one to directly test models of network dysfunction during the processing of emotional faces. Whilst there is an established literature on abnormal activity within different brain regions during emotional face processing in depressed individuals, relatively few studies have explored how these abnormalities are mediated by connectivity between the regions. It is a promising area of research as more subtle interactions between emotions, connectivity and disease states can be investigated. Frodl et al [197] used a seed based regression analysis to investigate the functional connectivity between orbitofrontal cortex (OFC) and the rest of the brain during a face matching task in which sad and angry emotion was either an explicit or implicit condition, i.e. matching faces based on emotion or based on gender. They found increased functional connectivity between OFC and dorsolateral prefrontal cortex in depressed patients and reduced functional connectivity between OFC and dACC. Using the same task as Frodl, but without the implicit condition, Carballedo et al [198] used SEM, which enabled them to infer directionality, and also found increased connectivity between OFC and prefrontal cortex (PFC). In the left hemisphere they found reduced connectivity from amygdala to OFC and increased connectivity from OFC to PFC. In the right hemisphere they found reduced connectivity from amygdala to OFC, ACC, and PFC. Versace et al [199] used a mutual information technique to measure functional connectivity in patients with bipolar disorder during an emotional face labelling task.
They found reduced bilateral functional connectivity between the amygdala and OFC in response to happy faces and an increase in response to sad faces.

This is the opposite of the effect found in MDD patients by Carballedo, who found reduced bilateral connectivity between the amygdala and OFC in response to sad and angry faces, although both the task and the patient group were different. This hints at differences in connectivity within an emotional processing network in patients with MDD as compared to those with bipolar disorder (BD). Almeida et al [98] used DCM to directly test for differences in connectivity between OFC and amygdala during a face emotion intensity labelling task in groups of patients with MDD and BD. For each subject they specified a single DCM with four regions, bilateral OFC and bilateral amygdala, with reciprocal connections within each hemisphere. They chose to omit the bilinear term from their model as the paradigm they used involved two separate sessions for the happy and sad emotions respectively. Thus modulatory effects of emotion on connectivity were implied by differences in parameter values between sessions. They found that in the left hemisphere, top-down connectivity from OFC to amygdala was greater in both MDD and BD groups compared to controls for both emotions. However, in the right hemisphere, they found bottom-up connectivity from amygdala to OFC in the happy condition was lower for BD compared with MDD and controls. This is another example of abnormal frontal-limbic connectivity, but one in which different patient groups can be differentiated by their unique abnormalities. Goulden et al [95] used DCM to probe effective connectivity during an implicit emotional face processing task in a group of remitted depressed (rMDD) subjects. Subjects were presented with faces that were happy, sad, neutral, or fearful, and they had to identify the gender by button press.
They considered four regions, primary visual cortex (V1), fusiform gyrus (FG), amygdala and OFC, and used a three tiered approach using BMS and free-energy approximations to model evidence to determine the most likely model structure for each group. In the first step they compared 7 models to determine the intrinsic connectivity, which they found to be reciprocal connections between FG, amygdala and OFC, and a single forward connection from V1 to FG, with faces of all emotions as the input into V1. In the second step, modulatory effects of emotion were considered by

partitioning 21 models into 7 families for each emotion. In the final step they compared models within the winning family to find the most likely model for each group. By considering the most likely model family, and winning model within that family, for each emotion they found that rMDD subjects had a different pattern of emotion dependent connectivity to controls. Intriguingly, their results suggest a reversal in the emotion modulated connectivity between groups, as shown in figure 5.3.

Figure 5.3: The most likely models of happy and sad modulated connectivity in HC and rMDD groups during emotional face processing found by Goulden et al [95]. Within a frontotemporal network, rMDD subjects exhibited more modulations when processing happy than sad information, whereas for HC the reverse was true.

For rMDD subjects they found modulation by happy faces of bidirectional connections from OFC to FG and amygdala, and modulation by sad faces of the backwards connection from FG to OFC. Controls showed modulation by happy faces of the backwards connection from OFC to FG, and modulation by sad faces of bidirectional connections from OFC to FG and amygdala. Both groups showed the same intrinsic connectivity but differed significantly in the modulation of those connections in response to happy and sad

emotions. This demonstrates how DCM can be used to infer a finer level of discrimination of the neural substrates of emotional processing, and is an example of the power of effective connectivity analyses to reveal subtle differences between groups.

Effects of antidepressants and fMRI

Modern psychopharmacology has its roots in the 1950s with the fortuitous observation that certain compounds, originally developed to fight tuberculosis, exhibited antidepressant properties [200]. Discoveries of the mechanisms from which their success derived led to the influential monoamine hypothesis, which has since guided the development of antidepressants based on the principle that depression is caused by low levels of neurotransmitters in the brain [201]. The most widely used antidepressants fall into a class of drugs known as selective serotonin reuptake inhibitors (SSRIs), which function by increasing cerebral concentrations of the neurotransmitter serotonin (5-HT). That increasing synaptic 5-HT concentration is responsible for the treatment effect of SSRIs has been strongly supported by a technique known as acute tryptophan depletion (ATD). Tryptophan is the chemical precursor to 5-HT. The ATD procedure requires that subjects ingest a tryptophan free amino acid mixture, which subsequently lowers cerebral tryptophan concentration, and therefore 5-HT synthesis [202]. ATD has been shown to induce relapse in depressed patients in remission [203] and lower mood in non-depressed individuals with a family history of depression [204], yet it has little effect on those with no family history [205]. Functional neuroimaging presents an exciting opportunity to study the action of SSRIs in a way that has previously not been possible. A technique growing in popularity in human studies is pharmacological fMRI (phMRI) [206].
This technique allows researchers to measure the direct effects of SSRIs on the BOLD response in the brain as well as how they influence brain activation induced during cognition. Standard fMRI can be employed in

repeated measures studies with patient groups to look at before and after treatment effects. This allows researchers to not only probe the neural basis of cognitive deficits in psychiatric disorders, but also to show how antidepressants normalise them. As more sophisticated connectivity methods come into frequent use, even more specific actions of drugs can be visualised and discovered. Several studies have shown that increased amygdala activation in response to emotional faces in MDD is normalised following treatment with SSRIs [191, 194, 196, 207]. There are, however, some inconsistencies regarding whether the hyperactive amygdala activation observed is state- or trait-dependent. Victor et al [207] reported enhanced left amygdala activation in response to masked sad faces in both currently depressed and unmedicated remitted depressed participants, whereas healthy controls showed enhanced activation in response to masked happy faces. Following treatment with the SSRI sertraline hydrochloride in the currently depressed group, amygdala activations reverted to the normative pattern. In contrast, Arnone et al [191] found that only currently depressed patients, and not those in remission, exhibited a hyperactive left amygdala in response to sad faces. They too found the pattern of activation reverted to normal following treatment with an SSRI, in this case citalopram. However, the fact that the abnormal activation was only present in currently depressed patients suggests that it is a characteristic of the depressed state, whereas the findings of Victor et al suggest it is a more long term effect that represents trait abnormalities or more enduring effects of past illness, i.e. a scarring effect.
In both cases, treatment with SSRIs reduced hyperactivity of the amygdala in response to sad faces, offering an insight into how drug action affects emotion processing systems that are impaired in depression.

Connectivity

Other studies have examined the effect of antidepressant administration on connectivity within the brain of depressed subjects. Chen et al [169] used a

seed based regression analysis to measure functional connectivity between bilateral amygdala and the rest of the brain. They compared MDD patients with controls performing an implicit emotional face processing task over two sessions separated by 8 weeks, with MDD patients receiving the SSRI fluoxetine between scanning sessions. MDD patients showed increased functional connectivity between the amygdala and both ACC and PFC following fluoxetine treatment, reaching the level associated with healthy controls. This result is compatible with the theory that antidepressants work by increasing cortical regulation of abnormal limbic activation [208]. Anand et al [209] have reported similar results in MDD patients following treatment with sertraline. They observed an increase in corticolimbic functional connectivity during rest and exposure to images of neutral and positive, but not negative, emotional valence. This is another example of SSRIs showing select improvements in neuronal abnormalities. It could possibly reflect the difference between state dependent changes accompanying successful treatment, and more permanent effects that represent trait abnormalities or scarring effects. A limitation of functional connectivity analyses is that they do not offer information on the direction of connections. Effective connectivity measures represent directed influences and so are more informative. However, studies using effective connectivity techniques to look at the effect of antidepressants are very limited. SEM of PET data has shown how differences in effective connectivity can distinguish between MDD patients who respond well to antidepressants and those who do not [155]. Passamonti et al [210] used DCM to study effective connectivity in healthy controls when viewing neutral, sad, and angry faces following ATD. They found that 5-HT had a significant influence on the connectivity between PFC and amygdala, a circuit implicated in a range of affective processes.
This has strong implications for a range of psychiatric disorders including depression, and suggests indirectly that SSRIs may function by altering connectivity within affective processing systems. They considered a system of three regions, amygdala, ventrolateral prefrontal cortex (VLPFC) and ventral ACC, with intrinsic connectivity set to be fully connected, i.e. reciprocal connections between all three regions. They then estimated 49

plausible models in which the location of driving input and the modulation of connectivity by emotion varied systematically. They compared models using RFX BMS. In the placebo condition, the most likely model had angry faces modulating connectivity in every connection within the network, and input entering the system via the amygdala. In the ATD condition they found evidence for this model was lower, and there was increased evidence for models with fewer modulatory effects. Their results suggest that 5-HT increases the effect that angry faces have on connectivity between PFC and amygdala, and they therefore hypothesised that 5-HT increases the capacity for the PFC to regulate responses to negative emotions in the amygdala. A limitation of studies looking at antidepressant treatment is that experimental controls are often lacking. In the above studies, for ethical reasons, there is no placebo group in the patients and no drug group in the healthy controls. This makes it very difficult to discern whether effects relate to mood-state changes or are pharmacological in nature. The use of remitted MDD groups can help to explain some of the differences between state-dependent and state-independent abnormalities in depressed groups, and the growing use of connectivity analyses, which allow a finer level of discrimination, has also revealed these differences.

Summary

This chapter has given a very brief overview of the current state of psychiatric neuroimaging with fMRI and the increasingly dominant research focus on brain connectivity. DCM stands out as a particularly promising method for inferring connectivity based abnormalities in psychiatric patient groups. However, there still remain certain conceptual and methodological issues that are preventing it from being more widely adopted [87]. Many of these issues reduce to a common underlying limitation of the hypothesis led nature of DCM, which is a direct consequence of the complexity of the generative model employed.
This chapter has reviewed the use of DCM in

psychiatric imaging studies, in particular in depression research, and in all cases its use necessitates a priori assumptions about network structure.

6 Paper 1

Limiting model space for an exploratory approach to network inference in Dynamic Causal Modelling for fMRI.

Authors
Joseph Whittaker (1), Rebecca Elliott (1), Shane McKie (1)

Affiliations
(1) Neuroscience and Psychiatry Unit, University of Manchester

6.1 Abstract

Since its inception, Dynamic Causal Modelling (DCM) has been successfully used to infer how spatially remote areas of the brain integrate to form functional networks during cognitive fMRI tasks. Here we present a method in which one can limit model space for DCM in a more data-driven way than is traditionally used. We demonstrate that the connectivity within a system of brain regions can be ascertained from inferring the connectivity within smaller systems consisting of regions taken from the entire system. By analysing the data in this fashion, we can effectively explore the entire network structure space, while estimating a much smaller number of models than would be typical. We have demonstrated this approach with two different fMRI tasks and, in each case, verified the method by comparing the results with those obtained via estimating the entire network structure space in the traditional way.

6.2 Introduction

An increasingly important concept in neuroimaging is that of brain connectivity. It has become apparent that cognition is the result of specialised areas of the brain interacting as part of a network, rather than working in isolation [31]. This fact has led to a demand for methods designed to infer connectivity from functional magnetic resonance imaging (fMRI). Dynamic Causal Modelling (DCM) is one such method that allows researchers to infer effective connectivity, i.e. the influence one brain region has on another [6]. Effective connectivity allows investigators to establish how remote areas of the brain are functionally integrated to form networks that are observed during task activations in fMRI [211]. Given that interactions between regions are not directly observed, a method such as DCM, which models these hidden neural states, is necessary to infer effective connectivity. DCM, in the form described in this study, treats the brain as a fully deterministic system, in which the experimental stimulus is a designed perturbation (input). Within a Bayesian framework, DCM parameters are estimated, and it is these parameters which determine the effective connectivity. The different classes of parameters estimated give the intrinsic connectivity, i.e. connectivity elicited by the task, as well as modulatory parameters, i.e. how different factors of the task modulate the intrinsic connectivity. DCM for fMRI was conceived as a hypothesis driven approach to inferring distributed neuronal networks that underlie the measured BOLD response. A typical DCM analysis involves carefully defining a limited model space, a subset of all possible models (network structure space), in order to test a specific hypothesis [212]. Although DCM has proven to be successful when used in this way, it is not always applicable to other types of research.
This concern is reflected by the fact that recently there has been a trend towards increasing the number of models (> 50) compared in a typical analysis [93, 213, 214], which has effectively moved DCM into more exploratory territory [84, 212]. The logical conclusion of this trend would be to explore the

entire network structure space for a particular system by not constraining the model space, thus making the analysis more data driven. The problem with this approach is that as the number of nodes in the system increases, the total number of possible models in the entire network structure space increases in what has been dubbed a combinatorial explosion. Given a number of nodes (n), the total number of possible models (m) in the bilinear form of DCM is given by Equation 6.1, where k equals n(n-1) and j is the number of experimental factors (direct and modulatory inputs in the DCM design matrix):

$$m = \left(2^{nj} - 1\right) \sum_{i=0}^{k} 2^{ij} \, \frac{k!}{i!\,(k-i)!} \qquad \text{(Equation 6.1)}$$

Friston et al [215] have recently suggested a more exploratory method for DCM which uses a computationally efficient way of scoring large sets of models. A similar method has also been outlined for using DCM to infer networks with a large number of regions [103] by using functional connectivity data to constrain intrinsic connectivity. These analyses were made possible by recent innovations in BMS [102] that allow model evidences for models within a predefined set to be approximated, with only the most complex model being estimated (inverted). As model inversion is computationally intensive, this approach, known as post-hoc BMS, allows larger model spaces to be explored. Here we present a method for applying DCM which allows larger network structure spaces to be explored indirectly, by exploring the much smaller model spaces of sub-networks that constitute the network as a whole. Figure 6.1 shows the total number of models possible for a given number of nodes, with the often adopted constraint that there is only one type of input that acts

directly on the system, not on the modulations (therefore j=½), and only one type of input that modulates the connections, but not the inputs (j=½), i.e. inputs=2 but j=1 in Equation 6.1. Modelling the entire model space is therefore computationally intractable at only 4 nodes, with the total number of models being in the order of millions. Even at 3 nodes, the 5103 models needed make it practically undesirable, given the length of time taken to estimate all the models with standard commercially available computers.

Figure 6.1: A figure to illustrate how the reduced two-node model space is formed and the number of models needed to explore the entire network structure space for n=2, 3 and 4 nodes. The graph shows the total number of possible models for a system depending on the number of brain regions it consists of. It considers modulation by one factor only and on connections only, not the regions themselves. Thus, a three node system has a total of 5103 possible models, but if the system is considered as two node sub-systems, the same model space can be represented by 81 models.

In this paper we propose the alternative modelling scheme depicted in Figure 6.1, in which 3 two-node sub-systems are used to infer a three-node system,

thus vastly reducing the model space needed for an exhaustive search. Unlike the network discovery approach of Friston et al [215], all models are fully inverted, meaning the negative free energy is computed for each model and can then be compared using BMS. We hypothesise that many of the models in the three-node model space will contain redundant information, and that the entire network structure space can be explored using a much smaller number of model inversions, thereby saving computation time. We performed a DCM analysis for subjects completing two different fMRI tasks that are of the type widely used in psychiatric imaging research, i.e. the n-back task and the implicit emotional processing faces task [95, 191, ]. These data have previously been analysed in terms of sample size estimates needed for the DCM parameters [220]. In that study, a bootstrapping procedure was applied to estimate variance for parameters in order to ascertain the sample size necessary for group inferences, and approximately 20 subjects per group were recommended, although the exact number of subjects required to detect group differences varied depending on the specific connectivity parameter considered in each task. In this study, for each task, data were extracted from three relevant brain regions selected from the group results. Three systems consisting of only two brain regions were then defined, i.e. each three node system is represented by three systems consisting of two nodes (Figure 6.1). For the purposes of this paper we shall call this the two-node approach. In order to validate the two-node approach results (m=81), every possible model for the entire system consisting of all three regions was estimated (m=5103). This is called the three-node approach. Given that for the three-node approach all possible models are estimated, the model inferred can be regarded as being the true network structure within the context of a DCM analysis.
The analysis will involve creating families of inputs and intrinsic connectivity separately and using Bayesian Model Selection (BMS) [84] to reduce the model space on which inferences on the modulations can be made. The purpose is to establish how accurate the two-node approach is in reducing the input and intrinsic connectivity model space compared to the three-node approach.
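The model counts quoted above (5103 for the three-node approach, 81 for the two-node approach) can be checked with a short script. This is a minimal sketch, not part of the original analysis: the function name `n_models` is hypothetical, and the counting rule is an assumption, namely that the direct input may enter any non-empty subset of the n regions, and that each of the i connections present among the k = n(n-1) possible ones may or may not be modulated by each of the j factors.

```python
from math import comb


def n_models(n: int, j: int = 1) -> int:
    """Count candidate bilinear DCM structures for n regions.

    Assumed counting rule (illustrative, see lead-in):
      * direct input enters a non-empty subset of regions: 2**(n*j) - 1 patterns
      * i of the k = n*(n-1) directed connections are present: comb(k, i) ways
      * each present connection is modulated or not per factor: 2**(i*j) patterns
    """
    k = n * (n - 1)
    input_patterns = 2 ** (n * j) - 1
    connection_patterns = sum(comb(k, i) * 2 ** (i * j) for i in range(k + 1))
    return input_patterns * connection_patterns


full_three_node = n_models(3)     # exhaustive three-node approach
two_node_total = 3 * n_models(2)  # three two-node sub-systems
full_four_node = n_models(4)      # already in the order of millions

print(full_three_node, two_node_total, full_four_node)
```

Under these assumptions the two-node approach needs 81 inversions where the exhaustive three-node search needs 5103, roughly a 63-fold reduction per subject.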

The overall aim is to reduce the model space to determine the input and the intrinsic connectivity, so that one need only estimate models for all possible modulations. The rationale behind this is the common practice in psychiatric DCM studies to use classical statistical tests to infer differences between patient and control groups [87, 212]. Bayesian model averaging (BMA) can be used to estimate parameters as model evidence weighted averages obtained over a model space [84]. Therefore the proposed way of applying the two-node approach is to deduce the most likely direct input location (matrix C) and intrinsic connectivity (matrix A) across all subjects, and then use the inferred model structure to do BMA for the whole network. Once the input and intrinsic connectivity have been determined, there are only a small number of modulations to test. For example, if the intrinsic connectivity is fully connected (i.e. 6 connections for 3 nodes) then a maximum of 64 (2^6) modulatory models are needed for inference, and for every scenario in which there are fewer intrinsic connections, this number halves for each connection that is taken away. Inference on either model or parameter space can then proceed in the recommended fashion [212].

6.3 Methods

An analysis of the data has previously been published in Goulden et al [220]. In that publication the details of fMRI acquisition are fully outlined, although a reiteration of the details of the image acquisition and a brief reprise of the tasks will be given here. The data have been reanalysed for this study to demonstrate the two-node method.

6.3.1 Subjects

The subjects consisted of 24 (12 male) right handed volunteers aged

6.3.2 N-back task

Participants were presented with a series of letters split into blocks, with each block consisting of 13 letters. Each letter was presented for 1.5 seconds, and there was a 0.5 second interval between each letter, giving a total of 26 seconds per block. The beginning of each block contained a 9 second instruction screen that stated when the subject was expected to respond via a button press, on an MR-compatible response box, for that particular block. There were three types of task blocks in which letters were presented and a rest block (R) in which a fixation cross was presented for the entire duration of the block (same length as task blocks, i.e. 35 seconds). The task blocks consisted of a 0-back block, in which the subject had to press on the appearance of the letter X; a 1-back block, in which the subject had to press when the letter presented was the same as the previous one; and a 2-back block, in which the subject had to press when the letter presented was the same as the one before last. The task lasted a total of 10 minutes 30 seconds and the blocks were presented in the following order: 01R02R02R01R02R01R.

Figure 6.2: Schematic to illustrate the n-back task design, where press indicates that the subject needs to respond by button press. This example shows the 1-back condition followed by the rest condition, followed by the 0-back condition.


6.3.3 Implicit emotional face processing task

Participants were presented with a series of faces, with an equal number of male and female faces, and were required to identify the gender of each face via a button press. The faces were presented in different blocks, in which each face displayed one of three emotions: neutral (N), angry (A), or fearful (F). Each block lasted 20 seconds, and there was an additional rest (R) block, which lasted 18 seconds plus a 2 second instruction screen. There were a total of 24 blocks, presented in the order NANRNFNRNANRNFNRNANRNFNR, bringing the total duration of the task to 8 minutes. The subjects were not informed of any change in the emotion of the faces, and were not asked to attend to the emotion of the faces, thus making emotion an implicit component of the task.

Figure 6.3: Schematic to illustrate the implicit emotional face processing task design, where in the stimulus N=neutral, R=rest, F=fear. The subject needs to respond with the gender of the face presented, m=male and f=female.

6.3.4 Image analysis

A 1.5T Philips Intera scanner was used to acquire images at the Wellcome Trust Clinical Research Facility at the University of Manchester. Volumes were collected using a single-shot echo-planar (EPI) pulse sequence and were composed of 29 contiguous axial slices (3.5mm by 3.5mm in-plane resolution), with a slice thickness of 4.5mm. The TR and TE were 2100ms and 40ms respectively. For the implicit face recognition task, 227 volumes were acquired, and for the n-back task 285 volumes were acquired. Additionally, for each subject, a high-resolution T1-weighted structural image was acquired in order to allow functional images to be registered to a standard stereotactic space.

All pre-processing was done in SPM8. Slice timing correction was performed on all images, and they were realigned to the first volume to correct for subject movement. Anatomical images were co-registered to the mean of the functional images, and then segmented into grey matter, white matter and CSF. Functional images were normalised and smoothed with a Gaussian kernel (FWHM 7x7x10mm), and a high-pass filter was applied, with a cutoff of 320 seconds ( Hz) for the implicit emotional processing faces task and 210 seconds ( Hz) for the n-back task.

6.3.5 Dynamic Causal Modelling

A DCM analysis was performed using DCM8 within SPM8. The most significant group activations in the brain regions considered were used as the basis for ROI extraction in individual subjects. For each subject, the local maxima were found within 14mm of these group activations, using a threshold of p<0.05 uncorrected. Data were then extracted from a 6mm sphere around the local maxima for each subject. For simplicity, DCMs were estimated using data extracted from the right hemisphere only.

As previously reported in Goulden et al. [220], for the n-back task data were extracted from the dorsolateral prefrontal cortex (DPC; MNI coordinates = 56, 18, 40), supplementary motor area (SMA; MNI coordinates = 4, 11, 55), and posterior parietal cortex (PPC; MNI coordinates = 53, -39, 50). The DCM system used all n-back conditions as the direct input (i.e. working memory), and the 2-back condition was considered as modulating the connectivity. For the implicit emotion face processing task, data were extracted from the fusiform gyrus (FG; MNI coordinates = 42, -67, -20), inferior occipital gyrus (IOG; MNI coordinates = 39, -81, -10), and amygdala (AMY; MNI coordinates = 21, -7, -20). The DCM system used the presentation of any face as the input, and anger was the emotion considered as modulating the connectivity.

Two different approaches to model selection for the input and intrinsic connectivity of a three-node system were compared. The three-node approach involved estimating all permutations for a system of three ROIs, with a single modulatory effect. The total number of possible models for this scenario is given by equation 1 as 729 per input scenario and 5103 over all possible input scenarios. The two-node approach involved estimating models for three separate two-node systems, each formed using two of the three nodes from the three-node system. The total number of models for a two-node system is 9 per input scenario and 27 over all possible input scenarios. This is multiplied by three to give a total of 81 models for the two-node approach. In both approaches, models were grouped into families of inputs and intrinsic connectivity and compared using Bayesian Model Selection (BMS), in order to make inferences on model structure, using both fixed effects (FFX) and random effects (RFX) methods [84, 85].

All models were estimated on a reasonably high-spec workstation with a 64-bit quad-core processor (2.4 GHz clock speed) and 12 GB DDR3 RAM. Model estimation took approximately 30 seconds per model on average on this machine, meaning that estimating all 5103 models for the three-node approach took approximately 40 hours for each subject, whereas estimating all 81 models of the two-node approach took approximately 40 minutes. This meant that for all subjects in each task, model estimation took approximately 14 hours and 5 weeks for the two- and three-node approaches respectively.
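The model counts and run times quoted above can be checked with a short calculation. The Python sketch below assumes one counting rule that reproduces the quoted figures: each directed connection is absent, present, or present and modulated, and the direct input may enter any non-empty subset of the nodes. Equation 1 is not reproduced in this section, so whether it takes exactly this form is an assumption, and the function names are purely illustrative.

```python
def n_structures(n_nodes: int) -> int:
    """Connectivity patterns for one input scenario (assumed rule:
    each directed connection is absent, present, or present and modulated)."""
    n_connections = n_nodes * (n_nodes - 1)
    return 3 ** n_connections


def n_input_scenarios(n_nodes: int) -> int:
    """Non-empty subsets of nodes that can receive the direct input."""
    return 2 ** n_nodes - 1


def n_models(n_nodes: int) -> int:
    """Total models for a system of n_nodes under the assumed rule."""
    return n_structures(n_nodes) * n_input_scenarios(n_nodes)


SECONDS_PER_MODEL = 30  # average estimation time reported in the text

three_node_models = n_models(3)    # full three-node model space
two_node_models = 3 * n_models(2)  # three separate two-node systems

three_node_hours = three_node_models * SECONDS_PER_MODEL / 3600
two_node_minutes = two_node_models * SECONDS_PER_MODEL / 60

print(three_node_models, two_node_models)
print(round(three_node_hours, 1), round(two_node_minutes, 1))
```

Under these assumptions the counts come out at 5103 and 81 models, and the per-subject estimation times at roughly 42.5 hours and 40.5 minutes, in line with the approximate figures given above.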

For the two-node approach, the most likely direct input to the three-node system was investigated by comparing the winning input of each individual two-node system. Each node appears in two out of the three two-node systems. This means that the most likely input to the three-node system can be inferred from the results of all three two-node systems on aggregate, i.e. the node that is the most likely input in both systems in which it appears is the most likely input overall. For the three-node approach, models were grouped into families according to where the direct input entered the system. To allow comparison of the intrinsic connectivity between the two- and three-node approaches, the coupling between each specific pair of nodes was isolated, and models that contained either of those connections were grouped into families according to the directional pattern of connectivity. This is illustrated in figure 6.4.

Figure 6.4: Illustration showing how intrinsic connectivity families are formed for both the two- and three-node methods. In the three-node method, 4 families are defined for each set of two nodes (total of 3), making the results directly comparable to the two-node method results.

Additionally, BMA was performed in both approaches. Models were averaged over both the winning input and intrinsic connectivity families in order to test for significance of the input and intrinsic connectivity parameters respectively. BMS and BMA represent the two different ways in which DCM can be applied, i.e. making inferences on model structure or inferences on model parameters. A one-sample t-test was then performed on each BMA

parameter to test the null hypothesis that it has mean zero. The resulting p-values were FDR-adjusted to correct for multiple comparisons.

The following steps represent how we suggest the two-node approach may be used to infer model structure.

1. Given n regions, the number of two-node systems (S) is given by $S = \binom{n}{2} = \frac{n(n-1)}{2}$
2. For each two-node system estimate the 27 possible models (assuming modulation by 1 factor only), i.e. S x 27 models in total
3. Define model families based on input and intrinsic connectivity
4. Infer the input to the system from the aggregate BMS input family results for the two-node systems
5. Infer the intrinsic connectivity of the system from the BMS intrinsic connectivity family results for the two-node systems
6. Use BMA to obtain average parameters that can be tested for significance
7. Identify areas of uncertainty in the model structure based on both the BMS and BMA results and consider expanding the model space along those dimensions
8. Once the input and intrinsic connectivity have been identified, specify and estimate full system models to test for possible modulations, e.g. given 3 regions and a fully connected intrinsic network, there are 64 possible modulation effects, of which either a subset or the whole set could be compared using BMA.

For group studies that are typical of psychiatric imaging research, this method is a novel and hypothesis-free way of determining the model structure over which differences in parameters between groups can be identified, and it can therefore be applied to novel tasks.
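Steps 1, 4 and 8 of the procedure above can be sketched in a few lines of Python. This is only an illustration of the aggregate rule as described in the text (a node is inferred as the input if it is the winning input in every two-node system in which it appears); the function names are hypothetical, and the winning families in the example are placeholders rather than results from this study.

```python
from itertools import combinations


def two_node_systems(regions):
    """Step 1: all unordered pairs of regions, i.e. n(n-1)/2 systems."""
    return list(combinations(regions, 2))


def aggregate_input(winning_inputs):
    """Step 4: nodes that are the sole winning input in every
    two-node system in which they appear.

    winning_inputs maps each pair to the set of nodes in its winning
    BMS input family.
    """
    regions = sorted({node for pair in winning_inputs for node in pair})
    inferred = []
    for node in regions:
        systems = [pair for pair in winning_inputs if node in pair]
        if all(winning_inputs[pair] == {node} for pair in systems):
            inferred.append(node)
    return inferred


def n_modulation_models(n_intrinsic_connections):
    """Step 8: each intrinsic connection is either modulated or not."""
    return 2 ** n_intrinsic_connections


regions = ("DPC", "SMA", "PPC")
systems = two_node_systems(regions)

# Placeholder winning input families, one per two-node system.
winners = {
    ("DPC", "SMA"): {"SMA"},
    ("DPC", "PPC"): {"PPC"},
    ("SMA", "PPC"): {"SMA"},
}

print(len(systems), len(systems) * 27)
print(aggregate_input(winners))
print(n_modulation_models(6), n_modulation_models(5))
```

An empty list returned by `aggregate_input` would correspond to the situation in step 7, where the two-node results do not agree on a single input and the model space may need to be expanded.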

6.4 Results

6.4.1 fMRI group activations

As mentioned above, for both tasks, statistically significant activations were found in three regions of interest. However, for both tasks there were subjects in whom one of the regions was not activated, and so those subjects were not included in the analysis. For the N-back task there were 4 subjects for whom supplementary motor area activation was sub-threshold, and for the implicit emotional processing faces task there were 9 subjects for whom amygdala activation was sub-threshold, giving a total of 20 and 15 subjects in the final analysis for each task respectively. The decision to remove subjects with sub-threshold activations in any of the ROIs was motivated by the fact that the purpose of deterministic DCM is to explain evoked responses [6]. It therefore makes sense to remove subjects who do not show strong activations [93], thus improving network inference by not including noisy nodes.

6.4.2 N-back task

BMS

Input

The input family results for the N-back task in the two-node and three-node approaches are given in figure 6.5. In the two-node approach the SMA only was the most likely input in both systems in which it occurs (PPC & SMA system: p=1.00 RFX, p=1.00 FFX; DPC & SMA system: p=0.97 RFX, p=1.00 FFX). This means that on aggregate, the SMA is the most likely input location and so it can be inferred that it is the location of the input to the

entire system. For the three-node approach the most likely input family was also the SMA only (p=0.95 RFX, p=1.00 FFX).

Intrinsic connectivity

The intrinsic family results in the two-node and three-node approaches are shown in figure 6.5. In the two-node approach, in all 3 systems it is the bidirectional family that is the most likely (PPC & DPC system: p=0.85 RFX, p=1.00 FFX; PPC & SMA system: p=0.96 RFX, p=1.00 FFX). From this it can be inferred that the most likely intrinsic connectivity model is one in which each node has reciprocal connections with each other node, i.e. fully connected. For the three-node approach, in all 3 couplings the most likely family is the bidirectional one (PPC & DPC coupling: p=0.88 RFX, p=0.98 FFX; PPC & SMA coupling: p=0.88 RFX, p=0.86 FFX; DPC & SMA coupling: p=0.97 RFX, p=1.00 FFX). From these results it can be inferred that the most likely intrinsic connectivity model for the three-node approach is the same as that inferred from the two-node approach, i.e. fully connected intrinsic connections.

BMA

The BMA parameter values are given in table 6.1. In the two-node approach, the SMA input is significant in both systems in which it occurs. In the three-node approach the only significant input parameter is SMA. In both approaches, all intrinsic connectivity parameters are significant (pfdr<0.05).
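The FDR adjustment behind the pfdr values can be illustrated with a short Python sketch. The text states only that p-values were FDR-adjusted, so the use of the Benjamini-Hochberg step-up procedure here is an assumption, and the p-values in the example are made up for illustration rather than taken from table 6.1.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values (assumed FDR procedure).

    Each raw p-value is scaled by m / rank, and a running minimum taken
    from the largest p-value downwards keeps the adjusted values
    monotone in the raw ones.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Illustrative raw p-values from one-sample t-tests on BMA parameters.
raw_p = [0.01, 0.04, 0.03, 0.20]
adj_p = bh_adjust(raw_p)
significant = [p < 0.05 for p in adj_p]

print(adj_p)
print(significant)
```

In this example only the first parameter would survive at pfdr<0.05, even though three of the raw p-values are below 0.05.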

6.4.3 Implicit emotional face processing task

The results for the implicit emotional face processing task are more difficult to interpret than those of the n-back task. For this reason the BMS posterior probabilities have been listed in table 6.2 (supplementary material) for clarity.

BMS

Input

The input family results for the two-node and three-node approaches are also given in figure 6.5. The input families inferred from the two-node approach do not match those from the three-node approach. In the RFX analysis they are split between input to FG alone or IOG alone, and in the FFX analysis they show input to IOG alone. However, as RFX assumes that individual subjects can have different models, and both approaches show a split between FG and IOG for the input, it can be hypothesised that the whole subject group can be split into distinct sub-groups according to where the direct input feeds into the system. Using the k-means clustering algorithm on the two-node BMS input family probabilities, the subjects were split into two distinct sub-groups: one in which the direct input was FG alone (n=6), and one in which the direct input was IOG alone (n=9). The results from the two- and three-node approaches for the FG and the IOG sub-groups are shown in figure 6.6. These sub-groups are substantiated by the results from the three-node approach, which show FG as the input for the FG group, and IOG as the input for the IOG group.
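The sub-grouping step can be sketched as a two-cluster k-means on per-subject input family probabilities. The exact features, initialisation and software used for the clustering are not given in the text, so the two-dimensional feature (probability of FG alone, probability of IOG alone), the deterministic initialisation and the example values below are all assumptions made for illustration.

```python
def kmeans_two_clusters(points, n_iter=100):
    """Plain Lloyd's algorithm with k=2 and a deterministic start."""

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Start from the first point and the point farthest from it.
    first = points[0]
    farthest = max(points, key=lambda p: dist2(p, first))
    centroids = [list(first), list(farthest)]

    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid for each subject.
        new_labels = [
            0 if dist2(p, centroids[0]) <= dist2(p, centroids[1]) else 1
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: centroid is the mean of its members.
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids


# Hypothetical per-subject probabilities: (FG alone, IOG alone).
subject_probs = [
    (0.90, 0.10),
    (0.80, 0.15),
    (0.10, 0.85),
    (0.20, 0.70),
    (0.05, 0.90),
]

labels, centroids = kmeans_two_clusters(subject_probs)
print(labels)
print(centroids)
```

With these made-up values the first two subjects fall into one cluster (input to FG alone) and the remaining three into the other (input to IOG alone). Because the sub-groups are defined from the same BMS results they are later used to interpret, agreement with an independent analysis, such as the three-node results reported above, is what gives the split its support.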

Intrinsic families

The intrinsic family results for the two-node and three-node approaches are given in figure 6.5. For the three-node approach there is no clear consensus across the 3 couplings as to which is the most likely intrinsic connectivity. The intrinsic family results inferred from the two-node approach are not verified by the three-node approach. After applying the same input sub-groupings as reported above to the intrinsic connectivity model selection, the results became slightly clearer. The intrinsic family results for the two-node and three-node approaches for the FG and IOG sub-groups are shown in figure 6.6. With the subjects split into the two sub-groups, results inferred from the two-node approach begin to resemble those determined by the three-node approach. However, in each sub-group, the two-node system that contains the input region and amygdala fails to match the results inferred from the three-node approach.

BMA

The results for the BMA are given in table 6.1. For the two-node approach, in all three systems inputs to the fusiform and inferior occipital gyrus are significant whenever they are present. For the three-node approach, inputs to the fusiform and inferior occipital gyrus also have significant parameters. For both the two- and three-node approaches the intrinsic connectivity parameters are all significant except those for connectivity from the amygdala to the fusiform gyrus and inferior occipital gyrus.

Figure 6.5: Input family and intrinsic connectivity family BMS results for both the two-node and three-node approaches for both the n-back task and the implicit emotional processing faces task. N-back task; d=dorsolateral prefrontal cortex, s=supplementary motor area, and p=posterior parietal cortex. Implicit emotional processing faces task; A=amygdala, f=fusiform gyrus, and I=inferior occipital gyrus.

Figure 6.6: Input family and intrinsic connectivity family BMS results for both the two-node and three-node approaches for the implicit emotional processing faces task after being split into sub-groups based on the input location; A=amygdala, f=fusiform gyrus, and I=inferior occipital gyrus.

Table 6.1: The BMA parameters for the two-node and three-node approach given to 3 decimal places. The corresponding one-sample t-test p-values are given to 3 decimal places and are FDR adjusted.

6.5 Discussion

In this study we have proposed a novel approach to network inference for DCM in which the most likely model within a network of 3 brain regions is inferred by individually analysing two-node systems that consist of nodes taken from the 3 regions of the entire system. The advantage of this approach is that it is possible to obtain the same results as from modelling the complete model space of the entire system, but with a smaller set of models needing to be estimated. The entire network structure space is explored in a computationally efficient manner, thus paving the way for DCM to be used in a more data-driven fashion. In this paper we have demonstrated the technique using a three-node system, due to its tractability, but potentially it could be expanded to systems with more nodes.

The results for the N-back task show that the direct input to the system can be inferred with a high probability using the two-node approach. In addition, the intrinsic connectivity can also be inferred from the two-node approach. Therefore we can infer the input and intrinsic connectivity of a three-node network by estimating only a fraction of the number of models needed for an exhaustive exploration of the three-node model space, representing a significant computational advantage.

The inputs to the implicit emotional processing faces task can also be accurately inferred from the two-node approach, but only once the subjects were split into two sub-groups determined by their individual results. The RFX approach, which accounts for multiple possible models within the data, did identify the fusiform and inferior occipital gyrus as inputs in the two-node approach. However, the two-node approach gave no evidence for the fusiform and inferior occipital gyrus as dual inputs, whereas the three-node approach did. Using FFX, the most likely input family in the two-node approach was the inferior occipital gyrus, whereas in the three-node approach it was both the fusiform and inferior occipital gyrus.

Figure 6.7: Models inferred from the two-node and three-node approaches in the implicit emotional processing faces task, before and after being split into sub-groups according to input location. Where there is a coupling that consists of a solid arrow and a dotted arrow, the most likely family is split between either both connections or the single connection represented by the solid arrow. A coupling that consists of a single dotted arrow represents the most likely family being split between that single connection or no connections. The connections that the two-node approach fails to correctly infer in the sub-groups are highlighted with a red ring.

Figure 6.7 shows the models inferred for the implicit emotional processing task from both the two-node and three-node approaches in both the whole group and the sub-groups split according to input. It is noteworthy that in both sub-groups the two-node approach fails to correctly infer the intrinsic connectivity between the input region and the amygdala (red circle). However, when we investigated the parameter estimates using BMA, we observed that in both approaches the intrinsic connections from the amygdala to both the fusiform and inferior occipital gyrus were non-significant. Thus the two-node approach has ranked the family posterior probabilities in the same order as the three-node approach in all cases, except those in which the

underlying parameters were non-significant. Interestingly, it is the three-node approach that gives higher evidence to the non-significant parameters. For the implicit emotional processing faces task, the two-node approach was partially successful, and in fact gave results with the same level of uncertainty as the three-node approach, except in connections between the input region and the amygdala. The fact that the connections from the amygdala to both the fusiform and inferior occipital gyrus were non-significant could explain why the differences between the two approaches centred on these connections. Additionally, given that these parameters are non-significant, their inclusion or exclusion in the model structure as determined by BMS is ultimately inconsequential.

In conclusion, we have presented a novel method that aims to infer network structure without a priori assumptions and that is computationally advantageous compared to an exhaustive model search. We have shown in one dataset that our method is as reliable as an exhaustive model search at inferring the network structure. We have shown in another dataset that it cannot be reliably used to completely infer the network structure but that it is capable of identifying the most important features. The main differences between the datasets that may explain this disparity are as follows. Firstly, the n-back task is robust in its activations, whereas face recognition tasks are less so. For the n-back task, data are extracted from cortical regions that are sufficiently removed from the portion of the brain that is susceptible to signal dropout, thus those extracted regions are robustly activated [218]. The same is not true of the implicit emotional processing faces task, particularly with regard to the amygdala [221]. Secondly, the n-back task has a larger number of subjects than the implicit emotional processing faces task, and thus the issue could also be one of statistical power.

With regard to the analysis pipeline included in the methods, for the n-back task the additional step 7 can be ignored, as BMS in the two-node approach correctly infers the input and intrinsic connectivity with high posterior family probabilities and the BMA parameter values in the inferred model structure are significant. In the implicit emotional processing faces task,

assuming the results of the complete three-node analysis were not available, we would suggest that, given the statistically non-significant parameters for the connections from the amygdala to the fusiform gyrus and to the inferior occipital gyrus, a more thorough model search be completed over those connections. However, given that the input has been reduced to a choice of 3 possibilities and the intrinsic connectivity between two regions (fusiform gyrus and inferior occipital gyrus) has been reliably inferred, the two-node approach has still been successful in reducing the computation time of the analysis.

The main limitation of this study is that the two-node method was only fully equivalent to the three-node method in the n-back task. We have discussed possible explanations as to why the two- and three-node method results for the implicit emotional face processing task were not completely equivalent, and believe that a lack of statistical power is the most likely culprit. However, without demonstrating this empirically, we must be careful not to treat the results for the n-back dataset as complete validation that the two-node method is applicable to future datasets.

6.6 Supplementary material

Table 6.2: BMS family probabilities for both the two- and three-node approach. Both RFX and FFX analyses were performed; FEP=family exceedance probability (RFX), FPP=family posterior probability (FFX).

7 Paper 2

An exploratory approach to Dynamic Causal Modelling for fMRI shows reproducible network inference

Authors
Joseph Whittaker (1), Rebecca Elliott (1), Anna Barnes (2), John Suckling (3), Bill Deakin (1), Shane McKie (1)

Affiliations
Neuroscience and Psychiatry Unit, University of Manchester (1)
Brain Mapping Unit, Department of Psychiatry, University of Cambridge (3)

7.1 Abstract

Multicentre studies are an effective way of increasing subject recruitment and exploring a larger demographic, thus allowing for more generalizable results. In the field of psychiatric neuroimaging, network based methods are becoming more popular as the focus shifts towards understanding disorders in terms of brain connectivity. Dynamic Causal Modelling (DCM) is one such method that has proven effective at identifying patient group specific deficits in brain connectivity. Here we apply a novel approach to DCM, which is exploratory in nature, to 10 subjects performing a working memory task at two sessions at three different scanning centres in the UK. We show that Bayesian Model Selection (BMS) results are reproducible at different centres and across different sessions. There is significant variability of DCM parameter

estimates between subjects, possibly linked to temporal effects of cognitive strategies resulting from repeated exposure to the same task, but not across centre or session. The findings show that DCM is robust enough to be used in multicentre studies and that our exploratory approach is just as effective as traditional approaches to DCM, but at a significant computational advantage.

7.2 Introduction

Functional magnetic resonance imaging (fMRI) has contributed significantly to research into psychiatric disorders [222], and has been instrumental in a conceptual shift towards understanding them in terms of impairments in distributed brain networks [126]. The diagnosis of psychiatric disorders is not always straightforward because of symptoms that are often shared between distinct disorders [106]. Psychiatric diagnoses are still predominantly symptom based, in contrast with the rest of medicine, in which objective clinical tests are central [223]. For this reason the development of imaging based biomarkers that could provide quantitative measures for more accurate diagnoses is highly desirable [109]. Biomarkers could also be beneficial in guiding treatment options; for example, depression has a relatively poor treatment success rate, and the likelihood of relapse is very high [224]. Imaging based biomarkers could also play a pivotal role in predicting the onset and long term prognosis of psychiatric disorders such as bipolar disorder and schizophrenia [120]. Given the similarities between different disorders within a particular class, it is probable that connectivity based methods will be necessary when trying to tease out the differences in functional brain images.

Whilst fMRI is now a mainstay of psychiatric imaging research, this trend has yet to be realised in a clinical setting, mostly due to the statistical power needed to detect small effect sizes [225]. There has been a rising interest in multicentre imaging studies as they increase statistical power by providing larger sample sizes and allow for a more generalizable interpretation of results due to a wider range of demographics [ ]. If psychiatric functional imaging is to have any impact on clinical practice, it seems likely to be through the use of large scale multicentre studies. It is therefore important to establish reliable fMRI and connectivity methods that can be used across multiple neuroimaging sites [225].

A variety of different connectivity analyses have been developed for analysing functional brain connectivity, and these studies have often shown

that connectivity methods are more sensitive to differences between patient and control groups than standard regional activation analyses [229]. Dynamic Causal Modelling (DCM) is one particular method that has been applied to a variety of psychiatric disorders such as depression and bipolar disorder [98, 230, 231], and schizophrenia [136, 232]. The majority of these DCM studies in the psychiatric imaging literature compare only very small numbers of models, typically fewer than 10. Whilst this is perfectly acceptable, there is a growing trend in the general DCM literature for exploring increasingly large model spaces [84, 101, 229]. We have previously outlined a method in which we demonstrated a novel approach to DCM (Whittaker et al., in prep.; paper 1 in this thesis). This approach allows small numbers of models to be estimated, yet still effectively explores the entire network structure space, making analyses that are more exploratory in nature computationally feasible. In this paper we use this two-node approach to analyse data obtained as part of a multicentre study. This allows us to see how DCM in general, as well as our specific methodology, compares across different subjects, sessions and scanning centres.

Despite the growing use of DCM as a method, there are very few studies that have attempted to test its reliability. Both Schuyler et al. [233] and Rowe et al. [229] have tested the reliability of DCM between sessions, with between-session time periods of minutes and weeks respectively. Specifically, Schuyler et al. demonstrated that DCM parameter estimates showed high reliability, and Rowe et al. found that model selection between sessions was robust. To date, to our knowledge, only one study has applied DCM to data acquired from multiple centres [234], and that study considers the special case of stochastic DCM. Therefore, this is the first study that examines the repeatability of bilinear DCM across both sessions and scanners.

Here we answer three main questions. Firstly, does Bayesian model selection (BMS) identify the same family of models for the same group of subjects performing a working memory task at different scanning sessions and centres? Secondly, are Bayesian model averaged (BMA) parameters different between centres, sessions and subjects? And thirdly, do the BMS results for the two-node approach vary in the same way as the BMS results for the

standard three-node system analysis? Therefore this paper addresses two key issues, one regarding the repeatability of DCM across different scanning centres in general, and the other regarding the repeatability of our two-node approach specifically.

7.3 Methods

7.3.1 Subjects

Full details of the data collection are outlined in Suckling et al. [225], although a brief summary is given here. Five centres were involved in the study as part of the PsyGRID consortium and the NeuroPsyGrid collaboration: The Wolfson Brain Imaging Centre at the University of Cambridge, the Magnetic Resonance Imaging Facility at the University of Manchester, the Institute of Psychiatry at King's College London, the Department of Clinical Neurosciences at the Universities of Edinburgh and Glasgow, and the Centre for Clinical Magnetic Resonance Research at the University of Oxford. For this study, data from Oxford University were not included as that particular centre did not carry out any functional imaging. Data from the University of Cambridge were also not included in the analysis because of technical problems during the second session of scanning. Twelve male right handed individuals (age range: 19-34, mean age=25) participated after giving informed consent and being screened for medication, drug use, and history of head injury. Each participant was scanned twice at each of the three centres, with a mean time of 7.8 ( ) months between scans.

7.3.2 Task

A detailed report of the data acquisition at each of the five centres has already been published [225], as well as a description of the working memory task [235]; however, a brief review of both is given here. The task is a standard visual working memory task, widely used in cognitive and psychiatric imaging studies. The N-back task consists of a series of numbers from 1 to 4, individually presented every 1.80 seconds for 0.50 seconds each, at locations within a diamond shaped box. Participants must then recall the number seen N presentations previously, and identify it by a button press that indicates the location of the number within the diamond. So during 1-back conditions subjects must identify the previously displayed number, and during 2-back conditions subjects must identify the number shown one before last. The task was a block design, in which sixteen 30-second blocks were presented to the subjects, alternating between the 0-back condition and either the 1-back or 2-back condition, giving a total of eight 0-back conditions and four each of 1-back and 2-back conditions. At each of the participating centres, 230 T2*-weighted whole brain volumes were acquired, with the first 6 images discarded, and a 2 second gap between corresponding slices in each volume (i.e. repetition time, TR). The acquisition parameters varied between centres due to different scanner models and acquisition software. The exact parameters for each centre are already published [225].

7.3.3 Dynamic Causal Modelling

A DCM analysis was performed using DCM8 within SPM8. The form of DCM used was the traditional bilinear form, in which interactions in a network of regions are modelled as a system of differential equations, as given by equation 7.1, where z are time series vectors of the neural state of each

region, and u are time series vectors of the different experimental factors of a stimulus presentation.

$$\dot{z} = \left(A + \sum_{i} u_i B^{(i)}\right) z + C u \qquad \text{(Equation 7.1)}$$

In this form, effective connectivity is governed by the intrinsic connectivity parameters in matrix A, which represent the task-independent connectivity strengths, and the modulatory parameters in matrix B, which represent context-dependent changes to the intrinsic connectivity. Matrix C describes where the driving input enters the system. In this form DCM therefore treats the brain as a fully deterministic system in which connectivity between regions is governed by the experimental stimulus entering via an input region, and by interactions between regions that may depend on factors within the experimental paradigm. For this study the direct input was the working memory condition, i.e. 1-back and 2-back conditions combined. Significant group activations were found in the following regions using a threshold of p<0.05 uncorrected: dorsolateral prefrontal cortex (DPC; MNI coordinates = 36, 38, 22), posterior parietal cortex (PPC; MNI coordinates = 39, -40, 36), and premotor cortex (PMC; MNI coordinates = 30, 8, 40). These regions have all been demonstrated to be critically involved in this task [218, 236, 237]. Regions of interest were then extracted as 6mm spheres in individual subjects, centred on the local maxima found within 14mm of these group activations. DCMs were estimated using data extracted from the right hemisphere only. Model estimation then followed the procedure previously outlined (Whittaker et al, in prep)(paper 1 in this thesis). Models were estimated for three separate two-node systems, with the nodes being selected from the total set of three nodes (Figure 7.1).

Figure 7.1: Illustration showing how the two-node model space is formed from the whole-network three-node model space.

The total number of models possible for each two-node system, allowing for modulation by one factor only on the connections between nodes, is 9 per input. With three possible input scenarios, this brings the total number of models estimated for each two-node system to 27. With a set of three separate two-node systems needed to replicate the three-node system, this brings the total number of models estimated to 81 per subject per session per scanner. For the complete three-node system, the entire model space, given a single modulatory effect on connections, was also estimated: a total of 5103 models per subject per session per scanner. The input to the three-node system can be inferred from the aggregate input results from the 3 two-node systems, and the intrinsic connectivity for the three-node system can be inferred from the intrinsic connections determined by the 3 two-node systems.

All models were estimated on a reasonably high-spec workstation with a 64-bit quad-core processor (2.4 GHz clock speed) and 12 GB DDR3 RAM. Model estimation took approximately 30 seconds per model on average on this machine, meaning that estimating all 5103 models for the three-node system took approximately 1.5 weeks for each subject across the 2 sessions and 3 scanners (6 functional scans in total), whereas estimating all 81 models of the two-node approach took approximately 4 hours. For all subjects, model estimation therefore took approximately 40 hours and 15 weeks for the two- and three-node approaches respectively.
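The model counts and estimation times quoted above can be checked with a short calculation. The sketch below assumes a counting rule that is not spelled out in the text but reproduces its totals: each directed connection between two nodes is either absent, present, or present and modulated, and the driving input may enter any non-empty subset of nodes. The function and variable names are illustrative only.

```python
from itertools import combinations


def count_models(n_nodes: int, states_per_connection: int = 3) -> int:
    """Count bilinear DCMs with a single modulatory factor.

    Assumed rule (inferred from the totals in the text, not stated there):
    every directed between-node connection is absent, present, or present
    and modulated; the driving input enters any non-empty subset of nodes.
    """
    n_connections = n_nodes * (n_nodes - 1)        # directed, no self-connections
    n_structures = states_per_connection ** n_connections
    n_input_scenarios = 2 ** n_nodes - 1           # non-empty subsets of nodes
    return n_structures * n_input_scenarios


def estimation_hours(n_models: int, seconds_per_model: float = 30.0,
                     n_scans: int = 6) -> float:
    """Total estimation time in hours for one subject over all scans."""
    return n_models * seconds_per_model * n_scans / 3600.0


two_node_per_system = count_models(2)                 # 9 structures x 3 inputs
n_pairs = len(list(combinations(range(3), 2)))        # two-node systems in 3 nodes
two_node_total = n_pairs * two_node_per_system
three_node_total = count_models(3)                    # 729 structures x 7 inputs

print(two_node_per_system, two_node_total, three_node_total)
print(estimation_hours(two_node_total))               # hours, two-node approach
print(estimation_hours(three_node_total) / (24 * 7))  # weeks, three-node approach
print(100 * two_node_total / three_node_total)        # % of models estimated
```

Under these assumptions the two-node approach estimates 81 of the 5103 models, roughly 1.6% of the full three-node model space.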

Three approaches were employed for evaluating the reliability of DCM, both for the full three-node system and specifically for our two-node method.

Firstly, we looked at whether the most likely model family identified for both input and intrinsic connectivity was the same in the first and second sessions. We also looked at whether the same families of models were identified at each of the different centres. Models were compared using Bayesian Model Selection (BMS). BMS compares models using free energy, which is an approximation to the model evidence that is obtained for every model that is estimated. In both approaches models were grouped into families according to input region and intrinsic connectivity. Random effects (RFX) analysis, a hierarchical BMS approach introduced by Stephan et al [86] which is less affected by outliers, was used to select the most likely model family.

Secondly, free energy estimates for each model, in both approaches, were averaged over subjects to give a vector of free energies for each scanning session (length equal to the number of models). For both approaches these vectors were then averaged over centre and over session to create free energy vectors for each session (2 sessions) and each centre (3 centres) respectively. For both approaches, the linear correlation (Pearson's r) between sessions, and between each centre and every other centre, was calculated.

Thirdly, Bayesian Model Averaging (BMA) [84] was performed. Parameter estimates obtained via BMA are not dependent on one specific model, and are instead model-evidence-weighted averages obtained from a set of models [62]. The winning family of models for input and intrinsic connectivity was selected from the BMS results across all sessions and scanners. For each subject, the BMA was calculated from the models within the winning input and intrinsic connectivity family. By using BMA to estimate parameter values, we can gauge how much relevant information is contained in the reduced model space of the two-node method by comparing parameter estimates between the two approaches. Parameter values were compared using analysis of variance (ANOVA).
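The second and third approaches can be illustrated with a minimal sketch. It is not the SPM8 implementation: the data are synthetic, the array layout and function names are hypothetical, and the model averaging uses a simplified fixed-effects weighting (a softmax of the free energies under a flat model prior) rather than SPM's BMA routine or the RFX procedure used for family inference.

```python
import numpy as np


def session_correlation(free_energy: np.ndarray) -> float:
    """Pearson's r between the two session-wise free energy vectors.

    free_energy has the assumed shape (subjects, sessions=2, models).
    """
    group = free_energy.mean(axis=0)               # average over subjects
    return float(np.corrcoef(group[0], group[1])[0, 1])


def posterior_model_probabilities(log_evidence: np.ndarray) -> np.ndarray:
    """Fixed-effects posterior over models under a flat model prior."""
    shifted = log_evidence - np.max(log_evidence)  # guard against overflow
    weights = np.exp(shifted)
    return weights / weights.sum()


def bayesian_model_average(params: np.ndarray, log_evidence: np.ndarray) -> np.ndarray:
    """Evidence-weighted mean of parameters; params has shape (models, n_params)."""
    return posterior_model_probabilities(log_evidence) @ params


# Synthetic example: 10 subjects, 2 sessions, 27 models (hypothetical values).
rng = np.random.default_rng(0)
model_effect = rng.normal(size=27)                 # shared model ranking
free_energy = model_effect + 0.1 * rng.normal(size=(10, 2, 27))
r = session_correlation(free_energy)
print(f"between-session correlation of free energies: {r:.3f}")

# Two models with equal evidence contribute equally to the average.
params = np.array([[1.0, 2.0], [3.0, 4.0]])
print(bayesian_model_average(params, np.array([0.0, 0.0])))
```

A high correlation between the averaged free energy vectors indicates that the models are ranked similarly in the two sessions, which is the property the reliability analysis relies on.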

7.4 Results

7.4.1 fMRI group activations

One subject did not attend any of the second session scans and so was omitted. Statistically significant activations in all three regions of interest considered were found in 10 of the 11 remaining subjects. This left data for 10 subjects across 6 different scans, i.e. 2 sessions at each of the 3 centres used.

7.4.2 BMS results

The input family BMS results across all scanners and sessions for both the two- and three-node approaches are shown in figure 7.2A. For both approaches the DPC is the most likely input region. Additionally, the two-node system which compares PPC with PMC shows slightly more evidence for PMC. The BMS results for the three-node approach also show PMC as the second most likely input family. This result is repeated across both sessions, as shown in figure 7.2B, although the ranking of models changes slightly. The DPC is still the most likely input region in both sessions when they are considered separately. Again the two-node approach BMS results match those for the three-node system. When BMS is considered for each centre separately, as shown in figure 7.2C, the results are largely consistent. DPC is still the most likely input family in centres 1 and 2, but the model selection for centre 3 is split between DPC and PMC. This is also true for both the two- and three-node model selection results.

Figure 7.2: The two- vs three-node input family group BMS results using the RFX approach for A) all sessions and centres, B) session 1 compared with session 2, C) each centre individually. For each set of graphs (A, B and C), the top row is for the three-node system and the bottom row is for the 3 two-node systems using the same nodes.

The intrinsic connectivity family BMS results favoured a fully connected model for both approaches, for every session and centre (see supplementary material).

7.4.3 Free energy correlation results

The reliability between sessions and scanners was further probed by investigating the degree of correlation between free energy approximations. This is shown for the three-node approach in figure 7.3, and the two-node approach in figure 7.4. The free energies are highly correlated between sessions, and between scanners.

Figure 7.3: Figures showing the correlation between free energies that are averaged across subjects for each session and for each centre. In every case there is a very high correlation coefficient (>0.99, except for between centres 1 and 3, which was slightly less) indicating a high degree of agreement in the model rankings.

Figure 7.4: Figures showing the correlation between free energies that are averaged across subjects for each session and for each centre in the two-node approach. In every case there is a very high correlation coefficient (>0.99, except for between centres 1 and 3, which was slightly less) indicating a high degree of agreement in the model rankings.

7.4.4 BMA results

It is not possible to directly compare BMS results between groups; therefore the parameter values obtained through BMA were compared. In this way the variability in the parameter estimates across sessions, centres and subjects can be identified. Given that there was slightly more variation in BMS across sessions and centres for the input families than for the intrinsic connectivity families, average parameter values for inputs were considered. Graphs for intrinsic connectivity can be found in the supplementary material (figure 7.8). BMA was performed to compute average parameter values in both the two- and three-node approaches. In both cases the intrinsic connectivity was considered to be fully connected, and models were averaged within the winning input family. As the intrinsic connectivity was set to the fully connected model, the models that were averaged in each instance differed only in the bilinear terms, i.e. modulatory effects. Average input parameter estimates for both approaches are shown in figure 7.5. Parameter values were entered into a repeated-measures ANOVA. For both approaches, there was no significant difference between input parameter values across sessions (p=0.34 two-node; p=0.35 three-node) or centres (p=0.86 two-node; p=0.64 three-node), and no significant interaction of scanner and session (p=0.61 two-node; p=0.82 three-node). Parameter values were also entered into an ANOVA to look at the effect of subject, and for both approaches there was a significant difference (p=0.021 two-node; p=0.014 three-node).

Figure 7.5: The top two rows show input parameter values for the winning input family in each of the 3 two-node systems at the group level for each subject, centre and session. The different coloured bars correspond to the parameter estimates from the 3 different two-node systems: blue, DLPFC & PPC; green, DLPFC & PMC; dark red, PPC & PMC. The bottom two rows show the input parameter value for the winning input family (DLPFC only) for the three-node system at the group level, for each subject, centre and session. Error bars are the standard error of the mean.

From visual inspection of figures 7.5 and 7.8 (supplementary material), it is evident that the ratio between parameter estimates is very similar between the two- and three-node approaches. To investigate this further, BMA-derived parameter values for the connectivity between regions, determined

by matrices A and B, were compared between the two- and three-node approaches as shown in figure 7.6. The input parameters are not included because they are not directly comparable, as the 3 two-node systems give 2 input parameters for each region, whereas the connectivity between regions produces one parameter per connection in both approaches. There is a high linear correlation (Pearson's r) between the two- and three-node parameter values (r=0.968), but a low to medium correlation between the standard deviations (r=0.372). This high correlation (>0.85) holds true when each individual session is considered separately, as shown in figure 7.6B.

Figure 7.6: For both the two-node and three-node approaches: connectivity parameters (intrinsic and modulatory). A) Averaged across session and centre, and the standard deviations across session and centre. B) Shown separately for each individual session; C=centre, S=session, e.g. C2 S2 refers to centre 2, session 2.

7.5 Discussion

This study is the first to address the reliability of bilinear DCM across multiple centres and sessions. It also provides important validation for the two-node model inference approach. The two-node approach, whilst only estimating 1.6% of the models needed for the three-node approach, correctly identified the most likely model families and gave parameter estimates that were highly correlated with those of the three-node approach. The computational advantage of the two-node approach is significant, both in terms of time and computer memory.

The results show that the most likely input family of models as determined by BMS across all sessions and centres is input to DPC. This holds true when sessions are considered separately; when centres are considered separately, it holds true at two out of the three centres. Further investigation of centre 3 reveals that at the second session the input switches from DPC to PMC; thus in five out of the six visits in which the task was performed, input to DPC was considered most likely. The most likely intrinsic connectivity family of models, as determined by BMS across all sessions and centres, is the fully connected one. This holds true across all sessions, centres and even subjects (see supplementary material). We can state, therefore, that results of DCM are repeatable on the basis that in most cases BMS selects the same input family as the most likely, and in all cases it selects the intrinsic connectivity family as being fully connected. This is consistent with previous studies that have demonstrated high reproducibility of model selection results using bilinear [229] and stochastic [234] DCM respectively.

One particular issue that is raised is why BMS selected a different input as being most likely in the second session at centre 3, particularly as the intrinsic connectivity results are so robust. Would one necessarily expect the same model to be selected on multiple occasions of a complex cognitive task

being performed? Bernal-Casas et al [234] found robust selection of the same winning model at three different centres, also for the n-back task. In their study different subjects did the task at different centres, whereas in our study each subject did the task at every centre. Given that the task has been repeated on multiple occasions, it is likely that there is a strong temporal effect due to repeated performance of a difficult task, and the likely development of different cognitive strategies. Unfortunately, despite the design of the scanning schedule, the actual visit times of the subjects to the different centres meant it was difficult to separate any temporal effects from centre effects, as there is not an even distribution and certain centres are heavily weighted for certain visit numbers for both visits. Repeated performance of working memory tasks is known to correlate with improved performance of the trained task as well as untrained transfer tasks, with which there are corresponding neural changes [238, 239]. Adequate control of cognitive strategy in neuroimaging tasks like the n-back task is difficult to achieve [240], and strategic shifts between multiple sessions are likely. However, despite this fact, the correlation of free energies and the repeatability of parameter estimates (as shown in figure) both suggest that the inferences made with DCM are repeatable between different scanning centres and multiple sessions. That possible temporal effects were not controlled in the most ideal manner is a limitation of this study.

In order to rule out the possibility that any variation in BMS results between visits was attributable to any effect of centre or session, we looked at the parameter values for the input obtained via BMA. Schuyler et al [233] have previously found that DCM parameter values showed high reliability across sessions. Our results are consistent with this finding, as parameter values are not significantly different between centres and sessions. A similar finding was reported by Bernal-Casas et al [234] for stochastic DCM, as they found that parameter values obtained at different centres were not significantly different. As shown in figure 7.5, there is a large amount of variability between individual subjects compared with that between sessions or centres. The results of the ANOVA show that there is no significant difference

between parameter values in different sessions or centres, but that there is a significant difference between parameter values in subjects. One can infer from this finding that the small amount of variability in the BMS results may be attributed to individual subjects. Although scanning centre or session appear to have no significant effect on parameter estimates, the high degree of variability between subjects raises an interesting question as to how appropriate group mean parameter estimates are. We can be confident from the results presented here that network inference is very robust, but care should be taken when making inferences on parameter values. For this reason it is always recommended that BMS should be used as a first step in any DCM analysis [62], as any variability within network structure can be accounted for. The network inferred from the BMS results here shows a very high probability for both input and intrinsic connectivity, and so the variability seen between subjects in parameter estimates must be assumed to be inherent and to represent natural variability between individuals. One should therefore be cautious when interpreting group mean parameter estimates. However, many group DCM studies have found significant differences between patient groups and controls [87], and so group mean parameter estimates can be informative.

Previous investigations into the reliability of BMS results have used relatively small model spaces compared to those used for this study. By estimating all possible models within a three-node network, we had the unique opportunity to explore the reliability of BMS across the whole model space. This is not a trivial point, since previous reliability studies can only be confident that results are repeatable within the confines of the constrained model space they have chosen [229, 233, 234]. This may be particularly relevant for large model spaces in which the best model criterion can become less reliable, and the results may critically depend on the model set chosen [84]. It is therefore conceivable that this may become more pronounced when comparing data obtained under different scanning conditions. Our results suggest that BMS is highly reliable between different scanning centres and sessions, regardless of the model set chosen. Although this cannot be explicitly tested as there are an infinite number of possible sets, figure 7.3 shows that there is

a very high correlation between the free energy model evidence approximations between sessions and centres. A high correlation is equivalent to a high amount of agreement between the ranking of models, which is important because structure inference ultimately depends on selecting the most likely model or family in a set. We are confident that centre, and therefore scanner choice, plays no significant role in determining the results of a DCM analysis, and that this is true for the entire model space, although it is still desirable to partition larger model sets into families to accommodate some of the inherent variability in model structure [62]. However, care must be taken when interpreting results obtained over multiple sessions, as to possible temporal effects. To test this we propose a study in which the same task is repeated over multiple sessions, evenly spaced in time, at one centre, in order to examine exactly how changing cognitive strategies over time may influence the results of a DCM analysis.

In addition to the general reliability findings of the study with regards to DCM, we have shown that the two-node approach is also reliable (figure 7.4). This has been shown by the fact that BMS selects the same most likely family of models in both the two- and three-node approaches, i.e. input to dorsolateral prefrontal cortex and reciprocal connections between regions. No inferences have been made about modulations of connections because we recommend that the two-node approach be used to infer input and intrinsic connectivity, and that inference on modulations, either in terms of model structure or parameter values, then proceeds in the traditional manner in the reduced model space (Whittaker et al. in prep)(paper 1 in this thesis). Figure 7.6 demonstrates that the parameter estimates obtained from the two-node approach do not match those of the three-node approach exactly, as the two-node estimates are scaled by an approximately constant factor. This is to be expected, as the prior distributions on connectivity parameters are a function of the number of connections. The exact parameter values are never important in a DCM analysis and will differ from study to study as they depend entirely on the data; it is the ratios between parameters that are critical. Figure 7.6 shows that there is a very high degree of correlation between the parameter estimates from the two approaches (r=0.968), but a

much weaker correlation between the standard deviations of the parameter estimates (r=0.372). This is understandable, as the three-node approach estimates a greater number of models and so the precision of the estimates will be much better. However, the high correlation between estimates suggests that the two-node approach is still robust enough to obtain parameter estimates sufficiently close to the estimates of the three-node approach, albeit scaled differently. The high correlation between two- and three-node parameter estimates is shown to still be present at each individual session (figure 7.6B), although with slightly lower correlation coefficients, which is to be expected given that there is less data. However, it remains to be established how many subjects/sessions are required in order to achieve a particular level of agreement (correlation) between parameter estimates. That the model selection results of the two-node approach match those obtained from the three-node approach shows that the two-node approach is capable of capturing the most important source of variance associated with the task, and is no more vulnerable to centre or session effects than the computationally intensive three-node method. This finding shows that the two-node approach is a very promising method for using DCM in a more data-driven way, by effectively examining much larger areas of model space but without the computational burden of actually estimating the entire model space.
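The point that a rescaling of the parameter estimates leaves their correlation and their ratios unchanged can be made concrete with a small sketch. The parameter values and the scale factor of 2 below are purely hypothetical and are not taken from the study; the function name is illustrative.

```python
import numpy as np


def scale_and_correlation(two_node, three_node):
    """Return (least-squares scale factor through the origin, Pearson's r)."""
    two_node = np.asarray(two_node, dtype=float)
    three_node = np.asarray(three_node, dtype=float)
    # Best-fitting k in two_node ~= k * three_node
    scale = float(two_node @ three_node / (three_node @ three_node))
    r = float(np.corrcoef(two_node, three_node)[0, 1])
    return scale, r


# Hypothetical connectivity estimates; the two-node values are a pure rescaling.
three_node = np.array([0.10, 0.40, -0.20, 0.30])
two_node = 2.0 * three_node

scale, r = scale_and_correlation(two_node, three_node)
print(scale, r)

# Ratios between parameters are preserved under the rescaling.
print(two_node[1] / two_node[0], three_node[1] / three_node[0])
```

In this idealised case the correlation is exactly 1 despite the estimates differing in magnitude, which is why a high correlation is the relevant measure of agreement when only the ratios between parameters matter.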

7.6 Supplementary material

Figure 7.7: The two- vs three-node intrinsic family group BMS results using the RFX approach for A) all sessions and centres, B) session 1 (B1) compared with session 2 (B2), C) each centre individually (C1, C2, and C3). For each set of graphs (A, B, and C), the top row is for the three-node system and the bottom row is for the 3 two-node systems.

Figure 7.8: The top two rows show intrinsic connectivity parameter values for the winning intrinsic connectivity family in each of the 3 two-node systems at the group level for each subject, centre and session. The bottom two rows show the intrinsic connectivity parameter values for the winning intrinsic connectivity family from the three-node system at the group level, for each subject, centre and session. Each coloured bar represents a different one of the 6 intrinsic connectivity parameters in the three-node system. Error bars are the standard error of the mean.

8 Paper 3

Abnormal Effective Connectivity During Emotional Face Processing in Depression

Authors: Joseph Whittaker (1), Shane McKie (1), Bill Deakin (1), Ian M Anderson (1), Rebecca Elliott (1)

Affiliations: (1) Neuroscience and Psychiatry Unit, University of Manchester

8.1 Abstract

Negative emotion processing bias, a possible characteristic of depression, has been demonstrated in face recognition tasks and is thought to be caused by abnormal limbic activity. In this study we use a novel method of model space estimation for Dynamic Causal Modelling to explore the effective connectivity between amygdala, fusiform gyrus, and inferior occipital gyrus in the right hemisphere during emotional processing in a face recognition task. We find abnormal effective connectivity in a group of currently depressed subjects in response to both happy and sad faces. Increased connectivity modulated by sad faces is normalised following treatment with citalopram, a selective serotonin reuptake inhibitor (SSRI). Increased connectivity modulated by happy faces in healthy controls is absent in depressed subjects both before

and after citalopram treatment. This implies it is a potential trait-dependent feature which could be a biomarker for vulnerability to depression.

8.2 Introduction

Major Depressive Disorder (MDD) is characterised by deficits in emotional processing and perception. Emotional face recognition paradigms that demonstrate this are well represented in the literature [141]. In particular, affective processing biases such as disproportionate recollection of negative facial emotions [241, 242] or a tendency to judge ambiguous or neutral facial emotions as negative [182, 183, 243] have been found, although not consistently [141, 159]. There is substantial evidence from functional magnetic resonance imaging (fMRI) studies that the amygdala is implicated in the processing of negative expressions [151, 244], with significant differences between normal and impaired emotional processing in both healthy and depressive subjects. Many studies have focussed on the amygdala's role in the processing of fear [185, 245], although more recent findings suggest that the amygdala may be sensitive to emotional salience irrespective of valence [172]. Several studies have found increased amygdala activation in MDD patients as compared to controls in response to sad faces [193, 195, 207], and one group in particular have consistently found increased amygdala activation in response to fearful faces [196, 246, 247], although this finding has failed to be replicated in other studies [191, 248, 249]. Although the literature gives mixed and sometimes contradictory results, it seems that hyperactivity of the amygdala in MDD patients in response to negative emotions, but especially sadness, is the most prominent finding [151]. Emotional processing biases have been shown in some cases to persist during remission [183, 250, 251], and so may represent trait abnormalities or persistent markers of prior illness (scarring effects), which could indicate vulnerability to relapse [252]. There are several studies that demonstrate how differences in brain activity in patients with MDD may revert to normative patterns following treatment with antidepressants [191, 196, 207, 253].

Thus, abnormal activation of specific brain regions in patients with MDD during processing of emotional faces has been thoroughly explored in the literature. However, in spite of the growing appreciation of the importance of connectivity between regions in the understanding of psychiatric disorders [126], there are still relatively few studies looking at how impairments in emotional processing may be mediated by dysfunctional connectivity [95]. In fMRI, connectivity analyses measure either functional connectivity, in which simple correlations between regions are used to infer connectivity, or effective connectivity, in which directed influences between regions are modelled. The majority of connectivity analyses in MDD patients during emotional face processing concern functional connectivity [151], and a consistent finding is reduced functional connectivity between the amygdala and other regions in MDD patients [169, 170, 254]. That the amygdala is implicated in abnormal connectivity in functional networks during emotional face processing seems evident from the literature. Determining exactly how the amygdala is influenced by, or influences, other brain regions during emotional face processing tasks requires effective connectivity analysis methods. Dynamic Causal Modelling (DCM) provides a measure of effective connectivity.

To date there have been very few studies using DCM to analyse face processing in MDD. Almeida et al [98] used DCM to determine that a sample of mainly female, medicated, MDD patients could be distinguished from controls by reduced left hemisphere effective connectivity between orbitomedial prefrontal cortex and amygdala in response to happy faces. Goulden et al [95] used DCM to compare 28 models in a sample of patients in remission from MDD (rMDD), and reported abnormal modulation of cortico-limbic effective connectivity compared with controls. To our knowledge, there have been no previous studies that have attempted to measure the differences in effective connectivity between MDD patients and controls, before and after treatment with antidepressants.

There have been several extensions to DCM in recent years. By far the simplest and most widely used form is the originally proposed bilinear model of neuronal interactions [6]. A detailed guide to interpreting DCM

results for clinicians and non-technical readers can be found in [255], but we will briefly review the basic principles here.

8.2.1 DCM

In DCM, changing neuronal activity in each brain region of a network is determined by an external driving input to the system, connections between the regions, and changes in the connectivity strengths that are context dependent. This is explicitly modelled with equation 8.1.

$$\dot{z} = \left(A + \sum_{i} u_i B^{(i)}\right) z + C u \qquad \text{(Equation 8.1)}$$

Here z are time series vectors of the neural state of each region, and u are time series vectors of the different experimental factors of a stimulus presentation. The matrix A, which is the intrinsic connectivity, describes the steady state connectivity of the system. The matrix B describes how the connectivity changes based on an experimental manipulation i. The matrix C describes the effect of each experimental manipulation on each brain region, i.e. the driving input. This model of neuronal activity is then fed into a haemodynamic model in order to produce a modelled BOLD response for each region. The parameters of the model are then estimated using a Bayesian procedure whereby, essentially, parameters are set to ensure the modelled BOLD signal is maximally similar to the measured BOLD signal. The estimation procedure yields both parameter values and an approximation to the model evidence, based on the free energy criterion. The free energy approximation to model evidence gives the accuracy of the model, corrected for the complexity to prevent over-fitting. Model evidences can be compared using Bayesian model selection (BMS).
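To make the roles of A, B and C in equation 8.1 concrete, the following is a minimal sketch that integrates the bilinear neuronal state equation for a hypothetical two-region network. It is not the SPM implementation: forward Euler integration is used purely for simplicity, the haemodynamic model is omitted, and all parameter values and names are invented for illustration.

```python
import numpy as np


def simulate_bilinear(A, B, C, u, dt):
    """Forward-Euler integration of dz/dt = (A + sum_i u_i B_i) z + C u.

    A: (n, n) intrinsic connectivity, A[j, k] is the influence of region k on j
    B: (m, n, n) one modulatory matrix per input
    C: (n, m) driving input weights
    u: (T, m) input time series
    Returns z with shape (T, n), starting from z = 0.
    """
    n_steps, n_regions = u.shape[0], A.shape[0]
    z = np.zeros((n_steps, n_regions))
    for t in range(1, n_steps):
        # Effective connectivity at this time step: A plus input-weighted B
        J = A + np.tensordot(u[t - 1], B, axes=1)
        z[t] = z[t - 1] + dt * (J @ z[t - 1] + C @ u[t - 1])
    return z


# Hypothetical network: region 0 drives region 1; input 1 strengthens 0 -> 1.
A = np.array([[-1.0, 0.0],
              [0.5, -1.0]])
B = np.zeros((2, 2, 2))
B[1, 1, 0] = 0.5                      # modulation of the 0 -> 1 connection
C = np.array([[1.0, 0.0],
              [0.0, 0.0]])            # input 0 drives region 0 only

dt = 0.01
u = np.zeros((2000, 2))
u[:500, 0] = 1.0                      # block 1: driving input only
u[1000:1500, 0] = 1.0                 # block 2: driving input ...
u[1000:1500, 1] = 1.0                 # ... plus the modulatory context

z = simulate_bilinear(A, B, C, u, dt)
print(z[:501, 1].max(), z[1000:1501, 1].max())
```

With these values the response of region 1 is larger in the second block, because the modulatory input increases the effective strength of the connection from region 0 to region 1 while the driving input is unchanged.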

141 One of the most important steps in a DCM analysis is carefully defining the model space [62]. Whilst the literature may provide sufficient evidence for the formation of plausible hypotheses that allow one to constrain the model space, there is always the possibility that one may be drawing incorrect conclusions by ignoring relevant connections. A review by Seghier et al [87] lists all the studies published at the time of writing, where DCM has been used to identify abnormal effective connectivity in psychiatric patients. Of the 28 studies included, the largest number of models compared was 48 [229], and in exactly half the studies only a single model was used. The problem with DCM model spaces is they are astronomical in size, which make exhaustive searches impossible [91, 92]. Recent data-driven network identification methods for DCM have been proposed [101, 103]. These methods rely on an approximation to model evidence that can be obtained without the need to invert all but the most complex model in a set [102]. This allows large numbers of models to be scored by omitting the computationally demanding step of inverting them. However it is still limited by the amount of models that can be stored in computer memory, and as the number of models rapidly grows with each additional region, whole model space searches are still intractable for relatively small networks. Here we compare 81 models per subject for a 3 node system, using a methodology which we have previously developed (Whittaker et al, in preparation for submission)(paper 1 in this thesis), in which the 3 node system is broken down into three separate 2 node systems. In this way, the entire model space can be indirectly searched, and the most likely model inferred from the results of sub-systems. 
We have shown that Bayesian Model Selection (BMS) selects the same most likely family of models with our method as it does when the entire model space for the 3 nodes is estimated, but at a substantially lower computational cost. Thus we are effectively removing any a priori assumptions about the most likely model, and therefore using DCM in a more exploratory fashion. In this paper, this novel approach to BMS with DCM has been used to analyse a group of initially unmedicated MDD patients during an implicit emotional face processing task, before and after treatment with citalopram.
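The growth in model space size can be illustrated with a simple count. The sketch below assumes, purely for illustration, that only the presence or absence of each directed between-region connection is varied; input locations and modulatory effects, which enlarge the space further, are ignored. The function names are hypothetical.

```python
def n_directed_connections(n_regions):
    """Number of directed connections between distinct regions."""
    return n_regions * (n_regions - 1)


def n_intrinsic_structures(n_regions):
    """Number of intrinsic structures if each directed connection is either present or absent."""
    return 2 ** n_directed_connections(n_regions)


# Count structures for networks of 2 to 6 regions.
growth = {n: n_intrinsic_structures(n) for n in range(2, 7)}
for n, count in growth.items():
    print(f"{n} regions: {count} intrinsic connectivity structures")
```

Under this assumption, a 3 region network already has 64 possible intrinsic structures and a 6 region network has over a billion, before any input or modulation options are considered.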

This study takes an unprecedented look at the relationship between depression and treatment in effective connectivity during emotional processing. Previous DCM studies have either looked only at medicated patients [98, 256], or have looked at unmedicated patients [96, 257] without examining changes following antidepressant treatment. To our knowledge this is the only DCM study that directly explores the effect of antidepressants on effective connectivity during cognition. A previous finding with this dataset has been published, showing that the increased bilateral amygdala activation in patients during the viewing of sad faces compared to neutral faces is normalised to control levels following treatment [191]. In this paper we expand upon the previous findings by looking at how these regional activation differences are mediated by directed influences between face processing regions. A network of three regions was chosen: inferior occipital gyrus, fusiform gyrus, and amygdala. The neural basis of face perception has been hypothesised to consist of a core system, which encodes invariant features of faces, and an extended limbic and prefrontal system, which encodes the varying aspects of facial recognition such as emotion [173, 258]. We have chosen a network containing the fusiform gyrus and inferior occipital gyrus as brain regions from the core system, together with the amygdala, a limbic structure from the extended system that is known to have a role in emotional processing and, as previously discussed, is heavily implicated in MDD. Thus we investigated how depression modulates connectivity between elements of the core and extended systems. These regions have previously been identified as important in the processing of emotional faces [258, 259], and differences between the MDD and HC groups in their interactions may be important in explaining differences in how emotional valence modulates activity in the core face processing areas.

8.3 Methods

8.3.1 Subjects

Recruitment was undertaken at the University of Manchester (UK). All participants gave informed consent and were compensated for their time. All patients were screened with the Structured Clinical Interview for DSM-IV Axis I disorders. Potential patients were excluded from the study if they had a concurrent comorbid axis I psychiatric disorder or a primary cluster A or B axis II disorder. Additionally, any participant with a neurological disorder, an unstable medical condition, a history of significant head trauma, or a lifetime history of alcohol or substance abuse was excluded. Healthy controls with a family history of psychiatric disorders were also excluded. Patients were all non-medicated and met DSM-IV criteria for unipolar depression. Their illness severity was assessed using the Montgomery-Åsberg Depression Rating Scale (MADRS) as well as the seven-item Clinical Anxiety Scale adapted from the Hamilton Anxiety Scale. Inclusion required patients to have a MADRS score ≥ 20. A total of 35 currently depressed patients (MDD) and 24 healthy controls were included in the analysis (Table 8.1).

Table 8.1: Demographics and clinical characteristics of un-medicated currently depressed patients (MDD) and healthy controls (HC). M=number of males/total number of subjects in group

8.3.2 Antidepressant treatment

Of the 35 patients included in the analysis, 31 were treated with citalopram following their initial scan and returned for a second scan 8 weeks later. All the patients had received a stable dose of citalopram for at least 4 weeks prior to the second scan, at a dose determined by their response to treatment at 4 weeks. Adherence to the drug treatment was verified by measurement of citalopram plasma levels at the time of the second scan. Following the 8 week citalopram treatment, MADRS scores for the patients reduced on average by points (range of 2-33), and all but 7 achieved remission (MADRS ≤ 10). To control for potential temporal or test-retest variability, 14 of the controls were also rescanned 8 weeks after the initial scan.

8.3.3 Implicit emotional processing faces task

Subjects were presented with an implicit emotional processing faces task, which has previously been described in detail in the supplementary material of Arnone et al [191]. The task followed a block design in which several emotional faces were presented to the participants. There were six faces (half male) presented from a standardised series of facial expressions [260] consisting of neutral, happy, sad and fearful emotions. Subjects had to identify the gender of the face via a button press on a scanner compatible response box.

8.3.4 fMRI data acquisition

A 1.5T Philips Intera MRI scanner, with a single-shot echo-planar (EPI) pulse sequence (TR=2.1 s, TE=40 ms), was used to acquire T2*-weighted images at the Wellcome Trust Clinical Research Facility at the University of Manchester. All data were analysed using SPM8. Group level regions of interest were identified with a small-volume-corrected family-wise error threshold set at p<0.05. Significant activations were found in the inferior occipital gyrus (IOG; MNI coordinates = 42, -77, -10), the fusiform gyrus (FG; MNI coordinates = 48, -49, -20), and the amygdala (AMY; MNI coordinates = 32, 0, -20) using a faces versus rest contrast.

8.3.5 DCM analysis

A DCM analysis was performed using DCM8 within SPM8. The significant group activations in the brain regions identified in the group level analysis were used as the basis for ROI extraction in individual subjects. The three regions selected were: inferior occipital gyrus (IOG), fusiform gyrus (FG), and the amygdala (AMY). For each subject, the local maxima were found within 14mm of these group activations using a threshold of p<0.05 uncorrected. Data were then extracted from a 6mm sphere around the local maxima for each subject. For simplicity, DCMs were estimated using data extracted from the right hemisphere only. Model estimation was performed to identify the most likely model structure using a procedure previously outlined (Whittaker et al, submitted). Models were estimated for three separate two node systems, with the nodes being selected from the total set of three nodes, as shown in figure 8.1.

Figure 8.1: Schematic illustration of the two- and three-node model spaces and their respective sizes.

The total number of models possible for a 2 node system, allowing for modulation by one factor only on the connections between nodes, is 9 per input. With 3 possible input scenarios, this brings the total number of models estimated for each 2 node system to 27. With a set of three 2 node systems, this brings the total number of models estimated per subject to 81. In each model the faces were used as direct input to the system, and each of the emotions was considered separately as a modulatory factor. As detailed in the previous paper, the input to the complete 3 node system can be inferred from the aggregate input results from the three 2-node systems, and the intrinsic connectivity for the total 3 node system can be inferred from the intrinsic connections determined by the 2-node systems. This approach was used to identify the intrinsic connectivity of the network and the input location. Bayesian Model Averaging (BMA) [84] was then used to average over all possible modulations of the network, within each group, with each modulatory factor treated separately. BMA uses models within a set to calculate average parameter values, with each model's contribution being weighted by its model evidence. This reduces uncertainty about model structure by pooling information across multiple models. Average parameter values for each group were then compared using classical statistical approaches within R.

8.4 Results

8.4.1 BMS

Bayesian Model Selection (BMS) was used to compare the estimated models and identify the most likely model structure. Models were grouped into families [84] according to input location and intrinsic connectivity, and compared using RFX BMS to find the most likely model structure across both groups and across the two visits, as shown in figure 8.2. The most likely input family in both systems in which it occurred was IOG (p>0.99). On aggregate, therefore, the most likely input is IOG, and it can be inferred to be the most likely input for the complete 3-node system. In every 2-node system, the most likely intrinsic connectivity family was fully connected (p>0.99). From this result we inferred that the complete 3-node system has a fully connected intrinsic connectivity (matrix A). Given this intrinsic connectivity structure, and with the input known, for each emotion condition in the task there are a total of 64 possible models, i.e. all combinations of the emotion modulating the intrinsic connectivity (matrix B).
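The model counts quoted above (9 per input, 27 per 2-node system, 81 per subject, and 64 modulation models) can be reproduced by enumeration. The sketch below rests on one assumed decomposition that yields those counts: each of the two directed connections in a 2-node system is taken to be absent, present, or present and modulated, and the three input scenarios are taken to be input to the first node, the second node, or both. The text does not spell out this decomposition, so the state labels and names here are hypothetical.

```python
from itertools import combinations, product

# Assumed states of each directed connection in a 2-node system (3 x 3 = 9 per input).
CONNECTION_STATES = ("absent", "present", "present+modulated")
# Assumed input scenarios for a 2-node system.
INPUT_SCENARIOS = ("first node", "second node", "both nodes")


def two_node_models():
    """Enumerate every model of one 2-node system under the assumptions above."""
    return [
        {"forward": fwd, "backward": bwd, "input": inp}
        for fwd, bwd, inp in product(CONNECTION_STATES, CONNECTION_STATES, INPUT_SCENARIOS)
    ]


REGIONS = ("IOG", "FG", "AMY")
pairs = list(combinations(REGIONS, 2))           # the three 2-node systems
models_per_pair = len(two_node_models())         # 27
total_models = models_per_pair * len(pairs)      # 81

# Fully connected 3-node system: each of the 6 directed connections
# is either modulated by the emotion or not.
connections = [(src, dst) for src in REGIONS for dst in REGIONS if src != dst]
modulation_models = list(product((False, True), repeat=len(connections)))

print(models_per_pair, total_models, len(modulation_models))
```

The final count follows directly from the text: a fully connected 3-node network has six directed connections, and all on/off combinations of modulation give 2^6 = 64 models.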

Figure 8.2: BMS input family (top row) and intrinsic connectivity family (bottom row) results using the RFX approach

8.4.2 BMA

BMA was performed on the 64 modulation models for each subject and for each emotion to create weighted average parameter values for intrinsic connectivity and modulation of that connectivity by emotion. One-sample t-tests were used to test the null hypothesis that intrinsic connectivity and modulation parameters in the MDD and HC groups had a mean equal to zero, and unpaired two-sample t-tests were used to test the null hypothesis that there is no difference between the mean values in each group. This was done for all 3 emotions, and p-values were FDR adjusted [261] to correct for multiple comparisons within each emotion. The results of these tests for the sad and happy emotions are given in table 8.2. The fear emotion is not included as there were no significant differences between any of the parameters. In the happy emotion there is no difference in the intrinsic connectivity parameters between MDD and HC, but there is a difference in the modulation of the AMY to IOG connection, as shown in figure 8.3. There is a significant increase in the connectivity from AMY to IOG in response to happy faces in the HC group that is not present in the MDD group. In the sad emotion there is no difference in the intrinsic connectivity parameters between MDD and HC, but there is a difference in the modulation of the FG to AMY connection, as shown in figure 8.3. There is a significant increase in the connectivity from FG to AMY in response to sad faces in the MDD group that is not present in the HC group.
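The idea of weighting each model's parameter estimate by its evidence can be sketched as follows. This is a simplified point-estimate illustration, assuming a flat prior over models and using the free energy as the log evidence; it is not the BMA procedure implemented in SPM, and the function names and numerical values are invented for illustration.

```python
import math


def posterior_model_weights(log_evidences):
    """Posterior model probabilities under a flat model prior (softmax of log evidences)."""
    top = max(log_evidences)                      # subtract the maximum for numerical stability
    unnormalised = [math.exp(f - top) for f in log_evidences]
    total = sum(unnormalised)
    return [w / total for w in unnormalised]


def bma_mean(param_estimates, log_evidences):
    """Evidence-weighted average of one parameter across models."""
    weights = posterior_model_weights(log_evidences)
    return sum(w * p for w, p in zip(weights, param_estimates))


# Hypothetical free energies and estimates of one modulatory parameter in three models.
free_energies = [-1000.0, -1003.0, -1010.0]
estimates = [0.4, 0.1, 0.0]

weights = posterior_model_weights(free_energies)
averaged = bma_mean(estimates, free_energies)
```

Because the weights depend exponentially on differences in log evidence, a model that is only a few log units better dominates the average, while poorly supported models contribute very little.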

Table 8.2: Mean and standard deviation for intrinsic and modulatory connectivity parameters for MDD and HC groups for happy and sad emotion. One-sample t-tests were used to test the significance of parameters, and two-sample t-tests were used to test for significant differences between MDD and HC. Within each emotion condition p-values were FDR adjusted.
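The FDR adjustment applied within each emotion condition can be sketched as below, assuming the Benjamini-Hochberg step-up procedure (the method that R's `p.adjust` applies for `method = "fdr"`). The text states only that p-values were FDR adjusted, so the choice of procedure, the function name and the example p-values are assumptions.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the original order."""
    m = len(pvalues)
    # Visit p-values from largest to smallest so a running minimum enforces monotonicity.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = m - position                       # 1-based rank in ascending order
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four parameters within one emotion condition.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = bh_adjust(raw)
```

Adjusting within each emotion condition separately, as described in the caption, means the procedure is applied once per family of tests rather than once over all emotions combined.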

Figure 8.3: Parameter values for modulations of intrinsic connections by emotions. A) Change in connectivity from AMY to IOG in response to happy faces. B) Change in connectivity from FG to AMY in response to sad faces. Values are group means with standard error.

8.4.3 Effect of citalopram treatment

Having identified the connections in which there was a difference between the groups in modulation by happy and sad emotion respectively, we then investigated how treatment with citalopram in the MDD group affected these modulatory parameters, using a t-test to test the null hypothesis that there was no difference between groups at the second visit, as shown in table 8.3. For both parameters there was no significant difference between the groups at the second visit, or between controls at visit 1 and the treated depressed patients at visit 2. To test the null hypothesis that there was no difference in the modulation of connectivity between sessions in the MDD group, paired t-tests were performed, as shown in table 8.3. In the happy emotion condition, there is no difference between sessions in the modulation of the AMY to IOG connection in the MDD group. In the sad emotion condition, there is a difference (p<0.01) in the modulation of the FG to AMY connection.
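The two kinds of comparison used here differ in how the test statistic is formed. The sketch below computes a paired t statistic for within-group changes between visits and an unpaired two-sample statistic for between-group differences. The unpaired version assumes Welch's unequal-variance form, which is the default of R's `t.test`; the text does not say which variant was used. Function names and data are invented, and p-values are omitted because they require the t distribution.

```python
import math
from statistics import mean, stdev


def paired_t(visit1, visit2):
    """Paired t statistic and degrees of freedom for the same subjects at two visits."""
    diffs = [b - a for a, b in zip(visit1, visit2)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


def welch_t(group_a, group_b):
    """Unpaired two-sample t statistic with Welch-Satterthwaite degrees of freedom."""
    va = stdev(group_a) ** 2 / len(group_a)
    vb = stdev(group_b) ** 2 / len(group_b)
    t = (mean(group_a) - mean(group_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(group_a) - 1) + vb ** 2 / (len(group_b) - 1))
    return t, df


# Hypothetical modulation parameters for four subjects at two visits.
t_paired, df_paired = paired_t([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 5.0, 7.0])
# Hypothetical parameters for two independent groups.
t_unpaired, df_unpaired = welch_t([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
```

The paired test uses only subjects scanned at both visits, which is why the number of returning subjects in each group determines the power of the within-group comparisons.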

Table 8.3: Significant differences between groups in visit 1 inferred with two-sample t-tests, as already listed in table 8.2, and FDR adjusted. Post-hoc tests: two-sample t-tests for between groups in visit 2, and paired t-tests for within groups, FDR adjusted.

To summarise, there was a significant difference between groups in the sad faces modulation of the connection from FG to AMY in the first visit, but not the second visit, and there was a significant difference in the modulation strength between the first and second visits in the MDD group. This suggests an abnormal modulation of the connection from FG to AMY in the MDD group that is normalised following treatment. There was a significant difference between groups in the happy faces modulation of the connection from AMY to IOG in the first visit but not the second visit. However, there was no significant difference in the modulation strength between the first and second visits in the MDD group. This inconsistency is probably attributable to a lack of statistical power arising from the small number of HC subjects at the second visit. This result suggests that the MDD group have an abnormally low modulation of the connection from AMY to IOG that is not normalised following treatment, but that a lack of statistical power in the control group means this is not reflected in the statistics. Removing the subjects who did not attend both sessions did not change the results of the analysis, and was deemed undesirable given the large number of MDD subjects whose data would be wasted.
