Bayesian Models for Combining Data Across Subjects and Studies in Predictive fMRI Data Analysis

Bayesian Models for Combining Data Across Subjects and Studies in Predictive fMRI Data Analysis
Thesis Proposal
Indrayana Rustandi
April 3, 2007

Outline
- Motivation and thesis
- Preliminary results: Hierarchical Gaussian Naive Bayes
- Proposed work, including schedule

fMRI
- 3D images of hemodynamic activations in the brain, assumed to be correlated with local neural activations
- ~10,000 spatial features (voxels, analogous to pixels)
- Temporal component
- ~10-100 trials

fMRI Data Analysis
- Descriptive: locations of activations correlated with a cognitive phenomenon; the most common paradigm
- Predictive: prediction of the cognitive phenomenon underlying brain activations; classification of cognitive tasks, prediction of levels of stimulus presence (EBC competition)

Motivation: Subject-Level
- For predictive analysis, analysis is done separately for each individual subject
- Problem: lack of training examples; performance could potentially improve by incorporating data from other subjects
- Simple solution: pool the data from all the subjects together
- Problem: for some subjects it might not be reasonable to pool data (e.g. subjects with different conditions)
- Problem: inter-subject variability is ignored

Inter-Subject Variability
- Human brains have similar functional structures, but differ in shape and volume (different feature spaces for different human subjects)
- Normalization to a common space is possible, but can distort the data
- Even after normalization, activations are also governed by personal experience and affected by the environment
(Thirion et al. (2006))

Motivation: Study-Level
- fMRI studies are expensive; it is desirable to incorporate data from existing similar studies
- Problem: all the problems from the subject level
- Problem: variability due to different experimental conditions (e.g. different stimuli, different magnetic field strengths)
- Problem: determining which studies are similar

Motivation: Generalization
- How much commonality exists across different individuals with respect to a particular cognitive task?
- This influences how much can be shared across different individuals (or groups)
- Example: sharing for classification of picture vs. sentence might be easy, but sharing for classification of the orientation of visual stimuli using V1/V2 voxels might be hard (Kamitani and Tong, Nature Neuroscience, 2005)

Thesis
- Machine learning and statistical techniques to:
  - combine data from multiple subjects and studies
  - improve predictive performance (compared to separate analyses for individual subjects and studies)
  - distinguish common patterns of activations from subject-specific or study-specific patterns of activations
- Framework of choice: Bayesian statistics, in particular hierarchical Bayesian modeling
  - offers a principled way to account for uncertainties and the different levels of data generation involved

Related Work in fMRI Classification
- Pooled data from multiple subjects (Wang et al. (2004), Davatzikos et al. (2005), Mourao-Miranda et al. (2006))
- Group analysis: multiple subjects in a specific study
  - focus: descriptive; increase in sensitivity for the detection of activations
  - mixed-effects models (Woods (1996), Holmes and Friston (1998), Beckmann et al. (2003))
  - hierarchical Bayes model (Friston et al. (2002))

Related Work in ML/Statistics
- Multitask learning/inductive transfer: Caruana (1997)
- Generative setting: Rosenstein et al. (2005), Roy and Kaelbling (2007)

Preliminary Work
- Combining data from multiple subjects in a given study
- Extension of the Gaussian Naive Bayes classifier
- Uses hierarchical Bayes modeling
- Designed for data after feature-space normalization
  - simplifies the problem, even though not ideal

Gaussian Naive Bayes (GNB)
- Bayesian classifier: pick the class with the maximum class posterior probability (proportional to the product of the class prior and the class-conditional probability of the data):
  c = argmax_{c_k} P(C = c_k | y) = argmax_{c_k} P(C = c_k) p(y | C = c_k)
- Naive Bayes: independence of the features conditional on the class:
  P(y | C) = prod_{j=1..J} P(y_j | C)
- Gaussian Naive Bayes: for each feature j, the class-conditional distribution is Gaussian:
  y_j | C = c_k ~ N(θ_j^(k), (σ_j^(k))²)
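The classification rule above fits in a few lines. This is an illustrative NumPy sketch (not code from the proposal), assuming the class priors, per-class means, and per-class variances have already been estimated:

```python
import numpy as np

def gnb_log_posterior(y, priors, means, variances):
    """Unnormalized log posteriors log P(C=c_k) + sum_j log N(y_j; theta_j^(k), (sigma_j^(k))^2).
    priors: shape (K,); means, variances: shape (K, J); y: shape (J,)."""
    log_lik = -0.5 * (np.log(2.0 * np.pi * variances)
                      + (y - means) ** 2 / variances).sum(axis=1)
    return np.log(priors) + log_lik

def gnb_predict(y, priors, means, variances):
    """Pick the class index with the maximum posterior (the argmax over k)."""
    return int(np.argmax(gnb_log_posterior(y, priors, means, variances)))
```

Working in log space avoids numerical underflow when the number of features J is large, as it is for fMRI voxel data.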

GNB, Learning
- Use maximum likelihood (sample mean and sample variance):
  θ̂_sj^(k) = (1/n_s) Σ_{i=1..n_s} y_sji^(k)
  (σ̂_sj^(k))² = (1/(n_s − 1)) Σ_{i=1..n_s} (y_sji^(k) − θ̂_sj^(k))²
  (s: subject, j: feature, i: instance, k: class)
- For pooled data, aggregate the data over all the subjects (estimates will be the same for all subjects)
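A minimal sketch of the maximum-likelihood step for one subject's data (again illustrative NumPy, not the proposal's code), computing the per-class sample means and the (n_s − 1)-denominator sample variances from the slide:

```python
import numpy as np

def gnb_fit(X, labels):
    """Per-class, per-feature ML estimates: sample mean and sample variance
    (unbiased, n_s - 1 denominator), plus empirical class priors.
    X: shape (n, J); labels: length-n array of class labels."""
    classes = np.unique(labels)
    means = np.stack([X[labels == c].mean(axis=0) for c in classes])
    variances = np.stack([X[labels == c].var(axis=0, ddof=1) for c in classes])
    priors = np.array([(labels == c).mean() for c in classes])
    return classes, priors, means, variances
```

Pooled-data GNB would be the same computation applied to the concatenated data of all subjects, which is why its estimates are identical for every subject.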

Hierarchical Normal Model
- For each class and each feature:
  [Graphical model: hyperparameters µ, τ at the top; subject-level means θ_1, θ_2, …, θ_s, …, θ_S; observations y_s1, y_s2, …, y_sn_s at the bottom]

Hierarchical Normal Model
- The tool to extend the Gaussian Naive Bayes classifier to handle multiple subjects
- Gelman et al. (2005); also used in Friston et al. (2002) for group analysis (aim: hypothesis testing)
- Models Gaussian data for different but related groups; the group means share a common Gaussian distribution
- Generative model (s: group (subject), i: instance):
  y_si ~ N(θ_s, σ²)
  θ_s ~ N(µ, τ²)

Hierarchical GNB (HGNB)
- Uses the hierarchical normal model as the class-conditional generative model for each feature, as a way to integrate data from multiple subjects
- Assumes the data have been normalized to a common space
- Same variance σ² for all subjects
  - estimated separately, by taking the median of the sample variances over all the subjects

MAP, Empirical Bayes
- Empirical Bayes (µ, τ): estimates that (approximately) maximize the marginal likelihood (the probability of the data given the hyperparameters):
  µ_MP = (1/S) Σ_{s=1..S} ȳ_s
  τ²_MP = (1/(S − 1)) Σ_{s=1..S} (ȳ_s − µ_MP)²
- MAP (θ_s): point estimate at the maximum of the posterior of θ_s conditional on the data and the hyperparameters:
  θ_s = ((n_s/σ²) ȳ_s + (1/τ²_MP) µ_MP) / (n_s/σ² + 1/τ²_MP)
  (s: subject; ȳ_s: sample mean of subject s's data)
- When the number of examples is small, HGNB behaves like GNB on pooled data
- When the number of examples is large, HGNB behaves like GNB on the individual subject's data
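The empirical-Bayes and MAP formulas on this slide can be sketched directly. This illustrative NumPy function (the function and variable names are mine, not the proposal's) computes µ_MP, τ²_MP, and the shrinkage estimates θ_s for one feature and one class:

```python
import numpy as np

def hgnb_shrinkage(subject_means, n_s, sigma2):
    """Empirical-Bayes / MAP estimates for one feature and one class.
    subject_means: length-S array of per-subject sample means y_bar_s;
    n_s: per-subject example counts; sigma2: shared within-subject variance."""
    mu_mp = subject_means.mean()                 # (1/S) sum y_bar_s
    tau2_mp = subject_means.var(ddof=1)          # 1/(S-1) sum (y_bar_s - mu_MP)^2
    w_data = n_s / sigma2                        # precision from each subject's own data
    w_prior = 1.0 / tau2_mp                      # precision of the group-level prior
    theta = (w_data * subject_means + w_prior * mu_mp) / (w_data + w_prior)
    return mu_mp, tau2_mp, theta
```

The two limiting behaviors on the slide follow from the precision weights: as n_s grows, w_data dominates and θ_s approaches the subject's own sample mean; with few examples, θ_s is pulled toward the group mean µ_MP, mimicking pooled GNB.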

[Example Starplus stimuli: a picture showing the symbols +, −, * and the sentence "It is not true that the plus is above the star."]

Datasets: Starplus
- Classification of the type of the first stimulus (picture or sentence) given a window of fMRI data
- Spatial normalization: use the average of the voxels in each region of interest (ROI)
- Feature selection: use the ROI for the visual cortex
- 16 features (each time point is a feature)
- 20 trials per class per subject
- 13 subjects

[Example Twocategories word stimuli: "hammer" (a tool), "palace" (a dwelling)]

Datasets: Twocategories
- Classification of the category of a word (tools or dwellings) given a window of fMRI data
- Spatial normalization: transformation to a common brain template (the MNI template)
- Feature selection: 300 voxels, ranked using Fisher's LDA
- 300 features (averaged over time)
- 42 trials per class per subject
- 6 subjects

Experiment
- Iterate over the subjects, designating the current one as the test subject
- 2-fold cross-validation, varying the number of training examples used from the test subject for each class; folds chosen randomly (repeated several times)
- GNB indiv: GNB learned using data from the test subject only
- GNB pooled: GNB learned using data from the test subject and the other subjects (assuming no inter-subject variability)
- HGNB: learned using data from the test subject and the other subjects
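The outer loop of this protocol can be sketched as below. This is a hypothetical helper, not the proposal's code: the names and the exact fold mechanics are assumptions (the slide specifies only that each subject serves in turn as test subject and that the training trials from that subject are chosen randomly and the procedure repeated).

```python
import random

def loso_splits(subject_ids, n_train_test_subject, trials_per_subject, seed=0):
    """Leave-one-subject-out protocol: each subject in turn is the test
    subject, and only n_train_test_subject of that subject's trials (chosen
    at random) join the training set; the rest are held out for testing.
    Yields (test_subject, train_trial_ids, held_out_trial_ids)."""
    rng = random.Random(seed)
    for test_s in subject_ids:
        trials = list(range(trials_per_subject))
        rng.shuffle(trials)
        yield test_s, trials[:n_train_test_subject], trials[n_train_test_subject:]
```

All of the other subjects' trials are always available for GNB pooled and HGNB; only the test subject's training budget is varied, which is what the accuracy curves sweep over.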

[Figure: classification accuracies on Starplus (y-axis roughly 0.65-0.85) vs. number of training examples per class (x-axis 0-12), for GNB indiv, GNB pooled, and HGNB]

[Figure: classification accuracies on Twocategories (y-axis roughly 0.52-0.70) vs. number of training examples per class (x-axis 0-25), for GNB indiv, GNB pooled, and HGNB]

HGNB Recap
- A classifier to combine data across multiple subjects in a study
- Improves predictive performance over both separate analyses and pooled data
- Assumes that each cognitive task to predict generates similar brain activations in all the subjects
- Shows that hierarchical Bayes modeling can capture inter-subject variability

Proposed Work
- Goals that have not been addressed by HGNB:
  1. sharing across studies, or across both subjects and studies
  2. determining which groups to share across
  3. determining the cross-subject/study commonality of particular cognitive tasks (related to generalizability)
  4. dealing with the distortion caused by normalization
- Work proposed to address the above goals:
  - variations on HGNB
  - latent structure in the data
  - accounting for normalization

Variations on HGNB
- Goals addressed (1st and 2nd):
  - sharing across studies, or across both subjects and studies
  - determining which groups to share across
- Variations/extensions of the HGNB classifier

Sharing
- Across studies: use the hierarchical normal model to model cross-study variations
- Across subjects and studies:
  - add another level to the hierarchy (study → subject → data, or subject → study → data), or
  - use independent models for subjects and studies:
    y_si^(m) ~ N(f(θ_s, ξ_m), σ²)
    θ_s ~ N(µ, τ²)
    ξ_m ~ N(α, β²)
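A forward-sampling sketch of the subject-plus-study variant, in illustrative NumPy. The slide leaves the combination function f unspecified, so the additive choice f(θ, ξ) = θ + ξ used here is an assumption, as are all the names:

```python
import numpy as np

def sample_subjects_and_studies(mu, tau, alpha, beta, sigma,
                                n_studies, n_subjects, n_obs, seed=0):
    """Forward-sample the independent subject/study-effects model:
    theta_s ~ N(mu, tau^2), xi_m ~ N(alpha, beta^2),
    y_si^(m) ~ N(f(theta_s, xi_m), sigma^2), taking f additive."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(mu, tau, size=n_subjects)    # subject effects
    xi = rng.normal(alpha, beta, size=n_studies)    # study effects
    y = rng.normal(theta[None, :, None] + xi[:, None, None], sigma,
                   size=(n_studies, n_subjects, n_obs))
    return theta, xi, y
```

Fitting would then have to disentangle θ and ξ from y, which is where the choice of hierarchy ordering (study → subject vs. subject → study) matters.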

Determining Groups to Share
- More reasonable to share across some subjects than others (e.g. subjects with similar clinical conditions)
- Likewise across some studies than others (not as useful to share data between a study on the visual system and a language study)
- Automatically determine the grouping:
  - Clustering, mixture model:
    y_si ~ N(θ_s, σ²)
    θ_s ~ N(µ^(k), (τ^(k))²), k ~ Multinomial(π_1, …, π_K)
  - Dirichlet process mixture model:
    y_si ~ N(θ_s, σ²)
    θ_s ~ N(µ_s, τ²), µ_s ~ G, G ~ DP(α, G_0)
  (s: subject, i: instance, k: mixture component)

Latent Structure in Data
- Goal addressed (3rd): determining the cross-subject/study commonality of particular cognitive tasks (related to generalizability)
- Assume there are latent factors underlying the data, with far fewer factors than voxels
- Determine commonality by looking at the shared latent factors
  - if the information for a certain cognitive task is shareable among a certain group of subjects and/or studies, there will be common factors across the elements of the group
- Dimensionality reduction, sparsity

West (2003): Sparse Factor Regression
- Similar to (probabilistic) factor analysis or PCA, with a regression component:
  x_i = B λ_i + ν_i
  y_i = θ′ λ_i + ε_i
  - k factors (k << p), with k determined in advance
  - x_i: i-th data instance (p×1); y_i: i-th response (scalar); λ_i: factors for the i-th instance (k×1); B: data factor loading matrix (p×k); θ: response factor loadings (1×k); ν_i: data noise for the i-th instance; ε_i: response noise for the i-th instance
- Sparsity assumption on the factor loading matrix B
- For testing, treat the corresponding y as missing data
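The generative side of the model can be sketched by forward sampling. This illustrative NumPy code follows the two equations on the slide; the standard-normal prior on the factors λ_i is an assumption here, as is all naming:

```python
import numpy as np

def sample_sparse_factor_regression(B, theta, n, sigma_x=0.1, sigma_y=0.1, seed=0):
    """Forward-sample x_i = B lambda_i + nu_i, y_i = theta' lambda_i + eps_i,
    with lambda_i ~ N(0, I_k).  B is (p, k) and would carry the sparsity
    prior; theta is a length-k vector of response loadings."""
    rng = np.random.default_rng(seed)
    p, k = B.shape
    lam = rng.standard_normal((n, k))                   # latent factors
    X = lam @ B.T + sigma_x * rng.standard_normal((n, p))
    y = lam @ theta + sigma_y * rng.standard_normal(n)
    return X, y, lam
```

With a sparse B, each factor loads on only a few of the p voxels, and a sparse θ singles out which of the k factors carry predictive information, which is the quantity the shareability argument on the next slide relies on.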

Sparse Factor Regression for fMRI
- The images share a common factor loading matrix B (even across different subjects and studies)
- θ indicates which factors are relevant for prediction (a sparsity prior can be added for θ)
- Allow θ to differ across subjects and across studies
- Shareability is determined by how many non-zero elements of θ are shared
- How many factors to use? The Indian buffet process (Griffiths and Ghahramani (2006)) may be used as a prior, which can also facilitate sparsity of the factors

Topics
- Latent factors can be thought of in terms of topics in a topic model (e.g. Latent Dirichlet Allocation (LDA), Blei et al. (2003))
- LDA: a document is a mixture of topics; a topic specifies a distribution over words
- LDA for fMRI data: a brain activation image is a mixture of latent factors; a latent factor specifies a distribution over voxel activations

LDA for fMRI Data
- Sparsity: each latent factor determines the distribution for only a subset of the voxels
- Because each image is a mixture of latent factors, shareability is determined by the number of predictive latent factors shared
- Details remain to be worked out

Accounting for Normalization
- Goal addressed (4th): dealing with the distortion caused by normalization
- Incorporate the uncertainties introduced by normalization into the prediction or analysis
- Approach: probabilistic voxel correspondence

Probabilistic Voxel Correspondence
- A probabilistic model for normalization
- Models the correspondence among voxels across different brains
- Use a probabilistic atlas as a prior
  - available from the International Consortium for Brain Mapping (ICBM)
- Incorporate information about brain structure (available from structural images)
- A lot still needs to be investigated

Schedule
- December 2007: variations on HGNB and latent structure in fMRI data
  - variations on HGNB
  - sparse factor regression
  - formulate a topic model for fMRI
- December 2008: accounting for normalization
  - probabilistic voxel correspondence