Deep Neural Networks Rival the Representation of Primate IT Cortex for Core Visual Object Recognition Charles F. Cadieu, Ha Hong, Daniel L. K. Yamins, Nicolas Pinto, Diego Ardila, Ethan A. Solomon, Najib J. Majaj, James J. DiCarlo Presented by Te-Lin Wu and William Shen
Outline Introduction and Motivations Definition of Terms Methods Data Experimental Results Takeaways and Conclusions
Motivations
Does the representational performance of Deep Neural Networks (DNNs) match that of primate IT cortex on core object recognition?
Can we understand primate visual processing through DNNs?
Introduction
Primate vision: performs remarkably well even under tight constraints
The key: the cortical ventral stream creates a representation!
The ventral stream turns the highly non-linear object recognition problem into a neural representation that separates objects by category (analogous to a ConvNet's fc7 layer!?)
Implemented as a series of recapitulated modules of non-linear transformations
Bio-inspired models mimic this hierarchy; how close are they to human brains?
Human Visual Cortex
From V1 to IT
Four advantages of this work
- Corrects earlier experimental limitations (noise, number of recorded neural sites)
- Measures the accuracy of a representation as a function of complexity
- Analyzes variations in the model/neural spaces relevant to object classification
- Uses a larger dataset than previous related work (e.g., Rust & DiCarlo 2010)

References:
Hung CP, Kreiman G, Poggio T, DiCarlo JJ (2005) Fast readout of object identity from macaque inferior temporal cortex. Science 310: 863-866.
Rust NC, DiCarlo JJ (2010) Selectivity and tolerance ("invariance") both increase as visual information propagates from cortical area V4 to IT. Journal of Neuroscience 30: 12978-12995.
Yamins DLK, Hong H, Cadieu CF, Solomon EA, Seibert D, et al. (2014) Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proceedings of the National Academy of Sciences 111: 8619-8624.
Kriegeskorte N, Mur M, Ruff DA, Kiani R, Bodurka J, et al. (2008) Matching categorical object representations in inferior temporal cortex of man and monkey. Neuron 60: 1126-1141.
Kriegeskorte N, Mur M, Bandettini P (2008) Representational similarity analysis: connecting the branches of systems neuroscience. Frontiers in Systems Neuroscience 2.
Yamins D, Hong H, Cadieu CF, DiCarlo JJ (2013) Hierarchical modular optimization of convolutional networks achieves representations similar to macaque IT and human ventral stream. Advances in Neural Information Processing Systems: 3093-3101.
Outline Introduction and Motivations Definition of Terms Methods Data Experimental Results Takeaways and Conclusions
Definitions
- Single-unit recording: activity of one isolated neuron
- Multi-unit recording: pooled activity of several nearby neurons at one electrode site
- V1-Like: models that capture a first-order account of primary visual cortex
- V2-Like: corresponds to visual area V2; adds non-linearities and local averaging
- HMAX: bio-inspired hierarchical model using sparse, localized features
- HMO: Hierarchical Modular Optimization; a combination of ConvNets with an adaptive-boosting procedure for hyper-parameter tuning
- AlexNet (Krizhevsky et al. 2012)
- Zeiler & Fergus: Visualizing and Understanding Convolutional Networks
Outline Introduction and Motivations Definition of Terms Methods Data Experimental Results Takeaways and Conclusions
Kernel Analysis
Measures the accuracy of a representation as a function of complexity.
What defines a good representation?
Performance: leave-one-out generalization error looe(λ), where λ is the regularization parameter
- 1/λ measures the complexity
- 1 - looe(λ) measures the precision (accuracy) at that complexity
Kernel Analysis (cont'd)
Procedure:
- A learning problem p(x, y), with n data points drawn independently; x: images, y: normalized labels
- A representation maps each image x to a feature vector Φ(x)
- Compute the kernel matrix using a Gaussian kernel: K_ij = exp(-||Φ(x_i) - Φ(x_j)||² / (2σ²))
- For fixed σ and λ, solve the regularized kernel regression problem and obtain the solution α = (K + λI)⁻¹ y
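The procedure above can be sketched in a few lines. This is a minimal reading of the method (Gaussian kernel plus kernel ridge regression with the standard closed-form leave-one-out residual); the paper's exact error normalization may differ, so here looe is just the mean-squared leave-one-out residual.

```python
import numpy as np

def gaussian_kernel(X, sigma):
    # K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)); rows of X are feature vectors
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def looe(K, y, lam):
    # Closed-form leave-one-out residuals for kernel ridge regression:
    # alpha = (K + lam I)^-1 y,  residual_i = alpha_i / [(K + lam I)^-1]_ii
    G = np.linalg.inv(K + lam * np.eye(len(y)))
    alpha = G @ y
    return np.mean((alpha / np.diag(G)) ** 2)

# Sweep lambda: 1/lambda plays the role of complexity,
# and (after normalization) 1 - looe the role of precision.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))      # 50 "images", 10 features
y = np.sign(rng.standard_normal(50))   # normalized binary labels
K = gaussian_kernel(X, sigma=3.0)
curve = [(1.0 / lam, looe(K, y, lam)) for lam in (10.0, 1.0, 0.1)]
```

Plotting accuracy against 1/λ for each representation gives the accuracy-vs-complexity curves compared in the results.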
Other methods
- Add noise to the model representations, matched to the noise measured in the neural representations
- Train a linear SVM on each representation to measure classification performance
- Representational similarity: build a Representational Dissimilarity Matrix (RDM) for each representation, then measure the relationship (correlation) between two RDMs
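The RDM comparison can be sketched as follows. The choices here (1 - Pearson correlation as the dissimilarity, Spearman rank correlation between RDM upper triangles) are standard in representational similarity analysis but are assumptions about the exact settings; the simple ranking below also ignores ties.

```python
import numpy as np

def rdm(features):
    # features: one row per image; dissimilarity = 1 - Pearson correlation
    return 1.0 - np.corrcoef(features)

def _ranks(v):
    r = np.empty(len(v))
    r[np.argsort(v)] = np.arange(len(v))
    return r

def rdm_similarity(a, b):
    # Spearman rank correlation over the upper triangles (i < j)
    iu = np.triu_indices_from(a, k=1)
    return np.corrcoef(_ranks(a[iu]), _ranks(b[iu]))[0, 1]

rng = np.random.default_rng(1)
model_rdm = rdm(rng.standard_normal((20, 64)))    # 20 images, 64 model features
neural_rdm = rdm(rng.standard_normal((20, 128)))  # 20 images, 128 neural sites
score = rdm_similarity(model_rdm, neural_rdm)
```

Because each RDM abstracts away the individual feature dimensions, this lets representations of different dimensionality (model units vs. neural sites) be compared directly.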
Outline Introduction and Motivations Definition of Terms Methods Data Experimental Results Takeaways and Conclusions
Dataset 1960 test images Image variations: object exemplar, geometric transformations (position / scale / rotation / pose), background
Neural Data Collection
- Collected from V4 and IT using multi-electrode arrays
- Image presentation: each image shown for 100 ms
- Multi-unit representations:
  - Raw firing rates obtained by counting spikes
  - Subtract the background firing rate
  - Normalize
  - Average across the repetitions of each image
- Single-unit representations (via spike sorting):
  - 160 single units isolated from IT
  - 95 single units isolated from V4
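The multi-unit pipeline above can be sketched as an array computation. The array shapes and the per-site standard-deviation normalization are assumptions for illustration; the paper's exact normalization may differ.

```python
import numpy as np

def multi_unit_representation(spike_counts, background):
    # spike_counts: (n_reps, n_images, n_sites) spike counts in the
    # response window; background: (n_sites,) background firing rate
    evoked = spike_counts - background           # subtract background rate
    scale = evoked.std(axis=(0, 1))              # per-site normalization
    normalized = evoked / np.maximum(scale, 1e-12)
    return normalized.mean(axis=0)               # average over repetitions

# Hypothetical recording: 10 repetitions, 50 images, 20 sites
rng = np.random.default_rng(2)
counts = rng.poisson(5.0, size=(10, 50, 20)).astype(float)
rep = multi_unit_representation(counts, background=np.full(20, 2.0))
```

The result is one feature vector per image (n_images x n_sites), directly comparable to a model's feature matrix in the decoding and encoding analyses.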
Outline Introduction and Motivations Definition of Terms Methods Data Experimental Results Decoding analysis Encoding analysis Takeaways and Conclusions
Decoding--Kernel Analysis Result Comparison across machine representations: the DNN performs significantly better than the other biologically inspired models.
Decoding--Kernel Analysis Result vs. IT Cortex After sub-sampling to the same number of features, the DNN's representation performs comparably to that of IT cortex (for both multi-unit and single-unit data).
Decoding--At different feature sample numbers
Decoding--Linear SVM
Outline Introduction and Motivations Definition of Terms Methods Data Experimental Results Decoding analysis Encoding analysis Takeaways and Conclusions
Encoding--Predicting IT Responses Interestingly, although the DNN's representation has much better decoding performance than V4's, the DNN performs only about as well as V4 at the encoding task (predicting IT responses).
Encoding-- Representational Dissimilarity Matrices (RDMs)
Encoding--Similarity to the IT Dissimilarity Matrix "IT-fit" means fitting a linear transform on the model representation to predict the IT representation. Without the IT-fit, the DNN's RDM is very different from the IT cortex RDM; with the linear transform, the DNN's RDM falls within the noise range of IT itself. So although a gap remains between the raw DNN representation and IT, the DNN representation carries enough information to reconstruct IT cortex's representation.
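The IT-fit step can be sketched as a regularized linear map from model features to IT responses. Ridge regression is an assumption here (the paper's exact fitting and cross-validation details may differ); the fitted predictions would then be compared to IT via RDMs as before.

```python
import numpy as np

def it_fit(model_features, it_responses, lam=1.0):
    # Ridge regression: W = argmin ||X W - Y||^2 + lam ||W||^2
    #                     = (X'X + lam I)^-1 X'Y
    X, Y = model_features, it_responses
    W = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)
    return X @ W  # predicted IT responses

# Hypothetical data where IT really is a linear transform of model features
rng = np.random.default_rng(3)
X = rng.standard_normal((40, 8))      # 40 images, 8 model features
W_true = rng.standard_normal((8, 6))  # hypothetical linear relationship
Y = X @ W_true                        # 6 "IT sites"
pred = it_fit(X, Y, lam=1e-8)
```

When such a linear map exists, the fitted predictions recover the target representation, which is exactly the sense in which the DNN "has the encoding power" to form IT's representation.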
Limitations:
- Viewing time: 100 ms
- Passive viewing vs. active task performance
- Visual experience and learning (the macaques lack experience with a number of the object classes)
- Energy efficiency (model energy requirements are 2 to 3 orders of magnitude higher than the primate visual system's)
- Natural primate development vs. training on 15M labeled images
Outline Introduction and Motivations Definition of Terms Methods Data Experimental Results Decoding analysis Encoding analysis Takeaways and Conclusions
Conclusion The DNN's representational performance is comparable to that of IT cortex!
Questions?
Thanks!