CS-E4050 Deep Learning Session 4: Convolutional Networks


CS-E4050 - Deep Learning Session 4: Convolutional Networks
Jyri Kivinen, Aalto University, 23 September 2015
Credits: Thanks to Tapani Raiko for slide material.

Some desiderata for pattern recognition systems (?)
- Stable and predictable response change to transformed (say, shifted) input.
- The ability to deal with varying-sized inputs (in terms of unit cardinality) and input patterns (in terms of their scale).
- Avoiding the need to learn properties that can easily be encoded into the model.
- Parsimonious representations.

Feed-forward networks for high-dimensional and highly variable content
Imagine a set of 1000x1000 images (rather low-resolution compared to what even most mobile-phone cameras capture these days) to be analyzed by a regular feed-forward network, say an MLP. Suppose the model is to define a classifier, say classifying images into predefined categories such as interesting and not interesting.
- One million (weight) connections per hidden unit in the first layer, even just for gray-scale images.
- Without e.g. parameter sharing, this could well be a really hard problem to scale to: far too many parameters to fit well with the data one has (curse of dimensionality).
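To make the scale concrete, a back-of-the-envelope count (a sketch; the hidden-layer width of 1000 units is an assumed illustrative number):

```python
# Parameter count for the first fully connected layer of an MLP
# on 1000x1000 gray-scale images (illustrative numbers only).
height, width = 1000, 1000
n_inputs = height * width      # 1,000,000 input units per image
n_hidden = 1000                # assumed hidden-layer width

weights_first_layer = n_inputs * n_hidden
print(f"{weights_first_layer:,} weights in the first layer alone")
# -> 1,000,000,000 weights, before biases and any further layers
```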

Feed-forward networks for high-dimensional and highly variable content (cont.)
Same setting as above: 1000x1000 images, classified by a regular feed-forward network such as an MLP.
- How to deal with test images of (even slightly) different size?
- How would the unit activations differ between an image and the same image shifted?
- What would be good approaches for handling these potential issues?

Convolutional (deterministic) feed-forward networks: the main ingredients
- Grid arrangement and position encoding: units in the input layer retain the grid topology of the input, and hidden unit layers are arranged as stacked grids with coordinated position indexing.
- Local connectivity: each hidden unit receives input only from input-layer units whose positions fall within its so-called receptive field.
- Parameter sharing: hidden unit layers consist of multiple grids of hidden units called feature planes, and parameters are shared within a feature plane; this can be used to obtain translation equivariance (and thereby e.g. stability and predictability of the response change due to shifted input).
- Pooling: used e.g. to obtain local translation invariance (e.g. to tolerate local spatial deformations of the input) and to reduce dimensions.
A minimal sketch of the first three ingredients follows below.
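A minimal NumPy sketch of grid arrangement, local connectivity, and parameter sharing (explicit loops for clarity, not speed; the image and kernel sizes are illustrative assumptions):

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Single-channel 2-D cross-correlation ('valid' borders):
    every output unit applies the SAME kernel weights (parameter
    sharing) to a small local patch (local connectivity)."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw]   # receptive field
            out[i, j] = np.sum(patch * kernel)  # shared weights
    return out

image = np.random.randn(28, 28)
kernel = np.random.randn(5, 5)      # one feature plane's weights
feature_plane = conv2d_valid(image, kernel)
print(feature_plane.shape)          # (24, 24)
```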

Convolutional networks: from full to local connections (example figure; credit: Tapani Raiko)

Convolutional networks: parameter sharing (example figure; credit: Tapani Raiko)

Convolutional networks: first layer without pooling (example figure; credit: Tapani Raiko)

Convolutional networks: first layer with pooling (example figure; credit: Tapani Raiko)

Convolutional networks: full network (example figure; credit: Tapani Raiko)

Convolutional vs. non-convolutional (slide credit: Tapani Raiko)
Number of weights (ignoring biases):
  5x5x9 + 5x5x9x16 + 7x7x16x10 = 225 + 3600 + 7840 = 11,665
Sizes of signals h:
  28x28, 28x28x9, 14x14x16 = 784, 7056, 3136
Compare to a non-convolutional example:
  Weights: 28x28x225 + 225x144 + 144x10 = 176,400 + 32,400 + 1,440 = 210,240
  Signals: 784, 225, 144
The convolutional network has more signals but far fewer parameters. Could we scale the networks to the 1000x1000 images? Would/should some of their properties change?
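These counts can be re-checked directly (a sketch; the layer shapes are taken from the slide):

```python
# Re-deriving the weight and signal counts from the slide
# (ignoring biases).
conv_weights = 5*5*9 + 5*5*9*16 + 7*7*16*10
mlp_weights  = 28*28*225 + 225*144 + 144*10
print(conv_weights)   # 11665
print(mlp_weights)    # 210240

conv_signals = (28*28, 28*28*9, 14*14*16)   # 784, 7056, 3136
mlp_signals  = (784, 225, 144)
```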

Some example application input types (credit: Tapani Raiko)

Tensor   Single channel                         Multi-channel
1-D      Raw audio (mono)                       Motion capture
2-D      Audio + short-time Fourier transform   Game of Go
3-D      Brain imaging                          Colour video

Some possible connectivity, parameter sharing, and pooling function variants
Parameter-sharing patterns per convolutional layer:
- No sharing of biases
- Spatial parameter-sharing patterns (e.g. tiled-convolutional)
Input connectivity patterns of units within a feature plane:
- Overlapping (regular-convolutional)
- Non-overlapping (tiled-convolutional)
Fixed pooling function variants (sketched below):
- Linear functions: average-pooling
- Non-linear functions: max-pooling
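A minimal NumPy sketch of the two fixed pooling variants on non-overlapping blocks (the 4x4 example plane is an illustrative assumption):

```python
import numpy as np

def pool2d(plane, size=2, mode="max"):
    """Non-overlapping pooling over size x size blocks of one
    feature plane; 'max' and 'mean' are the two fixed variants
    listed above."""
    H, W = plane.shape
    blocks = plane[:H - H % size, :W - W % size]
    blocks = blocks.reshape(H // size, size, W // size, size)
    reduce = np.max if mode == "max" else np.mean
    return reduce(blocks, axis=(1, 3))

plane = np.arange(16.0).reshape(4, 4)
print(pool2d(plane, mode="max"))    # [[ 5.  7.] [13. 15.]]
print(pool2d(plane, mode="mean"))   # [[ 2.5  4.5] [10.5 12.5]]
```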

But why called convolutional?
As in feed-forward neural networks generally, the hidden units can be seen to take as input a linear combination of the activations of the units connecting to them (a bias unit fixed to one, plus the other units): they first linearly transform their input, and then apply the activation function to get the unit activation/state.
The per-layer linear transforms can be implemented in multiple ways:
- Convolutional-layer unit inputs can be computed via convolution/cross-correlation of the connecting-unit activations with filter kernels defined by the feature-plane-specific weights (filters). Highly parallelizable; don't even think about sliding windows!
- (Fast) Fourier-transform (FFT) based approaches may be inconvenient/problematic (e.g. boundary handling issues).
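As an illustration of the boundary caveat: an FFT pair computes a circular convolution unless both signals are zero-padded to the full output length. A small NumPy check (illustrative sizes):

```python
import numpy as np

# Linear convolution via FFT: zero-pad both signals to the full
# output length first, otherwise the FFT computes a *circular*
# convolution (wrap-around boundaries).
x = np.random.randn(32)
w = np.random.randn(5)

n = len(x) + len(w) - 1
fft_conv = np.real(np.fft.ifft(np.fft.fft(x, n) * np.fft.fft(w, n)))

direct = np.convolve(x, w)             # direct sliding-sum convolution
print(np.allclose(fft_conv, direct))   # True
```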

Convolution operator (slide from Tapani Raiko):
$(w * x)[t] = \sum_a w[a]\, x[t-a]$
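A direct NumPy transcription of this sum, checked against the library routines (the kernel and signal below are illustrative choices):

```python
import numpy as np

def conv1d(w, x):
    """(w * x)[t] = sum_a w[a] x[t - a], 'full' output length."""
    T = len(x) + len(w) - 1
    out = np.zeros(T)
    for t in range(T):
        for a in range(len(w)):
            if 0 <= t - a < len(x):
                out[t] += w[a] * x[t - a]
    return out

w = np.array([1.0, 0.0, -1.0])
x = np.random.randn(10)
print(np.allclose(conv1d(w, x), np.convolve(w, x)))   # True
# Cross-correlation is the same sum with the kernel flipped:
print(np.allclose(np.correlate(x, w[::-1], "full"), conv1d(w, x)))  # True
```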

Handling boundaries in convolutional layers
Border units may need special treatment. Some boundary-handling implementation variants when doing the convolutions:
- Pad the inputs when computing the outputs, by zeroes or e.g. from the input data (e.g. wrap-around borders); this can be used to retain a fixed grid size from input to output.
- Don't pad the inputs: each output unit then receives the same number of connections from the input data, and the output grid dimensions are smaller.
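In 1-D, the np.convolve border modes illustrate these variants (a sketch with arbitrary sizes):

```python
import numpy as np

x = np.ones(8)
w = np.ones(3)

# 'full' and 'same' zero-pad the input; 'valid' uses no padding,
# so the output grid shrinks by len(w) - 1.
print(len(np.convolve(x, w, "full")))    # 10
print(len(np.convolve(x, w, "same")))    # 8  (input size retained)
print(len(np.convolve(x, w, "valid")))   # 6  (8 - 3 + 1)
```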

Home exercises
Familiarize yourself with the Convolutional Neural Networks (LeNet) tutorial: http://deeplearning.net/tutorial/lenet.html.
Reported task: implement and experiment with gradient-descent-based training of a convolutional (feed-forward) network; details on the next slide.

Home exercises (cont.)
Reported task: implement and experiment with gradient-descent-based training of a convolutional (feed-forward) network:
- Choose the details (data, network structure, etc.) yourself, except: have at least one convolutional layer, and have pooling.
- Experiment with the approach, providing the following visualization: the objective function evaluated on the training data as a function of the number of training epochs. You could also plot further things, such as learned network parameters, unit activations (as a response to some data), or other important training diagnostics.
- Provide a description of your approach and the results in the report (including the visualization(s) and the most important lines of code). A minimal sketch of such a setup follows below.
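For orientation only, a minimal sketch of such a setup in PyTorch (an assumed framework choice; the linked tutorial uses Theano, and the synthetic data below merely stands in for whatever dataset you choose):

```python
import torch
import torch.nn as nn
import matplotlib.pyplot as plt

torch.manual_seed(0)
X = torch.randn(256, 1, 28, 28)              # stand-in images
y = torch.randint(0, 10, (256,))             # stand-in labels

model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=5), nn.ReLU(),   # convolutional layer
    nn.MaxPool2d(2),                             # pooling layer
    nn.Flatten(),
    nn.Linear(8 * 12 * 12, 10),                  # 28 -> 24 -> 12 per side
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

losses = []
for epoch in range(50):                      # full-batch gradient descent
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

plt.plot(losses)                             # objective vs. epochs
plt.xlabel("epoch"); plt.ylabel("training loss")
plt.savefig("training_curve.png")
```

With real data one would typically train on mini-batches and also monitor a validation objective, in line with the further diagnostics suggested above.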

Materials, references
Gradients to inputs:
- Zeiler, M. D., and Fergus, R.: Visualizing and understanding convolutional networks. In Proc. European Conference on Computer Vision (ECCV), 2014.
Further reading:
- LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86 (1998) 2278-2324.
- LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., Jackel, L. D.: Backpropagation applied to handwritten zip code recognition. Neural Computation 1(4) (1989) 541-551.
- Waibel, A., Hanazawa, T., Hinton, G. E., Shikano, K., Lang, K.: Phoneme recognition using time-delay neural networks. IEEE Trans. ASSP 37 (1989) 328-339.