Motivation: Attention: Focusing on specific parts of the input. Inspired by neuroscience.

Barry Lane

1 Outline: Motivation. What's the attention mechanism? Soft attention vs. Hard attention. Attention in Machine translation. Attention in Image captioning. State-of-the-art.

2 Motivation: Attention: Focusing on specific parts of the input. Inspired by neuroscience.
- To reduce the computational burden of processing high-dimensional inputs by selecting only subsets of the input to process.
- To allow the system to focus on distinct aspects of the input and thus improve the generated output.

3 Motivation: Attention models in Deep Neural Networks

4 Motivation: Attention models in Deep Neural Networks

5 Encoder-Decoder systems An encoder-decoder framework is a general framework based on neural networks that aims at handling the mapping between highly structured input and output.

6 Encoder-Decoder systems Machine Translation: Encoder-Decoder

7 Encoder-Decoder systems Image Captioning: Encoder-Decoder

8 What is an attention model?
- The output Z is a weighted arithmetic mean of the y_i.
- The weights are chosen according to the relevance of each y_i given the context C.
For example, in image captioning:
- The context is h_i.
- The y_i are the representations of the parts of the image.
- The output Z is a representation of the attended image.

9 Attention model in general: But what exactly is this black box doing?

10 Attention model in general: C is the context, and the y_i are the «parts of the data» we are looking at.

11 Attention model in general: $m_i = F_{ATT}(y_i, C)$ (1)

12 Attention model in general: $s_i = \frac{\exp(m_i)}{\sum_{j=1}^{n} \exp(m_j)}$ (2) $\sum_{i} s_i = 1$ (3) So the softmax can be thought of as the max of the «relevance» of the variables, according to the context.
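The softmax of Eq. 2 can be sketched in a few lines of NumPy. This is a minimal illustration, not code from the slides: the function name `softmax` and the example scores are assumptions, and the subtraction of the maximum is a standard numerical-stability trick that leaves the result of Eq. 2 unchanged.

```python
import numpy as np

def softmax(m):
    """Turn relevance scores m_i into weights s_i (Eq. 2)."""
    m = np.asarray(m, dtype=float)
    # Subtracting the max avoids overflow in exp() and does not change the ratio.
    e = np.exp(m - m.max())
    return e / e.sum()

# Illustrative relevance scores (made up for this sketch).
scores = np.array([1.0, 2.0, 5.0])
weights = softmax(scores)
print(weights, weights.sum())  # the weights sum to 1, as in Eq. 3
```

The largest score receives most of the weight, which is why the softmax behaves like a smooth version of picking the most relevant item.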

13 Attention model in general: $Z = \sum_{i=1}^{n} s_i y_i$ (4) The output Z is the weighted arithmetic mean of all the y_i, where each weight s_i represents the relevance of the variable y_i according to the context C.
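Equations 1 to 4 can be put together as one small function. The sketch below assumes an additive, tanh-based form for F_ATT; the slides do not specify F_ATT here, so the parameters `W_y`, `W_c`, `w` and all shapes are hypothetical choices for illustration only.

```python
import numpy as np

def soft_attention(y, c, W_y, W_c, w):
    """Soft attention over n items.

    y: (n, d_y) items, c: (d_c,) context.
    W_y: (d_h, d_y), W_c: (d_h, d_c), w: (d_h,) are assumed parameters.
    """
    # Eq. 1: relevance score m_i = F_ATT(y_i, C), here an assumed additive form.
    m = np.tanh(y @ W_y.T + W_c @ c) @ w
    # Eq. 2: softmax over the scores (max subtracted for stability).
    e = np.exp(m - m.max())
    s = e / e.sum()
    # Eq. 4: output Z is the weighted mean of the y_i.
    z = s @ y
    return z, s

rng = np.random.default_rng(0)
n, d_y, d_c, d_h = 4, 3, 5, 8
y = rng.normal(size=(n, d_y))
c = rng.normal(size=d_c)
W_y = rng.normal(size=(d_h, d_y))
W_c = rng.normal(size=(d_h, d_c))
w = rng.normal(size=d_h)

z, s = soft_attention(y, c, W_y, W_c, w)
print(z.shape, s.sum())
```

Because the weights are non-negative and sum to 1, Z always lies inside the range spanned by the y_i.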

14 Attention model in general:

15 Attention model in general: Attention Model with another measure of relevance (F_ATT in Eq. 1).
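As one possible alternative measure of relevance, the score in Eq. 1 could be a plain dot product between each y_i and the context. This is an assumed example and not necessarily the variant shown on the slide; it requires y_i and C to have the same dimension, and the values below are made up.

```python
import numpy as np

def dot_product_attention(y, c):
    """Soft attention with F_ATT(y_i, C) = y_i . C (assumed choice)."""
    m = y @ c                      # Eq. 1 with a dot-product score
    e = np.exp(m - m.max())        # Eq. 2, numerically stable softmax
    s = e / e.sum()
    return s @ y, s                # Eq. 4

# Toy items and context, chosen so the second item matches the context best.
y = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.5, 0.5]])
c = np.array([0.0, 3.0])

z, s = dot_product_attention(y, c)
print(s)
```

Only Eq. 1 changes; Eqs. 2 to 4 are the same as before, which is what makes the relevance function a pluggable component.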

16 Soft Attention vs. Hard Attention
- The mechanism we described previously is called «Soft attention».
- It is a fully differentiable, deterministic mechanism that can be plugged into an existing system.
- The gradients are propagated through the attention mechanism at the same time as they are propagated through the rest of the network.
- Hard attention is a stochastic process: instead of using all the hidden states as an input for the decoding, the system samples a hidden state y_i with the probabilities s_i.
- In order to propagate a gradient through this process, we estimate the gradient by Monte Carlo sampling.
- Needs Reinforcement learning. :(
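The difference between the two mechanisms can be illustrated on the forward pass. The sketch below uses made-up values for y and s and a hypothetical helper `hard_attention`; it only shows the sampling step and does not implement the Monte Carlo gradient estimator or any reinforcement learning.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy items y_i and attention weights s_i (illustrative values only).
y = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [2.0, 2.0]])
s = np.array([0.2, 0.5, 0.3])

# Soft attention: deterministic weighted mean of all items.
z_soft = s @ y

def hard_attention(y, s, rng):
    """Sample a single item y_i with probability s_i."""
    i = rng.choice(len(y), p=s)
    return y[i], i

# Averaging many hard samples approximates the soft output.
samples = np.stack([hard_attention(y, s, rng)[0] for _ in range(20000)])
z_mc = samples.mean(axis=0)

print(z_soft, z_mc)
```

Each hard sample is a single y_i, so the output is not a smooth function of the weights; this is why the gradient has to be estimated by sampling rather than computed by ordinary backpropagation.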

18 1- Attention in Machine Translation [1]
Without the attention model: just one hidden state «h» goes through the decoder.
With the attention model: instead of producing just a single hidden state corresponding to the whole sentence, the encoder produces «k» hidden states h_j, each corresponding to a word. (Here, h_j plays the role of y_i in Eq. 1, and the context at step t is the decoder hidden state from step t-1.)
[1] Bahdanau et al., Neural Machine Translation by Jointly Learning to Align and Translate, ICLR

19 Learning to Align in Machine Translation The attention process can be seen as an alignment, because the network usually learns to focus on a single input word each time it produces an output word. The figure shows the attention weights during the translation process, which reveal the alignment and make it possible to interpret what the network has learnt. Figure from [1].
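Reading an alignment off the attention weights can be sketched as follows. The sentence pair and the weight matrix `A` are toy values invented for illustration, not results from [1]; row t holds the weights s_i used when producing output word t.

```python
import numpy as np

# Toy source/target sentences (hypothetical example).
source = ["the", "cat", "sleeps"]
target = ["le", "chat", "dort"]

# A[t, j]: attention weight on source word j while producing target word t.
# Values are made up; each row is a softmax output and sums to 1.
A = np.array([
    [0.85, 0.10, 0.05],
    [0.10, 0.80, 0.10],
    [0.05, 0.15, 0.80],
])
assert np.allclose(A.sum(axis=1), 1.0)

# The most attended source word for each target word gives the alignment.
alignment = A.argmax(axis=1)
pairs = [(target[t], source[j]) for t, j in enumerate(alignment)]

for t_word, s_word in pairs:
    print(f"{t_word} <- {s_word}")
```

Plotting `A` as a heat map, with target words on one axis and source words on the other, gives the kind of alignment figure described above.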

20 2- Attention for Image Captioning [2]
- The only difference between this model and the machine translation model is the input y_i.
- We select the «relevant» part of the image by using h_i as the context.
[2] Xu et al., Show, Attend and Tell: Neural Image Caption Generation with Visual Attention, ICML

21 Show, Attend and Tell [2]: Visualizing the attention weights

23 DRAW

24 Attention is everywhere!
- Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering. arXiv 2016
- Stacked Attention Networks for Image Question Answering. CVPR 2016
- Attention to Scale: Scale-aware Semantic Image Segmentation. CVPR 2016
- Recurrent Attentional Networks for Saliency Detection. CVPR 2016
- Recursive Recurrent Nets with Attention Modeling for OCR in the Wild. CVPR 2016
- Recurrent Attention Models for Depth-Based Person Identification. CVPR 2016
- Generating Sequences with Recurrent Neural Networks.
And many more.

25 Thanks for your ATTENTION! :)
