Image Captioning using Reinforcement Learning. Presentation by: Samarth Gupta
Transcription
2 Summary
- Introduction
- Supervised models
- Image captioning as an RL problem
- Actor-critic architecture
- Policy gradient architecture
- Conclusion
3 Introduction
- Caption: describing an image in words
- What are the applications?
4 Applications
5 What do we need?
- Datasets
  - MSCOCO: ~100k images, 5 captions per image
  - Flickr30k: ~30k images, 5 captions per image
  - Flickr8k: ~8k images, 5 captions per image
- Evaluation
  - BLEU scores: BLEU-1, BLEU-2, BLEU-3, BLEU-4
  - METEOR
  - CIDEr
- Model
6 BLEU Scores
- Generated caption: <start> I can cat <end>
- Given a ground-truth caption and a generated caption for the same image, the BLEU-n score is the percentage of n-grams in the generated caption that also appear in the ground-truth caption
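
As a rough illustration of the n-gram matching described on this slide, the sketch below computes the clipped n-gram precision that BLEU-n is built on. The two captions and the function names are made up for illustration. Full BLEU additionally combines several n-gram orders and applies a brevity penalty, which this sketch leaves out.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_precision(candidate, reference, n):
    """Fraction of candidate n-grams that also occur in the reference.

    Counts are clipped, so an n-gram repeated in the candidate is only
    credited as often as it appears in the reference.
    """
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref_counts[gram])
                  for gram, count in cand_counts.items())
    return matched / total


# Hypothetical captions, not taken from the slides.
reference = "a cat sits on the mat".split()
candidate = "a cat is on the mat".split()

unigram_precision = ngram_precision(candidate, reference, 1)
bigram_precision = ngram_precision(candidate, reference, 2)
```

Here five of the six candidate unigrams and three of the five candidate bigrams appear in the reference, so the precision drops as n grows.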
7 Previous approaches
- Caption generation through object detection and language models
  - These models were very limited in their approach
- 2014: Encoder-decoder framework
  - Image captioning as a machine translation problem
- 2017: Image captioning as a reinforcement learning problem
8 Image captioning as machine translation
- Translation example: "Good Afternoon!" -> "Guten Tag!"
- Machine translation is implemented using an encoder-decoder architecture
- Captioning example: image -> "A band is playing music on stage"
9 Encoder-Decoder framework
- Encoder: a convolutional neural network
- Decoder: a recurrent neural network
10 Encoder-Decoder with visual attention [1]
- Encoder: a CNN classifier
  - Any pretrained CNN, e.g. VGG16, GoogLeNet
- [1] Xu, Kelvin, et al. "Show, attend and tell: Neural image caption generation with visual attention." International Conference on Machine Learning
11 Encoder-Decoder with visual attention
- Decoder: an RNN
  - At each timestep of the RNN, we predict one word
- Attention: allows the model to attend to specific features
12 Attention Unit
- y_i: image features
- C: context (word features)
- Soft attention
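
To make the attention unit more concrete, here is a minimal sketch of soft attention in plain Python: each image feature vector y_i is scored against the context C, the scores are turned into weights with a softmax, and the output is the weighted average of the features. The dot-product score is an assumption chosen for brevity (the cited paper learns its scoring function), and all names and numbers are illustrative.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(features, context):
    """Return (weights, attended) for image features and a context vector.

    features: list of feature vectors y_i, one per image region
    context:  context vector C with the same dimension as each y_i
    The dot-product score is a stand-in for a learned scoring function.
    """
    scores = [sum(f * c for f, c in zip(feat, context)) for feat in features]
    weights = softmax(scores)
    dim = len(features[0])
    attended = [sum(w * feat[d] for w, feat in zip(weights, features))
                for d in range(dim)]
    return weights, attended


# Two hypothetical image regions and a context that favours the first one.
features = [[1.0, 0.0], [0.0, 1.0]]
context = [2.0, 0.0]
weights, attended = soft_attention(features, context)
```

Because every region receives a non-zero weight, the output stays differentiable with respect to the scores, which is what distinguishes soft attention from picking a single region.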
15 Image Captioning as RL problem
- Goal: given an image I, generate a sentence S = {w_1, w_2, ..., w_t} that correctly describes the image content
- At any timestep t:
  - State: image features + words generated until t
  - Action: next word to generate
  - Reward: can be set in different ways
- We will look into two different architectures:
  - Actor-Critic
  - Policy Gradient
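
The state/action formulation above can be sketched as a small data structure and a rollout loop. This is only an illustration under assumed names (`State`, `step`, `rollout`) and a scripted toy policy. A real policy would be a trained network, and the reward would be computed once the caption is complete.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class State:
    """State at timestep t: image features plus the words generated so far."""
    image_features: Tuple[float, ...]
    words: Tuple[str, ...]


def step(state: State, action: str) -> State:
    """Taking an action means appending the next word to the caption."""
    return State(state.image_features, state.words + (action,))


def rollout(image_features: Sequence[float],
            policy: Callable[[State], str],
            max_len: int = 20) -> List[str]:
    """Generate a caption by repeatedly asking the policy for the next word."""
    state = State(tuple(image_features), ("<start>",))
    while state.words[-1] != "<end>" and len(state.words) < max_len:
        state = step(state, policy(state))
    return list(state.words)


# Toy policy that ignores the image and replays a fixed script (hypothetical).
script = ["a", "cat", "<end>"]


def scripted_policy(state: State) -> str:
    return script[len(state.words) - 1]


caption = rollout([0.1, 0.2], scripted_policy)
```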
16 Policy Gradient Architecture
- Predicts the policy by maximizing the expected reward
- The method suffers from high variance
- One way of reducing the variance is to increase the batch size; however, this leads to inefficient learning
- Alternative: introduce a baseline
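
For reference, the objective on this slide can be written out as follows. This is the standard REINFORCE form, using I, S and w_t as on slide 15. The symbols θ (policy parameters), p_θ (policy), r (sentence-level reward) and N (number of sampled captions) are notation assumed here and do not appear in the slides.

```latex
\begin{align}
J(\theta) &= \mathbb{E}_{S \sim p_\theta(\cdot \mid I)}\bigl[\, r(S) \,\bigr] \\
\nabla_\theta J(\theta)
  &= \mathbb{E}_{S \sim p_\theta(\cdot \mid I)}
     \bigl[\, r(S)\, \nabla_\theta \log p_\theta(S \mid I) \,\bigr]
   \approx \frac{1}{N} \sum_{n=1}^{N} r\bigl(S^{(n)}\bigr)\,
     \nabla_\theta \log p_\theta\bigl(S^{(n)} \mid I\bigr) \\
\log p_\theta(S \mid I) &= \sum_{t} \log p_\theta\bigl(w_t \mid I, w_1, \dots, w_{t-1}\bigr)
\end{align}
```

The Monte Carlo estimate in the second line is where the high variance comes from: with few samples, the raw reward r(S) can scale the gradient very differently from one batch to the next.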
17 Baseline
- Introducing a baseline reduces variance in the policy gradient algorithm
- Acts like a critic to the model
- The goal is to find a good baseline for the policy network
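
The effect of a baseline can be checked on a toy problem. The sketch below uses a softmax policy over three actions with made-up rewards and computes, exactly rather than by sampling, the mean and variance of the single-sample gradient estimate (r - b) * d/d(logit_k) log p(a). The rewards, the choice of logit, and the function names are all assumptions for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gradient_stats(logits, rewards, baseline, k=0):
    """Exact mean and variance of the one-sample policy gradient estimate.

    For a softmax policy, d/d(logit_k) log p(a) = 1[a == k] - p_k, so the
    estimate for action a is (reward[a] - baseline) * (1[a == k] - p_k).
    """
    probs = softmax(logits)
    estimates = [(r - baseline) * ((1.0 if a == k else 0.0) - probs[k])
                 for a, r in enumerate(rewards)]
    mean = sum(p * g for p, g in zip(probs, estimates))
    variance = sum(p * (g - mean) ** 2 for p, g in zip(probs, estimates))
    return mean, variance


logits = [0.0, 0.0, 0.0]          # uniform policy over three actions
rewards = [10.0, 11.0, 12.0]      # hypothetical rewards with a large offset
expected_reward = sum(p * r for p, r in zip(softmax(logits), rewards))

mean_plain, var_plain = gradient_stats(logits, rewards, baseline=0.0)
mean_base, var_base = gradient_stats(logits, rewards, baseline=expected_reward)
```

Both settings give the same expected gradient, so the baseline does not bias the update, but subtracting the expected reward removes the large common offset and the variance of the estimate drops sharply.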
18 Actor-Critic Architecture
- Actor: generates a policy function
- Critic: generates a value for the given state
- The critic can be thought of as a moving baseline for the policy network
- Actor and critic are two separate models that are trained simultaneously
19 Actor-Critic Model [2]
- Actor: policy network, predicts the next word
- Critic: value network, evaluates the reward
- Train embedding network (rewards)
- Train policy network and value network
- Train actor-critic together as an RL problem
- [2] Ren, Zhou, et al. "Deep reinforcement learning-based image captioning with embedding reward." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
20 Policy Network
- Any model that generates a sentence as a sequence of words
- The encoder-decoder framework can work as a policy network
- The policy network is trained using standard supervised learning with cross-entropy loss
21 Embedding model
- The embedding model is used to predict the similarity between an image and a sentence
22 Value Network
- The value network v_p evaluates the reward r from an observed state s_t
- The value network is trained using supervised learning with MSE
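
As a minimal sketch of "supervised learning with MSE", the code below fits a linear value function to (state, reward) pairs by gradient descent on the mean squared error. The linear model, the one-dimensional toy states, and the hyperparameters are assumptions made for illustration. The value network in the cited work is a neural network over image and word features.

```python
def predict(weights, bias, features):
    """Linear value estimate v(s) = w . s + b for one state."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def mse(weights, bias, states, rewards):
    """Mean squared error between predicted values and observed rewards."""
    return sum((predict(weights, bias, s) - r) ** 2
               for s, r in zip(states, rewards)) / len(states)


def fit_value_function(states, rewards, lr=0.1, steps=2000):
    """Fit the linear value function by batch gradient descent on the MSE."""
    weights = [0.0] * len(states[0])
    bias = 0.0
    n = len(states)
    for _ in range(steps):
        errors = [predict(weights, bias, s) - r
                  for s, r in zip(states, rewards)]
        grad_w = [2.0 / n * sum(e * s[d] for e, s in zip(errors, states))
                  for d in range(len(weights))]
        grad_b = 2.0 / n * sum(errors)
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias


# Hypothetical training pairs: reward = 2 * state + 1.
states = [(0.0,), (1.0,), (2.0,)]
rewards = [1.0, 3.0, 5.0]
weights, bias = fit_value_function(states, rewards)
final_error = mse(weights, bias, states, rewards)
```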
23 Training
- Pretrain the policy network with cross-entropy loss
- Pretrain the value network with mean squared error loss
- Train the policy network and value network jointly using deep RL
24 Results
26 Self-Critical Sequence Training (SCST) [3]
- Built on the policy gradient method
- Utilizes its test-time inference to estimate a baseline
- Uses the evaluation metric (CIDEr) to estimate the reward
- [3] Rennie, Steven J., et al. "Self-critical sequence training for image captioning." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
27 Advantages
- No need to estimate a reward signal (as is the case in the actor-critic model)
- Utilizes the output of its own test-time inference algorithm to normalize the rewards it experiences
- Directly optimizes the evaluation metric (CIDEr score)
28 SCST Training
- Policy network
  - Image encoder: ResNet
  - Attention
  - Decoder: LSTM (1 layer, 512 units)
- Pretrain the model with supervised learning (XE loss)
- Train the model with reinforcement learning
  - Reward: CIDEr score
  - Baseline: test-time inference reward
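
The reward and baseline on this slide combine into a simple per-caption loss, sketched below. The test-time inference is assumed here to be greedy decoding, the function name `scst_loss` is hypothetical, and the rewards and log-probabilities are made-up numbers standing in for CIDEr scores and model outputs.

```python
def scst_loss(sample_logprobs, sample_reward, greedy_reward):
    """Self-critical loss for one sampled caption.

    sample_logprobs: log-probability of each word in the sampled caption
    sample_reward:   reward (e.g. CIDEr) of the sampled caption
    greedy_reward:   reward of the caption from test-time (greedy) decoding,
                     used as the baseline
    In real training the advantage is treated as a constant, so gradients
    flow only through the log-probabilities.
    """
    advantage = sample_reward - greedy_reward
    return -advantage * sum(sample_logprobs)


# Hypothetical values: the sampled caption beats the greedy one.
logprobs = [-0.5, -0.25, -0.25]
loss_better = scst_loss(logprobs, sample_reward=1.5, greedy_reward=1.0)

# Hypothetical values: the sampled caption is worse than the greedy one.
loss_worse = scst_loss(logprobs, sample_reward=0.5, greedy_reward=1.0)
```

Minimizing the loss raises the probability of sampled captions that score above the greedy caption and lowers it for those that score below, which is how the model's own inference output acts as the baseline.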
29 Results
- MS PowerPoint: "A picture containing grass, animal"
- MS PowerPoint: "A close up of a brick building"
31 Conclusion
- Three types of image captioning models
  - Object detection + language model
  - Encoder-decoder framework with supervised learning
  - Pretrained encoder-decoder with reinforcement learning
    - Actor-critic architecture
    - Policy gradient architecture
- Datasets: MSCOCO, Flickr30k, Flickr8k
- Evaluation metrics: BLEU, CIDEr, METEOR
32 References
1. Xu, Kelvin, et al. "Show, attend and tell: Neural image caption generation with visual attention." International Conference on Machine Learning
2. Ren, Zhou, et al. "Deep reinforcement learning-based image captioning with embedding reward." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
3. Rennie, Steven J., et al. "Self-critical sequence training for image captioning." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition