DEEP LEARNING BASED VISION-TO-LANGUAGE APPLICATIONS: CAPTIONING OF PHOTO STREAMS, VIDEOS, AND ONLINE POSTS

Size: px
Start display at page:

Download "DEEP LEARNING BASED VISION-TO-LANGUAGE APPLICATIONS: CAPTIONING OF PHOTO STREAMS, VIDEOS, AND ONLINE POSTS"

Transcription

1 SEOUL Oct.7, 2016 DEEP LEARNING BASED VISION-TO-LANGUAGE APPLICATIONS: CAPTIONING OF PHOTO STREAMS, VIDEOS, AND ONLINE POSTS Gunhee Kim Computer Science and Engineering Seoul National University October 7, 2016

2 AGENDA Photo stream captioning Video captioning Cesc C. Park and Gunhee Kim. Expressing an Image Stream with a Sequence of Natural Sentences. NIPS

3 GENERAL USERS PHOTO STREAM Suppose that you and your family visit NYC A photo stream is a thread of the user s story Users do not organize the photo streams for later use Can we write a travelogue for a given photo stream? 3 3

4 PREVIOUS WORK IMAGE CAPTIONING Retrieve or generate a descriptive natural language sentence for a given image [Socher et al, TACL2013] [Karpathy et al. CVPR2015 ] [Mao et al, ICLR2015] Many more! [Vinyal et al, CVPR2015] [Gong et al. ECCV2014 ] 4 4

5 LIMITATION OF PREVIOUS WORK Much of previous work mainly discuss the relation between a single image and a single sentence A kid is smiling Absence of correlation, coherence and story for a stream of images Extend both input and output dimension to a sequence of images and a sequence of sentences 5 5

6 PROBLEM STATEMENT Objective: express an image stream to a coherent sequence of sentences A query of image stream A coherent sequence of sentences We took a couple days for family vacation in NYC to get away Empire state building right off the bat. Caeden is checking out the view. Caeden's first MLB game and my first in a while MLB game... He might be a mets fan Shake Shack

7 IMAGE-TEXT PARALLEL TRAINING DATA Use a set of blog posts to learn the relation between a image stream and a sequence of sentences 19K blog posts with 150K images from Blogs are written in a way of storytelling Blog pictures are selected as the most canonical ones out of photo albums Informative sentences associated with pictures about location, sentiments, actors, 7 7

8 OUR SOLUTION CRCN Coherence Recurrent Convolutional Network (1) Convolutional neural networks for image description (2) Bidirectional recurrent neural networks for language model (3) Coherence model for a smooth flow of multi-sentences 8 8

9 OVERVIEW OF ALGORITHM 9 9

10 CRCN ARCHITECTURE The compatibility score btw image and sentence sequence Misaligned Pairs should be few Aligned Pairs should be many 10 10

11 RETRIEVAL SENTENCE SEQUENCES Retrieve best sentence sequences for a query image stream Divide-conquer search strategy! Almost optimal! The local fluency and coherence is required for the global one 11 11

12 USER STUDIES VIA AMT Goal: Find general users preferences between text sequences by different methods for a given photo stream Randomly select 100 test streams of 5 images Our method and one baseline predict text sequences Pairwise preference test via Quantitative results A higher number than 50% validate our approach (CRCN) The coherence becomes more critical as the passage is longer (4th vs 5th columns) 12 12

13 RESULTS FOR NYC DATASET (1) (2) (3) (4) (5) (CRCN) (1) One of the hallway arches inside of the library (2) As we walked through the library I noticed an exhibit called lunch hour nyc it captured my attention as I had also taken a tour of nyc food carts during my trip (3) Here is the top of the Chrysler building everyone's favorite skyscraper in new york. (4) After leaving the nypl we walked along 42nd st. (5) We walked down fifth avenue from rockefeller centre checking out the windows in saks the designer stores and eventually making our way to the impressive new york public library. (RCN) (1) As you walk along in some spots it looks like the buildings are sprouting up out of the high line plants (2) Charlie and his aunt donna relax on the high line after a steamy stroll (3) However navigating the new york subway system can be like trying to find your way through the amazon jungle sans guide (4) We loved nyc! (5) Getting ready for the sunny day...putting sunscreen on

14 AGENDA Photo stream captioning Video captioning 14

15 WON LSMDC 2016! Large Scale Movie Description and Understanding Challenge (LSMDC 2016) in ECCV 2016 and MM Team members: Youngjae Yu, Hyungjin Ko, Jongwook Choi, Gunhee Kim movie description movie multiple-choice movie fill-in-the-blank movie retrieval His vanity license plate reads

16 ATTENTION MECHANISMS IN DEEP LEARNING Machine decides where to attend itself --- sequentially focus or attend on the most relevant part of the input over time. Image captioning [Xu et al. ICML2015] Machine translation [Hermann et al. NIPS2015] K. Xu et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. ICML K. M. Hermann et al. Teaching Machines to Read and Comprehend. NIPS

17 ATTENTION MECHANISMS IN DEEP LEARNING Machine decides where to attend itself --- sequentially focus or attend on the most relevant part of the input over time. Action recognition [Sharma et al. Arxiv2015] Golf swinging Trampoline jumping Video captioning [Sharma. Toronto 2016] Generated sentence : A woman is slicing a onion. Groundtruth : A woman is slicing a shrimp. S. Sharma et al. Action Recognition Using Visual Attention. Arxiv U Toronto, S. Sharma. Action recognition and video description using visual attention. MS thesis, 17

18 PROBLEM STATEMENT Although attention models simulate human s attrition, there is no attempt to explicitly use human gaze labels Attention weights are implicitly learned in an end-to-end manner Does human attention improve the model performance? Then how can we inject such supervision to the attention model? Target task: Video captioning 18 18

19 PROBLEM STATEMENT Objective: supervise a caption generation model to attend where human focus on A short movie video stream Movie clip Predicted Human attention A human gaze assisted caption A little boy flies in the air by riding a bicycle

20 RESULTS (CAPTION GENERATION) [1] Subhashini Venugopalan et al. Sequence to Sequence Video to Text. ICCV 2015 [2] Li Yao et al. Describing Videos by Exploiting Temporal Structure. ICCV

21 RESULTS (CAPTION GENERATION) [1] Subhashini Venugopalan et al. Sequence to Sequence Video to Text. ICCV 2015 [2] Li Yao et al. Describing Videos by Exploiting Temporal Structure. ICCV

22 CONCLUSION Joint understanding of multiple data modalities Visual data (images/videos) + Textual data Deep learning models are superior to jointly represent multiple data or tasks In the areas of robotics, VR, security, speech analysis, and more Many possible applications for online services 22 22

23 SEOUL Oct.7, 2016 THANK YOU

Motivation: Attention: Focusing on specific parts of the input. Inspired by neuroscience.

Motivation: Attention: Focusing on specific parts of the input. Inspired by neuroscience. Outline: Motivation. What s the attention mechanism? Soft attention vs. Hard attention. Attention in Machine translation. Attention in Image captioning. State-of-the-art. 1 Motivation: Attention: Focusing

More information

Translating Videos to Natural Language Using Deep Recurrent Neural Networks

Translating Videos to Natural Language Using Deep Recurrent Neural Networks Translating Videos to Natural Language Using Deep Recurrent Neural Networks Subhashini Venugopalan UT Austin Huijuan Xu UMass. Lowell Jeff Donahue UC Berkeley Marcus Rohrbach UC Berkeley Subhashini Venugopalan

More information

Image Captioning using Reinforcement Learning. Presentation by: Samarth Gupta

Image Captioning using Reinforcement Learning. Presentation by: Samarth Gupta Image Captioning using Reinforcement Learning Presentation by: Samarth Gupta 1 Introduction Summary Supervised Models Image captioning as RL problem Actor Critic Architecture Policy Gradient architecture

More information

Sequential Predictions Recurrent Neural Networks

Sequential Predictions Recurrent Neural Networks CS 2770: Computer Vision Sequential Predictions Recurrent Neural Networks Prof. Adriana Kovashka University of Pittsburgh March 28, 2017 One Motivation: Descriptive Text for Images It was an arresting

More information

Recurrent Neural Networks

Recurrent Neural Networks CS 2750: Machine Learning Recurrent Neural Networks Prof. Adriana Kovashka University of Pittsburgh March 14, 2017 One Motivation: Descriptive Text for Images It was an arresting face, pointed of chin,

More information

The University of Tokyo, NVAIL Partner Yoshitaka Ushiku

The University of Tokyo, NVAIL Partner Yoshitaka Ushiku Recognize, Describe, and Generate: Introduction of Recent Work at MIL The University of Tokyo, NVAIL Partner Yoshitaka Ushiku MIL: Machine Intelligence Laboratory Beyond Human Intelligence Based on Cyber-Physical

More information

Action Recognition. Computer Vision Jia-Bin Huang, Virginia Tech. Many slides from D. Hoiem

Action Recognition. Computer Vision Jia-Bin Huang, Virginia Tech. Many slides from D. Hoiem Action Recognition Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem This section: advanced topics Convolutional neural networks in vision Action recognition Vision and Language 3D

More information

Audiovisual to Sign Language Translator

Audiovisual to Sign Language Translator Technical Disclosure Commons Defensive Publications Series July 17, 2018 Audiovisual to Sign Language Translator Manikandan Gopalakrishnan Follow this and additional works at: https://www.tdcommons.org/dpubs_series

More information

Vision, Language, Reasoning

Vision, Language, Reasoning CS 2770: Computer Vision Vision, Language, Reasoning Prof. Adriana Kovashka University of Pittsburgh March 5, 2019 Plan for this lecture Image captioning Tool: Recurrent neural networks Captioning for

More information

Practical Bayesian Optimization of Machine Learning Algorithms. Jasper Snoek, Ryan Adams, Hugo LaRochelle NIPS 2012

Practical Bayesian Optimization of Machine Learning Algorithms. Jasper Snoek, Ryan Adams, Hugo LaRochelle NIPS 2012 Practical Bayesian Optimization of Machine Learning Algorithms Jasper Snoek, Ryan Adams, Hugo LaRochelle NIPS 2012 ... (Gaussian Processes) are inadequate for doing speech and vision. I still think they're

More information

Interact-AS. Use handwriting, typing and/or speech input. The most recently spoken phrase is shown in the top box

Interact-AS. Use handwriting, typing and/or speech input. The most recently spoken phrase is shown in the top box Interact-AS One of the Many Communications Products from Auditory Sciences Use handwriting, typing and/or speech input The most recently spoken phrase is shown in the top box Use the Control Box to Turn

More information

CSC2541 Project Paper: Mood-based Image to Music Synthesis

CSC2541 Project Paper: Mood-based Image to Music Synthesis CSC2541 Project Paper: Mood-based Image to Music Synthesis Mary Elaine Malit Department of Computer Science University of Toronto elainemalit@cs.toronto.edu Jun Shu Song Department of Computer Science

More information

B657: Final Project Report Holistically-Nested Edge Detection

B657: Final Project Report Holistically-Nested Edge Detection B657: Final roject Report Holistically-Nested Edge Detection Mingze Xu & Hanfei Mei May 4, 2016 Abstract Holistically-Nested Edge Detection (HED), which is a novel edge detection method based on fully

More information

DeepASL: Enabling Ubiquitous and Non-Intrusive Word and Sentence-Level Sign Language Translation

DeepASL: Enabling Ubiquitous and Non-Intrusive Word and Sentence-Level Sign Language Translation DeepASL: Enabling Ubiquitous and Non-Intrusive Word and Sentence-Level Sign Language Translation Biyi Fang Michigan State University ACM SenSys 17 Nov 6 th, 2017 Biyi Fang (MSU) Jillian Co (MSU) Mi Zhang

More information

arxiv: v1 [stat.ml] 23 Jan 2017

arxiv: v1 [stat.ml] 23 Jan 2017 Learning what to look in chest X-rays with a recurrent visual attention model arxiv:1701.06452v1 [stat.ml] 23 Jan 2017 Petros-Pavlos Ypsilantis Department of Biomedical Engineering King s College London

More information

Automatic Context-Aware Image Captioning

Automatic Context-Aware Image Captioning Technical Disclosure Commons Defensive Publications Series May 23, 2017 Automatic Context-Aware Image Captioning Sandro Feuz Sebastian Millius Follow this and additional works at: http://www.tdcommons.org/dpubs_series

More information

Name Period Date. Grade 7, Unit 4 Pre-assessment

Name Period Date. Grade 7, Unit 4 Pre-assessment Name Period Date Grade 7, Unit 4 Pre-assessment A Letter from New York by Jean Lawler Dear Aunt Julia, I have so much to tell you, I don t know where to begin! Remember last summer when I told you that

More information

Accessible Computing Research for Users who are Deaf and Hard of Hearing (DHH)

Accessible Computing Research for Users who are Deaf and Hard of Hearing (DHH) Accessible Computing Research for Users who are Deaf and Hard of Hearing (DHH) Matt Huenerfauth Raja Kushalnagar Rochester Institute of Technology DHH Auditory Issues Links Accents/Intonation Listening

More information

Putting Context into. Vision. September 15, Derek Hoiem

Putting Context into. Vision. September 15, Derek Hoiem Putting Context into Vision Derek Hoiem September 15, 2004 Questions to Answer What is context? How is context used in human vision? How is context currently used in computer vision? Conclusions Context

More information

Efficient Deep Model Selection

Efficient Deep Model Selection Efficient Deep Model Selection Jose Alvarez Researcher Data61, CSIRO, Australia GTC, May 9 th 2017 www.josemalvarez.net conv1 conv2 conv3 conv4 conv5 conv6 conv7 conv8 softmax prediction???????? Num Classes

More information

Shu Kong. Department of Computer Science, UC Irvine

Shu Kong. Department of Computer Science, UC Irvine Ubiquitous Fine-Grained Computer Vision Shu Kong Department of Computer Science, UC Irvine Outline 1. Problem definition 2. Instantiation 3. Challenge and philosophy 4. Fine-grained classification with

More information

NE LESSON GD Fit Families: Effortless Exercise

NE LESSON GD Fit Families: Effortless Exercise NE LESSON GD-000-06 Fit Families: Effortless Exercise LESSON DESCRIPTION In this video and activity lesson, class participants will learn strategies for incorporating physical activity into their daily

More information

RESOURCE CATALOGUE. I Can Sign A Rainbow

RESOURCE CATALOGUE. I Can Sign A Rainbow RESOURCE CATALOGUE I Can Sign A Rainbow Catalogue # 100 I Can Sign a Rainbow is a 2 disc set (DVD and CD-ROM) designed for pre-schoolers and their families who use New Zealand Sign Language or spoken English.

More information

Video Saliency Detection via Dynamic Consistent Spatio- Temporal Attention Modelling

Video Saliency Detection via Dynamic Consistent Spatio- Temporal Attention Modelling AAAI -13 July 16, 2013 Video Saliency Detection via Dynamic Consistent Spatio- Temporal Attention Modelling Sheng-hua ZHONG 1, Yan LIU 1, Feifei REN 1,2, Jinghuan ZHANG 2, Tongwei REN 3 1 Department of

More information

Gesture Recognition using Marathi/Hindi Alphabet

Gesture Recognition using Marathi/Hindi Alphabet Gesture Recognition using Marathi/Hindi Alphabet Rahul Dobale ¹, Rakshit Fulzele², Shruti Girolla 3, Seoutaj Singh 4 Student, Computer Engineering, D.Y. Patil School of Engineering, Pune, India 1 Student,

More information

Deep Learning for Computer Vision

Deep Learning for Computer Vision Deep Learning for Computer Vision Lecture 12: Time Sequence Data, Recurrent Neural Networks (RNNs), Long Short-Term Memories (s), and Image Captioning Peter Belhumeur Computer Science Columbia University

More information

CAPL 2 Questionnaire

CAPL 2 Questionnaire CAPL 2 Questionnaire What Do You Think About Physical Activity? When we ask you about physical activity, we mean when you are moving around, playing, or exercising. Physical activity is any activity that

More information

GED Preparation Lesson Plan. Module: Science. Lesson Title: Forming a Conclusion. Standards: GED Preparation (Adult General Education)

GED Preparation Lesson Plan. Module: Science. Lesson Title: Forming a Conclusion. Standards: GED Preparation (Adult General Education) GED Preparation Lesson Plan Module: Science Lesson Title: Forming a Conclusion Standards: GED Preparation (Adult General Education) Scientific Practices 2014 Assessment Targets Understand and explain textual

More information

Lesson 1 Pre-Visit The Athlete's Body

Lesson 1 Pre-Visit The Athlete's Body Lesson 1 Pre-Visit The Athlete's Body Objective: Students will be able to: Explain the meaning of fitness. Identify ways that athletes maintain physical fitness. Correctly identify different muscle groups

More information

Hierarchical Convolutional Features for Visual Tracking

Hierarchical Convolutional Features for Visual Tracking Hierarchical Convolutional Features for Visual Tracking Chao Ma Jia-Bin Huang Xiaokang Yang Ming-Husan Yang SJTU UIUC SJTU UC Merced ICCV 2015 Background Given the initial state (position and scale), estimate

More information

arxiv: v1 [cs.cv] 12 Dec 2016

arxiv: v1 [cs.cv] 12 Dec 2016 Text-guided Attention Model for Image Captioning Jonghwan Mun, Minsu Cho, Bohyung Han Department of Computer Science and Engineering, POSTECH, Korea {choco1916, mscho, bhhan}@postech.ac.kr arxiv:1612.03557v1

More information

Domain Generalization and Adaptation using Low Rank Exemplar Classifiers

Domain Generalization and Adaptation using Low Rank Exemplar Classifiers Domain Generalization and Adaptation using Low Rank Exemplar Classifiers 报告人 : 李文 苏黎世联邦理工学院计算机视觉实验室 李文 苏黎世联邦理工学院 9/20/2017 1 Outline Problems Domain Adaptation and Domain Generalization Low Rank Exemplar

More information

Annotation and Retrieval System Using Confabulation Model for ImageCLEF2011 Photo Annotation

Annotation and Retrieval System Using Confabulation Model for ImageCLEF2011 Photo Annotation Annotation and Retrieval System Using Confabulation Model for ImageCLEF2011 Photo Annotation Ryo Izawa, Naoki Motohashi, and Tomohiro Takagi Department of Computer Science Meiji University 1-1-1 Higashimita,

More information

Open-Domain Chatting Machine: Emotion and Personality

Open-Domain Chatting Machine: Emotion and Personality Open-Domain Chatting Machine: Emotion and Personality Minlie Huang, Associate Professor Dept. of Computer Science, Tsinghua University aihuang@tsinghua.edu.cn http://aihuang.org/p 2017/12/15 1 Open-domain

More information

Source and Description Category of Practice Level of CI User How to Use Additional Information. Intermediate- Advanced. Beginner- Advanced

Source and Description Category of Practice Level of CI User How to Use Additional Information. Intermediate- Advanced. Beginner- Advanced Source and Description Category of Practice Level of CI User How to Use Additional Information Randall s ESL Lab: http://www.esllab.com/ Provide practice in listening and comprehending dialogue. Comprehension

More information

CS/NEUR125 Brains, Minds, and Machines. Due: Friday, April 14

CS/NEUR125 Brains, Minds, and Machines. Due: Friday, April 14 CS/NEUR125 Brains, Minds, and Machines Assignment 5: Neural mechanisms of object-based attention Due: Friday, April 14 This Assignment is a guided reading of the 2014 paper, Neural Mechanisms of Object-Based

More information

Sign Language MT. Sara Morrissey

Sign Language MT. Sara Morrissey Sign Language MT Sara Morrissey Introduction Overview Irish Sign Language Problems for SLMT SLMT Data MaTrEx for SLs Future work Introduction (1) Motivation SLs are poorly resourced and lack political,

More information

Vector Learning for Cross Domain Representations

Vector Learning for Cross Domain Representations Vector Learning for Cross Domain Representations Shagan Sah, Chi Zhang, Thang Nguyen, Dheeraj Kumar Peri, Ameya Shringi, Raymond Ptucha Rochester Institute of Technology, Rochester, NY 14623, USA arxiv:1809.10312v1

More information

Shu Kong. Department of Computer Science, UC Irvine

Shu Kong. Department of Computer Science, UC Irvine Ubiquitous Fine-Grained Computer Vision Shu Kong Department of Computer Science, UC Irvine Outline 1. Problem definition 2. Instantiation 3. Challenge 4. Fine-grained classification with holistic representation

More information

Rising Scholars Academy 8 th Grade English I Summer Reading Project The Alchemist By Paulo Coelho

Rising Scholars Academy 8 th Grade English I Summer Reading Project The Alchemist By Paulo Coelho Rising Scholars Academy 8 th Grade English I Summer Reading Project The Alchemist By Paulo Coelho Welcome to 8th grade English I! Summer is a time where you can relax and have fun, but did you know you

More information

Autism, my sibling, and me

Autism, my sibling, and me ORGANIZATION FOR AUTISM RESEARCH Autism, my sibling, and me Brothers and sisters come in all shapes and sizes. They have a lot in common, and they can be really different from each other. Some kids even

More information

Group Behavior Analysis and Its Applications

Group Behavior Analysis and Its Applications Group Behavior Analysis and Its Applications CVPR 2015 Tutorial Lecturers: Hyun Soo Park (University of Pennsylvania) Wongun Choi (NEC America Laboratory) Schedule 08:30am-08:50am 08:50am-09:50am 09:50am-10:10am

More information

Appendix C. Sample prepirls Passage, Questions, and Scoring Guides. Reading for Literary Experience Charlie s Talent

Appendix C. Sample prepirls Passage, Questions, and Scoring Guides. Reading for Literary Experience Charlie s Talent Appendix C Sample prepirls Passage, Questions, and Scoring Guides Reading for Literary Experience Charlie s Talent Charlie s Talent Summer had just begun. Charlie was planting a garden in front of his

More information

Keyword-driven Image Captioning via Context-dependent Bilateral LSTM

Keyword-driven Image Captioning via Context-dependent Bilateral LSTM Keyword-driven Image Captioning via Context-dependent Bilateral LSTM Xiaodan Zhang 1,2, Shuqiang Jiang 2, Qixiang Ye 2, Jianbin Jiao 2, Rynson W.H. Lau 1 1 City University of Hong Kong 2 University of

More information

Medical Image Analysis

Medical Image Analysis Medical Image Analysis 1 Co-trained convolutional neural networks for automated detection of prostate cancer in multiparametric MRI, 2017, Medical Image Analysis 2 Graph-based prostate extraction in t2-weighted

More information

An assistive application identifying emotional state and executing a methodical healing process for depressive individuals.

An assistive application identifying emotional state and executing a methodical healing process for depressive individuals. An assistive application identifying emotional state and executing a methodical healing process for depressive individuals. Bandara G.M.M.B.O bhanukab@gmail.com Godawita B.M.D.T tharu9363@gmail.com Gunathilaka

More information

Learning to Disambiguate by Asking Discriminative Questions Supplementary Material

Learning to Disambiguate by Asking Discriminative Questions Supplementary Material Learning to Disambiguate by Asking Discriminative Questions Supplementary Material Yining Li 1 Chen Huang 2 Xiaoou Tang 1 Chen Change Loy 1 1 Department of Information Engineering, The Chinese University

More information

Level 5-6 What Katy Did

Level 5-6 What Katy Did Level 5-6 What Katy Did Workbook Teacher s Guide and Answer Key A. Summary 1. Book Summary Teacher s Guide Once there was a girl named Katy. She tried to be good, but she always ended up doing the wrong

More information

5/2018. AAC Lending Library. Step-by-Step with Levels. ablenet. Attainment Company. Go Talk One Talking Block. ablenet. Attainment.

5/2018. AAC Lending Library. Step-by-Step with Levels. ablenet. Attainment Company. Go Talk One Talking Block. ablenet. Attainment. 5/2018 Device Step-by-Step with Levels Go Talk One Talking Block Go Talk 4, 9, 20 AAC Lending Library User Profile Description Manufacturer EmergentSupplementary Designed for pre-recording a series of

More information

High-Resolution Breast Cancer Screening with Multi-View Deep Convolutional Neural Networks

High-Resolution Breast Cancer Screening with Multi-View Deep Convolutional Neural Networks High-Resolution Breast Cancer Screening with Multi-View Deep Convolutional Neural Networks Krzysztof J. Geras Joint work with Kyunghyun Cho, Linda Moy, Gene Kim, Stacey Wolfson and Artie Shen. GTC 2017

More information

A HMM-based Pre-training Approach for Sequential Data

A HMM-based Pre-training Approach for Sequential Data A HMM-based Pre-training Approach for Sequential Data Luca Pasa 1, Alberto Testolin 2, Alessandro Sperduti 1 1- Department of Mathematics 2- Department of Developmental Psychology and Socialisation University

More information

LOOK, LISTEN, AND DECODE: MULTIMODAL SPEECH RECOGNITION WITH IMAGES. Felix Sun, David Harwath, and James Glass

LOOK, LISTEN, AND DECODE: MULTIMODAL SPEECH RECOGNITION WITH IMAGES. Felix Sun, David Harwath, and James Glass LOOK, LISTEN, AND DECODE: MULTIMODAL SPEECH RECOGNITION WITH IMAGES Felix Sun, David Harwath, and James Glass MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA, USA {felixsun,

More information

Repurposing Corpus Materials For Interpreter Education. Nancy Frishberg, Ph.D. MSB Associates San Mateo, California USA

Repurposing Corpus Materials For Interpreter Education. Nancy Frishberg, Ph.D. MSB Associates San Mateo, California USA Repurposing Corpus Materials For Interpreter Education Nancy Frishberg, Ph.D. MSB Associates San Mateo, California USA nancyf@fishbird.com I was raised in a household with lots of media 78 rpm Phonograph

More information

Visualizing the Affective Structure of a Text Document

Visualizing the Affective Structure of a Text Document Visualizing the Affective Structure of a Text Document Hugo Liu, Ted Selker, Henry Lieberman MIT Media Laboratory {hugo, selker, lieber} @ media.mit.edu http://web.media.mit.edu/~hugo Overview Motivation

More information

arxiv: v1 [cs.cv] 7 Dec 2018

arxiv: v1 [cs.cv] 7 Dec 2018 An Attempt towards Interpretable Audio-Visual Video Captioning Yapeng Tian 1, Chenxiao Guan 1, Justin Goodman 2, Marc Moore 3, and Chenliang Xu 1 arxiv:1812.02872v1 [cs.cv] 7 Dec 2018 1 Department of Computer

More information

DeepDiary: Automatically Captioning Lifelogging Image Streams

DeepDiary: Automatically Captioning Lifelogging Image Streams DeepDiary: Automatically Captioning Lifelogging Image Streams Chenyou Fan David J. Crandall School of Informatics and Computing Indiana University Bloomington, Indiana USA {fan6,djcran}@indiana.edu Abstract.

More information

Convolutional Neural Networks for Text Classification

Convolutional Neural Networks for Text Classification Convolutional Neural Networks for Text Classification Sebastian Sierra MindLab Research Group July 1, 2016 ebastian Sierra (MindLab Research Group) NLP Summer Class July 1, 2016 1 / 32 Outline 1 What is

More information

Blast Searcher Formative Evaluation. March 02, Adam Klinger and Josh Gutwill

Blast Searcher Formative Evaluation. March 02, Adam Klinger and Josh Gutwill Blast Searcher Formative Evaluation March 02, 2006 Keywords: Genentech Biology Life Sciences Adam Klinger and Josh Gutwill Traits of Life \ - 1 - BLAST Searcher Formative Evaluation March 2, 2006 Adam

More information

Intelligent Machines That Act Rationally. Hang Li Bytedance AI Lab

Intelligent Machines That Act Rationally. Hang Li Bytedance AI Lab Intelligent Machines That Act Rationally Hang Li Bytedance AI Lab Four Definitions of Artificial Intelligence Building intelligent machines (i.e., intelligent computers) Thinking humanly Acting humanly

More information

Rumor Detection on Twitter with Tree-structured Recursive Neural Networks

Rumor Detection on Twitter with Tree-structured Recursive Neural Networks 1 Rumor Detection on Twitter with Tree-structured Recursive Neural Networks Jing Ma 1, Wei Gao 2, Kam-Fai Wong 1,3 1 The Chinese University of Hong Kong 2 Victoria University of Wellington, New Zealand

More information

Emotion Recognition using a Cauchy Naive Bayes Classifier

Emotion Recognition using a Cauchy Naive Bayes Classifier Emotion Recognition using a Cauchy Naive Bayes Classifier Abstract Recognizing human facial expression and emotion by computer is an interesting and challenging problem. In this paper we propose a method

More information

Why did the network make this prediction?

Why did the network make this prediction? Why did the network make this prediction? Ankur Taly (Google Inc.) Joint work with Mukund Sundararajan and Qiqi Yan Some Deep Learning Successes Source: https://www.cbsinsights.com Deep Neural Networks

More information

Building Evaluation Scales for NLP using Item Response Theory

Building Evaluation Scales for NLP using Item Response Theory Building Evaluation Scales for NLP using Item Response Theory John Lalor CICS, UMass Amherst Joint work with Hao Wu (BC) and Hong Yu (UMMS) Motivation Evaluation metrics for NLP have been mostly unchanged

More information

A Year of Tips for Communication Success

A Year of Tips for Communication Success A Year of Tips for Communication Success 2143 Township Road 112 Millersburg, OH 44654 1-877-397-0178 info@saltillo.com www.saltillo.com Create Communication Success: Teach communication through modeling

More information

MyDispense OTC exercise Guide

MyDispense OTC exercise Guide MyDispense OTC exercise Guide Version 5.0 Page 1 of 23 Page 2 of 23 Table of Contents What is MyDispense?... 4 Who is this guide for?... 4 How should I use this guide?... 4 OTC exercises explained... 4

More information

Houghton Mifflin Harcourt Discovering French Today! Level correlated to the

Houghton Mifflin Harcourt Discovering French Today! Level correlated to the Houghton Mifflin Harcourt Discovering French Today! Level 2 2013 correlated to the NCSSFL ACTFL Can-Do Statements (2015), Novice Mid, Novice High, Intermediate Low Novice Mid Interpersonal Communication

More information

1. INTRODUCTION. Vision based Multi-feature HGR Algorithms for HCI using ISL Page 1

1. INTRODUCTION. Vision based Multi-feature HGR Algorithms for HCI using ISL Page 1 1. INTRODUCTION Sign language interpretation is one of the HCI applications where hand gesture plays important role for communication. This chapter discusses sign language interpretation system with present

More information

The Mythos of Model Interpretability

The Mythos of Model Interpretability The Mythos of Model Interpretability Zachary C. Lipton https://arxiv.org/abs/1606.03490 Outline What is interpretability? What are its desiderata? What model properties confer interpretability? Caveats,

More information

Convolutional and LSTM Neural Networks

Convolutional and LSTM Neural Networks Convolutional and LSTM Neural Networks Vanessa Jurtz January 12, 2016 Contents Neural networks and GPUs Lasagne Peptide binding to MHC class II molecules Convolutional Neural Networks (CNN) Recurrent and

More information

Lesson 14: Association Between Categorical Variables

Lesson 14: Association Between Categorical Variables Lesson 14: Association Between Categorical Variables Classwork Example 1 Suppose a random group of people are surveyed about their use of smartphones. The results of the survey are summarized in the tables

More information

Intelligent Machines That Act Rationally. Hang Li Toutiao AI Lab

Intelligent Machines That Act Rationally. Hang Li Toutiao AI Lab Intelligent Machines That Act Rationally Hang Li Toutiao AI Lab Four Definitions of Artificial Intelligence Building intelligent machines (i.e., intelligent computers) Thinking humanly Acting humanly Thinking

More information

Level 14 Book f. Level 14 Word Count 321 Text Type Information report High Frequency children, father, Word/s Introduced people

Level 14 Book f. Level 14 Word Count 321 Text Type Information report High Frequency children, father, Word/s Introduced people People and Dolphins Level 14 Book f The Apple Tree Macey s Mess This Little Girl Roger the Bat Goes to Town Written by Sarah Edwards Ringo Goes Missing Level 14 Word Count 321 Text Type Information report

More information

Improving the Interpretability of DEMUD on Image Data Sets

Improving the Interpretability of DEMUD on Image Data Sets Improving the Interpretability of DEMUD on Image Data Sets Jake Lee, Jet Propulsion Laboratory, California Institute of Technology & Columbia University, CS 19 Intern under Kiri Wagstaff Summer 2018 Government

More information

Helping Families Achieve the Best Outcome for their Child with a Cochlear Implant

Helping Families Achieve the Best Outcome for their Child with a Cochlear Implant Helping Families Achieve the Best Outcome for their Child with a Cochlear Implant Donna L. Sorkin, M.A., Vice President, Consumer Affairs Melissa Wilson, Au.D., CCC-AAA, Clinical Applications Specialist

More information

Parents Talk About Teaching Kids to Read

Parents Talk About Teaching Kids to Read Parents Talk About Teaching Kids to Read [This page prints out on 5 sheets] Learn what other parents say about teaching their kids to read. Click on the links below to go to the quote. After you read the

More information

Global&Journal&of&Community&Psychology&Practice&

Global&Journal&of&Community&Psychology&Practice& Global&Journal&of&Community&Psychology&Practice& Volume 2, Issue 2 December 2011 Tearless Logic Model Ashlee D. Lien, Justin P. Greenleaf, Michael K. Lemke, Sharon M. Hakim, Nathan P. Swink, Rosemary Wright,

More information

CS343: Artificial Intelligence

CS343: Artificial Intelligence CS343: Artificial Intelligence Introduction: Part 2 Prof. Scott Niekum University of Texas at Austin [Based on slides created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All materials

More information

Making Sure People with Communication Disabilities Get the Message

Making Sure People with Communication Disabilities Get the Message Emergency Planning and Response for People with Disabilities Making Sure People with Communication Disabilities Get the Message A Checklist for Emergency Public Information Officers This document is part

More information

Object Detectors Emerge in Deep Scene CNNs

Object Detectors Emerge in Deep Scene CNNs Object Detectors Emerge in Deep Scene CNNs Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, Antonio Torralba Presented By: Collin McCarthy Goal: Understand how objects are represented in CNNs Are

More information

Attentional Masking for Pre-trained Deep Networks

Attentional Masking for Pre-trained Deep Networks Attentional Masking for Pre-trained Deep Networks IROS 2017 Marcus Wallenberg and Per-Erik Forssén Computer Vision Laboratory Department of Electrical Engineering Linköping University 2014 2017 Per-Erik

More information

Human Information Processing and Cultural Diversity. Jaana Holvikivi, DSc. School of ICT

Human Information Processing and Cultural Diversity. Jaana Holvikivi, DSc. School of ICT Human Information Processing and Cultural Diversity Jaana Holvikivi, DSc. School of ICT Outline Introduction Understanding the human mind Cognitive capacity Cultural schemas Jaana Holvikivi 2 The human

More information

- Where... you last night? - What time? - At about... - I was... Why? - Because there was a at. - There was? Too bad I wasn t there.

- Where... you last night? - What time? - At about... - I was... Why? - Because there was a at. - There was? Too bad I wasn t there. Do You Know..? Activity 1 1. Where was Einstein when he discovered the theory of relativity? 2. Where were Neil Armstrong and Edwin Aldrin on July 10th, 1969? 3. Where was J.F.K. when he was assassinated?

More information

Active Deformable Part Models Inference

Active Deformable Part Models Inference Active Deformable Part Models Inference Menglong Zhu Nikolay Atanasov George J. Pappas Kostas Daniilidis GRASP Laboratory, University of Pennsylvania 3330 Walnut Street, Philadelphia, PA 19104, USA Abstract.

More information

3. Which word is an antonym

3. Which word is an antonym Name: Date: 1 Read the text and then answer the questions. Stephanie s best friend, Lindsey, was having a birthday in a few weeks. The problem was that Stephanie had no idea what to get her. She didn t

More information

Interpreting Deep Neural Networks and their Predictions

Interpreting Deep Neural Networks and their Predictions Fraunhofer Image Processing Heinrich Hertz Institute Interpreting Deep Neural Networks and their Predictions Wojciech Samek ML Group, Fraunhofer HHI (joint work with S. Lapuschkin, A. Binder, G. Montavon,

More information

COMP9444 Neural Networks and Deep Learning 5. Convolutional Networks

COMP9444 Neural Networks and Deep Learning 5. Convolutional Networks COMP9444 Neural Networks and Deep Learning 5. Convolutional Networks Textbook, Sections 6.2.2, 6.3, 7.9, 7.11-7.13, 9.1-9.5 COMP9444 17s2 Convolutional Networks 1 Outline Geometry of Hidden Unit Activations

More information

Elementary 2 Unit Go to online version of the activity. Go back to this menu.

Elementary 2 Unit Go to online version of the activity. Go back to this menu. Elementary 2 Unit 4 1 2 3 4 5 6 7 8 Go to online version of the activity. Go back to this menu. Do You Know..? Activity 1 1. Where was Einstein when he discovered the theory of relativity? 2. Where were

More information

September 2016 Newsletter

September 2016 Newsletter September 2016 Newsletter Your Family Name Shuffling your kids back to school these days takes more than a new wardrobe and a shiny apple. What about the dizzying array of immunizations? Hearing and vision

More information

Come for the day, or choose to stay!

Come for the day, or choose to stay! Volunteer Program 2018 Thank you for your interest in the Kids Saving the Rainforest volunteer program! We are a wildlife sanctuary, rescue and rehabilitation center located in beautiful Quepos, Costa

More information

The 29th Fuzzy System Symposium (Osaka, September 9-, 3) Color Feature Maps (BY, RG) Color Saliency Map Input Image (I) Linear Filtering and Gaussian

The 29th Fuzzy System Symposium (Osaka, September 9-, 3) Color Feature Maps (BY, RG) Color Saliency Map Input Image (I) Linear Filtering and Gaussian The 29th Fuzzy System Symposium (Osaka, September 9-, 3) A Fuzzy Inference Method Based on Saliency Map for Prediction Mao Wang, Yoichiro Maeda 2, Yasutake Takahashi Graduate School of Engineering, University

More information

Real Time Sign Language Processing System

Real Time Sign Language Processing System Real Time Sign Language Processing System Dibyabiva Seth (&), Anindita Ghosh, Ariruna Dasgupta, and Asoke Nath Department of Computer Science, St. Xavier s College (Autonomous), Kolkata, India meetdseth@gmail.com,

More information

Character Motivation Essential Question: How do readers analyze character motivation, including how it advances the plot and theme in a story?

Character Motivation Essential Question: How do readers analyze character motivation, including how it advances the plot and theme in a story? Character Motivation Essential Question: How do readers analyze character motivation, including how it advances the plot and theme in a story? WORDS IN BLUE ARE NOTES FOR YOU! Movie Time!!! https://www.youtube.com/watch?v=y6sbbm5quo0

More information

Appendix I Teaching outcomes of the degree programme (art. 1.3)

Appendix I Teaching outcomes of the degree programme (art. 1.3) Appendix I Teaching outcomes of the degree programme (art. 1.3) The Master graduate in Computing Science is fully acquainted with the basic terms and techniques used in Computing Science, and is familiar

More information

arxiv: v2 [cs.cv] 3 Apr 2018

arxiv: v2 [cs.cv] 3 Apr 2018 Bidirectional Attentive Fusion with Context Gating for Dense Video Captioning Jingwen Wang Wenhao Jiang Lin Ma Wei Liu Yong Xu South China University of Technology Tencent AI Lab {jaywongjaywong, cswhjiang,

More information

GIANT: Geo-Informative Attributes for Location Recognition and Exploration

GIANT: Geo-Informative Attributes for Location Recognition and Exploration GIANT: Geo-Informative Attributes for Location Recognition and Exploration Quan Fang, Jitao Sang, Changsheng Xu Institute of Automation, Chinese Academy of Sciences October 23, 2013 Where is this? La Sagrada

More information

Social Skills and Autism: Using Books in Creative Ways to Reach and Teach in Early Education

Social Skills and Autism: Using Books in Creative Ways to Reach and Teach in Early Education Social Skills and Autism: Using Books in Creative Ways to Reach and Teach in Early Education Mary Jane Weiss, Ph.D., and Cheri J. Meiners, M.Ed. State of intervention.. Social deficits remain the most

More information

Streamlined Dense Video Captioning

Streamlined Dense Video Captioning 1 3 Topic: caesar salad recipe : a caesar salad is ready and is served in a bowl ee 2 : croutons are in a bowl and chopped ingredients are separated ee 3 : the man mix all the ingredients in a bowl to

More information

Multi-attention Guided Activation Propagation in CNNs

Multi-attention Guided Activation Propagation in CNNs Multi-attention Guided Activation Propagation in CNNs Xiangteng He and Yuxin Peng (B) Institute of Computer Science and Technology, Peking University, Beijing, China pengyuxin@pku.edu.cn Abstract. CNNs

More information

Increasing Social Awareness in Adults with Autism Spectrum Disorders

Increasing Social Awareness in Adults with Autism Spectrum Disorders Increasing Social Awareness in Adults with Autism Spectrum Disorders Heather Conroy, LCSW Western Region ASERT University of Pittsburgh Medical Center UPMC Center for Autism and Developmental Disorders

More information

Using Deep Convolutional Networks for Gesture Recognition in American Sign Language

Using Deep Convolutional Networks for Gesture Recognition in American Sign Language Using Deep Convolutional Networks for Gesture Recognition in American Sign Language Abstract In the realm of multimodal communication, sign language is, and continues to be, one of the most understudied

More information