DeepASL: Enabling Ubiquitous and Non-Intrusive Word and Sentence-Level Sign Language Translation


Presenter: Biyi Fang, Michigan State University
ACM SenSys '17, Nov 6th, 2017
Authors: Biyi Fang (MSU), Jillian Co (MSU), Mi Zhang (MSU)

Deep Learning Is Changing Our Lives Now
Examples: Self-Driving, Face Recognition, Speech Recognition, Play Go

Background
American Sign Language (ASL) is the primary language used by deaf people to communicate with others. Unfortunately, very few people with normal hearing understand sign language.
Existing communication approaches have key limitations in cost, availability, or convenience. Examples: sign language interpreter, writing on paper, typing on a phone.

Sign Language Translation Technology
Characteristics of signs: hand shape, hand movement, relative location of two hands.
Components: sensors, computational models.

Limitations of Existing Sign Language Translation Systems
EMG + Motion [Wu et al. 2015]: intrusive
RGB Camera [Zafrulla et al. 2010]: constrained by lighting condition and privacy intrusive
Kinect [Chai et al. 2013]: lack of resolution

Our Solution: DeepASL
A deep learning-based sign language translation framework that enables ubiquitous and non-intrusive ASL translation at both word and sentence levels.

Design Choice: Leap Motion (Infrared Sensing)
Data: 3D skeleton joint data.
Figure labels: skeleton joint, bone, extended bone, elbow.

Comparison with Existing Sign Language Translation Systems
Criteria: non-intrusive, lighting condition, privacy preserving, high resolution.
Systems compared: EMG + Motion, RGB Camera, Kinect, DeepASL.

System Architecture of DeepASL
Stages: ASL characteristics extraction, word-level translation, sentence-level translation.

ASL Characteristics Extraction
Extracted characteristics: hand shape and relative location of two hands; hand movement.
Resulting streams: right hand shape, right hand movement, left hand shape, left hand movement.
Figure label: [0, 0, 0]
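The extraction step can be illustrated with a minimal sketch. The slide does not give the formulas, so the following are assumptions: hand shape is taken as each joint's position relative to a reference point (here a palm center passed in by the caller), and hand movement is taken as the frame-to-frame displacement of each joint, set to [0, 0, 0] for the first frame. All function and variable names are hypothetical.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Frame = List[Vec3]  # 3D positions of one hand's skeleton joints in one frame


def sub(a: Vec3, b: Vec3) -> Vec3:
    """Component-wise difference a - b."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def hand_shape(frame: Frame, origin: Vec3) -> Frame:
    """Assumed definition: joint positions expressed relative to `origin`.

    If both hands use the same origin (for example the right palm center),
    the left hand's values also carry the relative location of the two hands.
    """
    return [sub(joint, origin) for joint in frame]


def hand_movement(frames: List[Frame]) -> List[Frame]:
    """Assumed definition: per-joint displacement between consecutive frames.

    The first frame has no predecessor, so its movement is all zeros.
    """
    movement = []
    for t, frame in enumerate(frames):
        if t == 0:
            movement.append([(0.0, 0.0, 0.0)] * len(frame))
        else:
            movement.append([sub(j, p) for j, p in zip(frame, frames[t - 1])])
    return movement


# Toy input: two frames, two joints each (made-up coordinates).
frames = [[(1, 2, 3), (2, 2, 2)], [(2, 2, 3), (2, 3, 2)]]
shape_first_frame = hand_shape(frames[0], origin=(1, 1, 1))
movement = hand_movement(frames)
```

Expressing every joint relative to one shared origin is a design choice of this sketch: it keeps hand shape independent of where the user stands relative to the sensor.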

ASL Characteristics Organization
Low-level ASL characteristics: right hand shape, right hand movement, left hand shape, left hand movement.
Mid-level: right/left hand representation.
High-level: single-sign representation.
Output: a fully connected layer and softmax produce a probability distribution over the vocabulary (example word: "Thank").

Similar ASL Differentiation
Some signs share very similar characteristics at the beginning of their trajectories. Example: "Want" and "What".

Similar ASL Differentiation
A bidirectional recurrent neural network (B-RNN) model is incorporated to capture both the forward and backward representations of a sign.
Figure: B-RNN unrolled over time steps t-1, t, t+1, with an input layer (x_{t-1}, x_t, x_{t+1}), a forward layer and a backward layer (hidden states h_{t-1}, h_t, h_{t+1} in each), and an output layer (y_{t-1}, y_t, y_{t+1}).
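A toy sketch of why a bidirectional pass helps with signs that begin alike. This is not the DeepASL model: it uses a scalar tanh recurrence with made-up weights and made-up input sequences, purely to show that the forward state at an early time step is identical for two sequences sharing a prefix, while the backward state at that same step already reflects how they end.

```python
import math
from typing import List, Tuple


def rnn_pass(xs: List[float], w_x: float, w_h: float) -> List[float]:
    """Simple scalar recurrence: h_t = tanh(w_x * x_t + w_h * h_{t-1})."""
    h = 0.0
    states = []
    for x in xs:
        h = math.tanh(w_x * x + w_h * h)
        states.append(h)
    return states


def bidirectional(xs: List[float]) -> List[Tuple[float, float]]:
    """Return (forward state, backward state) for every time step.

    The backward layer reads the sequence in reverse; its states are
    reversed again so index t lines up with the forward state at t.
    The weights are arbitrary illustrative values, not trained ones.
    """
    forward = rnn_pass(xs, w_x=0.5, w_h=0.8)
    backward = rnn_pass(xs[::-1], w_x=0.4, w_h=0.7)[::-1]
    return list(zip(forward, backward))


# Two hypothetical trajectories that start the same and end differently.
seq_a = [1.0, 1.0, 0.2]
seq_b = [1.0, 1.0, 0.9]
states_a = bidirectional(seq_a)
states_b = bidirectional(seq_b)

same_forward_at_start = states_a[0][0] == states_b[0][0]
different_backward_at_start = states_a[0][1] != states_b[0][1]
```

At step 0 the forward states match because both sequences have seen the same input so far; the backward states differ because they summarize the rest of each sequence.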

Sentence-Level ASL Translation
DeepASL adopts a probabilistic framework based on Connectionist Temporal Classification (CTC) [Graves et al. 2006] for sentence-level ASL translation.
At training: blank symbols are inserted ("How are you" becomes "How_are_you").
At inference: blank symbols are removed ("How_are_you" becomes "How are you").
CTC eliminates the restriction of pre-segmenting the whole sentence into individual words, enabling end-to-end whole-sentence translation.
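The inference-side step can be sketched with the standard CTC collapsing rule: merge consecutive repeated labels, then drop the blank symbol. The per-frame label path below is a made-up example, and the `_` blank marker simply follows the notation on the slide; how DeepASL produces the path is not shown here.

```python
from itertools import groupby
from typing import List

BLANK = "_"  # blank symbol, written as on the slide


def ctc_collapse(path: List[str]) -> List[str]:
    """Map a per-frame label path to an output word sequence.

    Step 1: merge runs of identical consecutive labels.
    Step 2: remove blank symbols.
    """
    merged = [label for label, _ in groupby(path)]
    return [label for label in merged if label != BLANK]


# Hypothetical per-frame path for the sentence "How are you".
path = ["How", "How", "_", "are", "_", "_", "you", "you"]
words = ctc_collapse(path)
sentence = " ".join(words)
```

Because a blank between two identical labels prevents them from being merged, a path such as `["you", "_", "you"]` still decodes to two words.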

Performance on Word-Level ASL Translation
ASL word dataset: 56 ASL words, 11 participants, 6440 samples in total.
Performance: average 95% accuracy; worst case 91% on participant #11.

Necessity of Model Components

Model        Translation Accuracy   Increase   Note
Baseline 1   89.4 ± 3.1 %           5.1 %      No hand shape information
Baseline 2   89.5 ± 2.4 %           5.0 %      No hand movement information
Baseline 3   91.1 ± 3.4 %           3.4 %      No hierarchical structure
Baseline 4   93.7 ± 1.7 %           0.8 %      No bidirectional structure
DeepASL      94.5 ± 2.4 %
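The Increase column is consistent with reading it as the gap, in percentage points, between the full DeepASL accuracy and each baseline's mean accuracy. That reading is an interpretation of the slide rather than something it states; a short check using only the numbers above:

```python
# Mean accuracies (%) copied from the table above.
deepasl_accuracy = 94.5
baseline_accuracy = {
    "Baseline 1": 89.4,  # no hand shape information
    "Baseline 2": 89.5,  # no hand movement information
    "Baseline 3": 91.1,  # no hierarchical structure
    "Baseline 4": 93.7,  # no bidirectional structure
}

# Assumed meaning of "Increase": DeepASL accuracy minus baseline accuracy,
# rounded to one decimal place to match the precision on the slide.
increase = {
    name: round(deepasl_accuracy - accuracy, 1)
    for name, accuracy in baseline_accuracy.items()
}
```

Under this reading, removing hand shape or hand movement information costs the most accuracy, and removing the bidirectional structure costs the least.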

Performance on Sentence-Level ASL Translation
ASL sentence dataset: 100 sentences (4-word sentences from 16 ASL words), 866 samples in total.
Performance: average 16% Top-1 word error rate (WER); average 4% Top-5 WER.
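Word error rate is conventionally computed as the word-level edit distance (substitutions, insertions, deletions) between the reference and the translated sentence, divided by the number of reference words. The sketch below follows that common definition; the slide does not spell out the exact scoring procedure or how Top-5 WER is derived, and the example sentences are made up.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,                           # deletion
                cur[j - 1] + 1,                        # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = cur
    return prev[-1] / len(ref)


# Hypothetical 4-word reference with one substituted word.
wer_substitution = word_error_rate("how are you today", "how is you today")
# Same reference with one word missing from the translation.
wer_deletion = word_error_rate("how are you today", "how you today")
```

With 4-word sentences, a single wrong, missing, or extra word changes the WER of that sentence by 25 percentage points.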

Application #1: ASL Tutor
ASL Tutor helps hearing parents of deaf children learn ASL.
MyASLTutor interface: looked-up word and explanation, ASL visualization.

Application #2: ASL Interpreter
ASL Interpreter enables two-way communication between deaf people and the hearing majority.
Figure: first-person point of view of the deaf person using a Microsoft HoloLens AR headset.

Video: https://www.youtube.com/watch?v=0pmjnnnn77c

Conclusions
DeepASL represents the first deep learning-based sign language translation framework that enables ubiquitous and non-intrusive ASL translation at both word and sentence levels.
DeepASL achieves an average 94.5% translation accuracy over 56 commonly used ASL words, and an average 16.1% word error rate on translating 100 sentences.
Take an initiative on ASL sign data crowdsourcing. We believe that, with the crowdsourced efforts, ASL translation technology can be significantly advanced.

Thank You
Biyi Fang, Michigan State University
fangbiyi@msu.edu
Web: fangbiyi.com