ASL Gesture Recognition Using a Leap Motion Controller

Carleton University COMP 4905
Martin Gingras
Supervisor: Dr. Dwight Deugo
Wednesday, July 22, 2015

Abstract

Individuals who rely on American Sign Language as their primary means of communication face a language barrier unlike most others: no amount of time dedicated to learning the language allows them to teach everyone they interact with how to understand it. Recent innovations in technology, in particular the Leap Motion controller studied here, an infrared sensor capable of detecting hands within its field of view, represent real potential to address this issue. Furthermore, the miniaturization of computers has made a portable solution built around this technology feasible. This research attempts to find attributes captured by the Leap Motion controller that can distinguish individual hand gestures, in order to demonstrate the feasibility of parsing and presenting signed gestures in real time. The process improved my understanding of the Leap Motion API, how to communicate with it, and how to build an application powered by its data. The application that resulted from this study, Leap ASL, showcases a use case for this technology as well as the potential available within this domain.

Acknowledgments

First and foremost, I would like to acknowledge the team at Leap Motion for developing the controller used in this study, without which it would have been impossible. Their advances in detecting objects in 3D space using infrared sensors show great promise and made my experiments possible, and as their project advances, boundless possibilities will be exposed. Furthermore, their contributions to the open source community surrounding their technology significantly simplify the creation of new applications; without them, development in this field would be prohibitively difficult. I would like to thank Dr. Dwight Deugo for agreeing to be my supervisor for this research. His assistance in selecting an initial topic and his encouragement to maintain a narrow focus proved invaluable; by keeping the scope of this investigation narrow, I was able to achieve everything I had intended and see some intriguing results. Lastly, I would like to thank the entire Computer Science department at Carleton University for the last five years of instruction and guidance that allowed me to grow, both personally and professionally, into the developer I am today.

Table of Contents

Abstract 1
Acknowledgments 2
Table of Contents 3
List of Figures 4
Introduction 5
American Sign Language 6
Leap Motion Controller 8
Leap Motion Developer Community 9
Leap ASL 10
Leap ASL Capture Loop 11
Leap ASL Hand and Finger Objects 12
Approach 13
Results 15
Future work 16
Conclusion 18
References 19

List of Figures

Figure 1 - The letters A and S in American Sign Language 7
Figure 2 - Leap ASL in action showing a hand being recorded by the Leap Motion controller 11
Figure 3 - The prompt to record a letter 13
Figure 4 - The recorded letter and hash associated with it 14
Figure 5 - The letter detected and displayed based on the hand's positioning 15

Introduction

Hearing and speaking impairments, and the resulting reliance on sign language for communication, create an immense barrier for many individuals. ASL users cannot teach everyone they encounter how to sign, meaning that, unlike other language barriers, there is no way to improve one's ability to convey a message. Given the immense advances in technology in recent decades, this is a problem where it should be possible to make real progress and create a meaningful impact on millions of people's lives. The motivation behind this investigation is to discover whether tools available today have the potential to make headway in this field.

Since the invention of the computer, and in particular since the advent of the personal computer, dozens of input devices have been experimented with.[1] However, the keyboard and mouse combination has remained the predominant tool. It has proven very functional for stationary individuals focused on productivity, but when it comes to mobility, this combination falls short. Recent research and development in this problem domain have produced new technologies for individuals to interact with their devices. Leap Motion is one company that took an active interest in this topic and invented the Leap Motion Controller, which may represent the next generation of interaction with computers. The Leap Motion controller uses infrared sensors to detect objects and items directly in front of it.[2] So far, most applications developed with this technology have focused on gamification of the interface or the creation of productivity tools.[3]

[1] http://students.asl.ethz.ch/upl_pdf/358-report.pdf
[2] http://www.ncbi.nlm.nih.gov/pmc/articles/pmc3690061/

When analyzing this tool, however, it was observed that another interesting application was possible. By detecting and analyzing hands in the field of view, including their position, pitch, and direction, it may be possible to match them to American Sign Language hand positions and gestures. If these hand positions could be interpreted, it might be possible to provide a tool that interprets and translates them in real time. The goal of this investigation is to discover whether the Leap Motion controller can detect hand positioning accurately and precisely enough that its promise and shortcomings are exposed. The resulting project should be simple and straightforward to use, and should clearly demonstrate the capabilities of the matching algorithm used to detect signs.

American Sign Language

American Sign Language is the predominant sign language of the deaf communities in the United States and Canada. It is composed of both static and dynamic gestures that can involve the hands, face, and torso in order to convey either a letter or a word.[4] This introduces an extraordinary level of complexity when trying to interpret letters and words in real time. In order to keep the scope narrow, this paper generally ignores signs requiring any sort of motion, since that would require aggregating multiple frames of accurately captured hand motion, which proved to be a non-trivial issue to overcome.

[3] http://www.cnet.com/news/leap-motion-3d-hands-free-motion-control-unbound/
[4] http://www.gallaudet.edu/images/clerc/pdf/full%20document%20of%20asdc%20sign%20Language%20for%20All-English.pdf

Focusing purely on static gestures proved to be very difficult on its own. Due to the subtle nuances of the language, as depicted in Figure 1, a slight change in the position of the thumb is difficult for a digital device to detect. This issue is quite common in American Sign Language: there are a plethora of words to communicate, so these small nuances, noticeable to a human observer, occur frequently, yet they present a massive barrier for software attempting to interpret the user's intentions.

Figure 1 - The letters A and S in American Sign Language

Even among ASL users, variations between dialects of signs are found around the world.[5] This adds an additional barrier for users, since they cannot reliably expect to communicate even with those versed in the same communication medium. Creating a common tool to translate ASL would inevitably encourage a common set of hand positions and gestures. This uniformity would be an unintended benefit of a real-time ASL interpreter.

[5] http://www.ethnologue.com/language/ase

Leap Motion Controller

The Leap Motion Controller is a peripheral USB device with two monochromatic IR cameras and three infrared LEDs, designed to be used face up on a desk.[6] The device sends hundreds of frames per second over USB to a host computer, where Leap Motion's software converts the two-dimensional images, covering a roughly three-and-a-quarter-foot hemispherical area, into three-dimensional positional data.[7] This turns objects in the three-dimensional space near the computer into numeric representations, allowing mathematical operations to be performed on them. The Leap Motion controller can detect hands, and items held in hands, within its field of view, and based on these detections it publishes additional details about what it sees to improve the development experience. The data is published through the Leap Motion application programming interface, or API. The API allows developers to communicate with the controller and consume the interpreted data published by Leap Motion's software.[8] Additionally, the Leap Motion team has created software developer kits, or SDKs, in multiple programming languages; these abstract the three-dimensional view that the controller consumes and present it as a well-documented set of models for developers and their applications to interact with.

[6] http://www.ncbi.nlm.nih.gov/pmc/articles/pmc3690061/
[7] http://www.ncbi.nlm.nih.gov/pmc/articles/pmc3690061/
[8] https://developer.leapmotion.com/

For the Leap Motion ASL tool I used the JavaScript SDK, primarily because of JavaScript's portability. JavaScript can run in almost every web browser in the world, as well as server-side in the form of Node.js. This portability made it very appealing, since however future iterations of this application progress, the initial investigative efforts of this study will remain applicable. Early stages of this project required a large amount of time experimenting with configuring and connecting to the Leap Motion controller. Once it was configured, communicating with the API and experimenting with Leap Motion's various SDKs to determine what data and features were available was another area that required time and energy. Early experiments had little to do with my final goal and served only to familiarize myself with the tool.

Leap Motion Developer Community

Leap Motion has, since day one, nurtured a strong developer community. They have done so by open sourcing a large number of samples as well as full applications.[9] These sample applications provide a phenomenal resource to iterate upon quickly.

[9] http://github.com/leapmotion/

Using these resources I was able to quickly add a graphical display of what the Leap Motion controller was detecting to my application, without dedicating a significant amount of time to this ancillary activity.[10] The large quantity of samples and documentation drastically improved the stage of this investigation that involved becoming familiar with the Leap Motion controller, its API, and the SDKs required to interact with it. One of the goals of this project was to in turn contribute back to the Leap Motion community and potentially provide a resource for others to use in their own applications. The code for my application will be published on GitHub, where it can be easily accessed, forked, and improved upon by others.

Leap ASL

Leap ASL is the application developed to investigate the opportunity I perceived in this new technology and the societal benefits it could theoretically represent. It is a simple stateless web app that relies purely on client-side code rather than communicating with a back-end server. The application interfaces with the Leap Motion controller through a JavaScript API, using WebSockets to communicate at a phenomenal rate. It pulls the interpreted three-dimensional data perceived by the controller, allowing both visualization and manipulation. One of the open source solutions from the Leap Motion community, leapjs-rigged-hand, graphically displays a visualization of the three-dimensional interpretation of the user's hand.

[10] http://github.com/leapmotion/leapjs-rigged-hand

This visual feedback allows users to clearly see the information the application will capture when recording hand positions. The design of the web app is intentionally simple, emphasizing function over form for this iteration. A screenshot of the application in use is depicted in Figure 2 below.

Figure 2 - Leap ASL in action showing a hand being recorded by the Leap Motion controller

Leap ASL Capture Loop

Leap Motion's proprietary software broadcasts the current state and view from the controller to all subscribed JavaScript listeners over the WebSocket protocol. This loop is accessible by connecting a callback to the Leap object's loop property. The callback then gets called with a Frame object as a parameter every time the proprietary software running on the local machine broadcasts. Leap ASL captures the data by attaching a callback to the loop property, capturing incoming frames and interpreting the data contained therein.
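A minimal sketch of this loop, using the leapjs SDK, looks roughly as follows; the helper handleFrame is illustrative rather than the exact code used in Leap ASL.

```javascript
// Minimal capture-loop sketch, assuming the leapjs SDK is loaded
// (via <script src="leap.min.js"></script> or require('leapjs')).
// handleFrame() is an illustrative placeholder, not Leap ASL's actual code.
Leap.loop(function (frame) {
  // Called every time the local Leap Motion service broadcasts a frame
  // over its WebSocket connection.
  if (frame.hands.length > 0) {
    handleFrame(frame.hands[0]);
  }
});

function handleFrame(hand) {
  // A real implementation would hash the hand and finger data here and
  // compare it against previously recorded letters.
  console.log('Palm position:', hand.palmPosition);
}
```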

The Frame object is an abstraction containing information about the current state of the area surrounding the controller. It carries a wealth of information, including a history of objects detected by the controller, recognizable gestures being performed, translations that objects in the frame have undergone since a specified frame ID in the past, and, most importantly for this application, any hand objects detected within the field of view.

Leap ASL Hand and Finger Objects

The hand object passed by Leap Motion's API contains yet another load of data to be interpreted. Leap Motion detects a large set of properties about the position and motion of hands in its view, including whether it is the left or right hand, the hand's grab strength, the hand's velocity, and many more. The properties that proved most important to this study were the hand's pitch, roll, and yaw. These values were added in later iterations of the program to improve the ability to distinguish between similar hand positions facing different directions. The initial approach used to distinguish between hand positions was the software's interpretation of the fingers.
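To illustrate the kind of data available, the sketch below reads a few of these hand-level properties from a leapjs Hand object; the property and method names are those of the public leapjs SDK, while the surrounding helper is purely illustrative.

```javascript
// Illustrative only: reading hand-level properties from a leapjs Hand object.
function describeHand(hand) {
  return {
    side: hand.type,              // 'left' or 'right'
    grab: hand.grabStrength,      // 0 (open hand) to 1 (fist)
    velocity: hand.palmVelocity,  // [x, y, z] in mm/s
    // Orientation values that proved most useful for distinguishing
    // similar hand shapes facing different directions:
    pitch: hand.pitch(),
    roll: hand.roll(),
    yaw: hand.yaw()
  };
}
```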

From the fingers you can pull data such as the bones and joints that compose them, the direction they are pointing, the position of the tip, and more. By iterating over the fingers and combining their data, you can essentially take a snapshot of a three-dimensional view of the hand and fingers at a given moment. This concept is fundamental to being able to later compare a hand position to a prior one.
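A rough sketch of how such a snapshot might be assembled from leapjs finger objects follows; the exact set of properties Leap ASL records is not reproduced here, so the selection below is only an assumption for illustration.

```javascript
// Illustrative sketch: collapsing a hand's fingers into a plain snapshot
// object that can later be compared against a previously recorded position.
function fingerSnapshot(hand) {
  return hand.fingers.map(function (finger) {
    return {
      type: finger.type,            // 0 = thumb ... 4 = pinky
      extended: finger.extended,    // whether the finger is pointing out
      direction: finger.direction,  // unit vector [x, y, z]
      tip: finger.tipPosition       // [x, y, z] in mm relative to the controller
    };
  });
}
```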

Approach

For the first iteration of Leap ASL I built a simple capture mechanism: when the user pressed the spacebar, the application prompted them to enter a letter, then stored the whole hand object with the associated letter in an array. When the user pressed another key, the program iterated over the array of previously recorded letters and compared the current hand state against each one. While this approach has potential in theory, in reality the granularity, in terms of the decimals recorded in all the properties of a hand, was far too precise to ever match when recreating the hand position. Furthermore, having to press a button to compare the current hand position to recorded positions lengthened the feedback loop drastically, introducing further difficulty in matching a recorded hand position.

Figure 3 - The prompt to record a letter

Resolving the slow feedback loop had the cascading benefit of also addressing the other issue discovered. By hooking into the callback loop triggered by the Leap Motion API on every recorded frame, Leap ASL could automatically take any hand objects in the controller's field of view and compare them against recorded values. At this point the inefficiency of iterating over an array became noticeable, as the frame rate of the rendered view of the hand deteriorated. The solution was to use a hash table to store the recorded values, together with a hashing function that creates a hash from the hand and finger data. This could be used both to store a snapshot of the hand and to look it up quickly, since hash table lookups are very fast. Using a hash table required that key properties of the hand and finger objects be selected, and when storing properties of the hand and fingers it was necessary to round to a single decimal place because of the granularity at which the Leap Motion controller publishes data.

Figure 4 - The recorded letter and hash associated with it
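A minimal sketch of this rounding-and-hashing idea is shown below; the helper names and the particular properties fed into the key are assumptions for illustration, not Leap ASL's exact implementation.

```javascript
// Sketch of the hashing approach: round selected hand/finger properties to one
// decimal place and join them into a string key for a plain-object hash table.
// Helper names and the property selection are illustrative assumptions.
var recordedLetters = {};  // hash table: key -> letter

function round1(n) {
  return Math.round(n * 10) / 10;
}

function handKey(hand) {
  var parts = [round1(hand.pitch()), round1(hand.roll()), round1(hand.yaw())];
  hand.fingers.forEach(function (finger) {
    finger.direction.forEach(function (component) {
      parts.push(round1(component));
    });
  });
  return parts.join('|');
}

// Recording a letter:
//   recordedLetters[handKey(hand)] = 'A';
// Looking one up on a later frame:
//   var letter = recordedLetters[handKey(hand)];
```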

Results

The second approach worked much better: successive attempts at previously recorded positions were recognized by the application. Yet the granularity of the data on a hand position was still so fine that getting matches remained challenging. To address this, a fudge factor was introduced. This function assessed two hashes and compared their properties; as long as the overall difference between values was less than x, and no single value was more than y off its recorded value, it was considered a match. By modifying x and y it was possible to tweak the required accuracy of the matching algorithm significantly. Additional improvements came from experimenting with the controller's orientation settings: placing it on its side and enabling head-mounted display mode improved the analysis of the hand and finger positions, because the user's wrist no longer obscured them. As Leap Motion's hand and finger detection software improves, this application would inherently benefit from all of the effort and development put into the SDK.

Figure 5 - The letter detected and displayed based on the hand's positioning
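The fudge-factor comparison described above can be sketched roughly as follows, with maxTotalDiff and maxSingleDiff standing in for the x and y thresholds; the example threshold values and the way hashes are unpacked back into numbers are assumptions for illustration.

```javascript
// Sketch of the "fudge factor" match: two hash keys (as produced by handKey
// above) are split back into numbers and compared with two thresholds.
// maxTotalDiff and maxSingleDiff correspond to x and y in the text; the
// values suggested here are illustrative, not the ones used in Leap ASL.
function fuzzyMatch(keyA, keyB, maxTotalDiff, maxSingleDiff) {
  var a = keyA.split('|').map(Number);
  var b = keyB.split('|').map(Number);
  if (a.length !== b.length) return false;

  var total = 0;
  for (var i = 0; i < a.length; i++) {
    var diff = Math.abs(a[i] - b[i]);
    if (diff > maxSingleDiff) return false;  // no single value may be too far off
    total += diff;
  }
  return total <= maxTotalDiff;              // overall difference must stay small
}

// e.g. fuzzyMatch(currentKey, recordedKey, 5.0, 0.8)
```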

The results are positive and indicative of the potential for incredible advances within this domain. Though the scope of the investigation was quite narrow, that narrowness allowed rapid progress that would not have been possible had the focus been on the broader, more complex implications of this problem. The issues that would have needed to be addressed, had the scope been allowed to creep, would have required significant investigative effort and development time, unnecessary for this early proof of concept.

Future work

A fundamental issue with the vision proposed is that individuals differ and sign language is partially subjective, so a single dataset for everyone is ineffective. One way to avoid issues raised by individual differences would be a learning stage in which the user repeats a set of gestures to calibrate the software to their specific signing. Additionally, adding two buttons to the device indicating whether the interpreted word or gesture is correct would introduce a feedback loop through which the application can learn from its successes and failures. This learning would improve the platform for everyone using it.

Gestures represent another area of potential problems, because the hand positions a gesture passes through would have to be both recorded and later retrieved. One approach would be to store each frame in a gesture as a hand position, but flag it as part of a gesture and keep a pointer to the hand position before it. Then, if a hand position is part of a gesture, it is added to a stack of frames representing the ongoing gesture and, as the gesture completes, the frames in its stack are compared to the recorded objects. One would then have to decide on a confidence interval: how similar individual frames must be to the recorded ones, how many frames may be wrong, and by how much, while still considering the gesture valid, and how many frames may be missing due to inconsistencies in how frequently, or at what point in the gesture, a frame is recorded.
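As a rough illustration of the data structure this suggests, the sketch below records gesture frames as a linked stack; all names are hypothetical, since this approach was not implemented, and the matching policy for the confidence thresholds just described is left as a stub.

```javascript
// Hypothetical sketch of recording a dynamic gesture as a linked stack of
// hand-position frames; names and structure are illustrative only.
function GestureRecorder() {
  this.frames = [];  // stack of snapshots making up the ongoing gesture
}

GestureRecorder.prototype.addFrame = function (hand) {
  this.frames.push({
    snapshot: fingerSnapshot(hand),                          // see earlier sketch
    previous: this.frames[this.frames.length - 1] || null,   // pointer to the prior frame
    partOfGesture: true
  });
};

GestureRecorder.prototype.matches = function (recordedGesture, frameConfidence, maxWrongFrames) {
  // Stub: compare this.frames against recordedGesture frame by frame,
  // tolerating up to maxWrongFrames frames that fall below frameConfidence.
  // Deciding these thresholds is the open problem described above.
  return false;
};
```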

The successes of this experiment invite one to consider where this sort of application could go. The requirements to run the application are very small: only a basic computer with a USB port and the Leap Motion software running on it. Paired with a client-server architecture in which the heavy computing is offloaded to a remote server, the components could essentially be miniaturized, creating a tool that parses gestures in real time as a user signs. Aggregating those results and rendering them through text-to-speech software, playing back the parsed letters, could essentially provide a translator for individuals who can only sign to communicate with others who have not learnt to sign. Furthermore, by adding a microphone to the device, it could listen to and interpret spoken words and display a video of the sign language equivalent on an attached screen. This would allow bi-directional communication for both mute and deaf individuals in real time.

Improving the hand position analysis would be possible by focusing on individual letters, words, and phrases and discovering which key attributes truly matter for each of them, whether that is only certain fingers being positioned a certain way or the hand direction being exactly right; narrowing the number of attributes to a smaller subset would allow for easier matching while likely improving the accuracy of the matches. As improvements are made to how hand positions and gestures are interpreted, they could be A/B tested on subsets of the user base and their relative improvements quantified.

Conclusion

Using the Leap Motion Controller to detect hand positions in 3D space and interpret them as ASL gestures shows immense promise. This small-scale success could be replicated and drastically improved upon by a larger effort. The potential it represents, to change many people's lives, would be truly incredible if realized. Being able to translate gestures to voice in real time, and then have verbal responses played back as a video of an individual signing, would inevitably change the way deaf and mute individuals interact with everyone else. This investigation achieved everything it set out to do and lays the groundwork for others to delve deeper into the possibilities this idea presents.

References

Garbani, L., Siegwart, R., & Pradalier, C. (2011, December 1). History of Computer Pointing Input Devices. ETH Zurich. Retrieved August 16, 2015.

Leap Motion SDK. (n.d.). Retrieved August 16, 2015, from https://developer.leapmotion.com/

Leap Motion. (n.d.). Retrieved August 16, 2015, from http://github.com/leapmotion/

Leap Motion: 3D hands-free motion control, unbound. (n.d.). Retrieved August 16, 2015, from http://www.cnet.com/news/leap-motion-3d-hands-free-motion-control-unbound/

Leapmotion/leapjs-rigged-hand. (n.d.). Retrieved August 16, 2015, from http://github.com/leapmotion/leapjs-rigged-hand

Weichert, F., Bachmann, D., Rudak, B., & Fisseler, D. (2013). Analysis of the Accuracy and Robustness of the Leap Motion Controller. Sensors, 13, 6380-6393.