On the computational aspects of sign language recognition
Christian Vogler
Gallaudet Research Institute

Overview
- Problem statement
- Basic probabilistic framework
- Recognition of multiple channels
- Recognition features
- Discussion

What is ASL recognition?
- Signer
- Video image to computer
- Extraction of 3D features of arms, hands, body, and face
- 3D features of body parts
- Recognition of signs from features
- Output of transcription in human-readable form
- Automatic translation to English

What makes it hard?
Sign language recognition is hard:
- Language modeling issues
- Computational issues
  - How does recognition actually work?
  - How to represent data?

Basic modeling principles
- We break down signs into phonemes
- We model handshape and hand movements in independent channels
- This helps deal with the complexities of sign languages

Modeling example
Example: BROTHER with independent channels
- Strong hand
  - location/velocity: H forehead -> down -> H neutral
  - handshape: A -> A->L -> L
- Weak hand
  - location/velocity: H neutral
  - handshape: A -> A->L -> L

Basic recognition principles
- We start with the data signal
- In our case, a collection of:
  - 3D positions of hands
  - 3x3 orientation matrices of hands
  - Joint angles of fingers
  - Abduction (spread) angles of fingers
- The first problem: Variability
  - No two instances of a sign look exactly alike

Variability
- [Figure: 3 repeated instances of the sign for FATHER]
- Structurally similar
- But details are different

Temporal variations
- Signers can sign faster or more slowly
- Variations in when exactly each movement occurs
- Speech recognition uses hidden Markov models (HMMs) to deal with them

Hidden Markov models
- State-based statistical model
  - System is always in some state
  - At discrete time intervals, takes transition to another state
- Probabilistic transitions
  - Accounts for temporal variations
- Each state has output probability distribution
  - Here: Gaussian density mixture
  - Accounts for spatial variations

HMM example
[Figure: HMM example; axes X and Y]

How to use HMMs
- HMMs can generate a signal
- For recognition, consider the converse:
  - What is the probability that the HMM generated the signal?
  - Which state sequence generated it?
- The answer to these questions defines the continuous recognition problem
- HMM probabilities are trained

Continuous recognition
- Chain HMMs into a network
- Match the network to the signal
- Find the most likely state sequence through the network
[Figure: network of the signs Father, Woman, Try, Read, Book, Teach]
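
The question "Which state sequence generated it?" can be illustrated with a small Viterbi decoder. This is only a minimal sketch under simplifying assumptions that are not from the slides: one-dimensional observations, a single Gaussian per state instead of a mixture, and hypothetical names (`log_gauss`, `viterbi`) and parameter values.

```python
import math

NEG_INF = float("-inf")

def log_gauss(x, mean, var):
    """Log density of a 1D Gaussian (stand-in for the per-state output distribution)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def viterbi(obs, log_init, log_trans, means, variances):
    """Return (most likely state sequence, its log probability)."""
    n = len(means)
    score = [log_init[s] + log_gauss(obs[0], means[s], variances[s]) for s in range(n)]
    back = []  # back[t][s] = best predecessor of state s at frame t + 1
    for x in obs[1:]:
        prev, score, ptr = score, [], []
        for s in range(n):
            best = max(range(n), key=lambda r: prev[r] + log_trans[r][s])
            ptr.append(best)
            score.append(prev[best] + log_trans[best][s]
                         + log_gauss(x, means[s], variances[s]))
        back.append(ptr)
    last = max(range(n), key=lambda s: score[s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, score[last]

# Hypothetical two-state left-to-right model: state 0 around 0.0, state 1 around 5.0.
LOG_INIT = [0.0, NEG_INF]
LOG_TRANS = [[math.log(0.5), math.log(0.5)], [NEG_INF, 0.0]]
MEANS, VARIANCES = [0.0, 5.0], [1.0, 1.0]

fast, _ = viterbi([0.1, -0.2, 0.3, 4.8, 5.1], LOG_INIT, LOG_TRANS, MEANS, VARIANCES)
slow, _ = viterbi([0.1, 0.0, -0.1, 0.2, 0.1, 5.0, 4.9, 5.2],
                  LOG_INIT, LOG_TRANS, MEANS, VARIANCES)
```

Both `fast` and `slow` pass through the same two states and differ only in how long they stay in each, which is how the probabilistic transitions absorb temporal variation.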

Continuous recognition
[Figure: network of the signs Father, Try, Book, Woman, Read, Teach, with one model expanded into the segment sequence: X front of forehead, straight back, straight forward, straight back, H forehead]

Token passing algorithm
[Figure: network of the signs Father, Woman, Try, Read, Book, Teach]

Token passing
Formally, the token passing algorithm finds

$$\max_{Q_1,\dots,Q_T} P(O, Q) = \max_{Q_1,\dots,Q_T} \prod_{t=1}^{T} a_{Q_{t-1} Q_t} \, b_{Q_t}(O_t)$$

where
- $b_{Q_t}(O_t)$: Gaussian density function in state $Q_t$
- $O_t$: data signal at time $t$
- $a_{Q_{t-1} Q_t}$: transition probability from state $Q_{t-1}$ to state $Q_t$ (with $Q_0$ the initial state)

Alternative form
We prefer to write the formula in logarithmic form instead:
- Easier to manipulate
- Faster to compute

$$\max_{Q_1,\dots,Q_T} \log P(O, Q) = \max_{Q_1,\dots,Q_T} \sum_{t=1}^{T} \left[ \log a_{Q_{t-1} Q_t} + \log b_{Q_t}(O_t) \right]$$

Important point
- So far, just like speech recognition...
- But how do we add multiple channels?
- The recognition algorithm maximizes a product of independent random variables
  - Each data frame is independent from the others
  - Each state transition is independent from the others
- So: the joint probability of the channels is just the product of the marginal (individual channel) probabilities

Parallel HMMs
- This extension formalizes parallel HMMs
- Instead of computing

$$\max_{Q} \log P(O, Q)$$

- Compute over all channels $c$:

$$\max_{Q^{(1)},\dots,Q^{(C)}} \sum_{c=1}^{C} \log P(O^{(c)}, Q^{(c)})$$

- Recall that $\log(ab) = \log a + \log b$
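
The step from a product of probabilities to a sum of logarithms, and from one channel to several independent channels, can be sketched as follows. The per-frame values and the names `sequence_log_prob` and `joint_log_prob` are hypothetical and chosen only for illustration. The sketch also shows a practical benefit of the log form beyond those listed on the slide: a direct product of many small factors underflows in floating point, while the sum of logs stays well-behaved.

```python
import math

def sequence_log_prob(frame_probs):
    """log of a product of per-frame factors, computed as a sum of logs."""
    return sum(math.log(p) for p in frame_probs)

def joint_log_prob(channels):
    """Independent channels: the log of the joint is the sum of the channel logs."""
    return sum(sequence_log_prob(frames) for frames in channels)

# Hypothetical per-frame factors (transition times output probability) for two channels.
movement = [1e-3] * 200
handshape = [1e-2] * 200

direct_product = math.prod(movement) * math.prod(handshape)  # underflows to 0.0
log_score = joint_log_prob([movement, handshape])            # finite negative number
```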

Probability combination
- Essentially, searches an HMM network in parallel for each channel
- When to multiply probabilities?
- Split up the sequence into weighted contributions from individual signs:

$$\max_{Q^{(1)},\dots,Q^{(C)}} \sum_{c=1}^{C} \log P(O^{(c)}, Q^{(c)}) = \max_{Q^{(1)},\dots,Q^{(C)}} \sum_{w=1}^{W} \sum_{c=1}^{C} \log P(O_w^{(c)}, Q_w^{(c)})$$

  where $O_w^{(c)}$ and $Q_w^{(c)}$ are the parts of channel $c$ that belong to sign $w$

Probability combination
- What does this mean?
- We can combine the partial probabilities after each sign (or each phoneme)
- Helps constrain the search through the parallel networks
- Another constraint is needed:
  - Paths from the channels must be consistent
  - That is, they must touch the same sequence of signs

Channel combination constraints
- Enforce through path identifiers
- Assign a unique path id to tokens
- Tokens have the same path id iff they touch the same sequence of signs
- Combine only probabilities of tokens with the same path id

PaHMM token passing
- But then the maximum joint probability no longer maximizes the marginal probabilities
- Keep multiple hypotheses
[Figure: token paths through the signs Woman, Try, Like, Read, Teach]
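
The path-id constraint can be sketched in a few lines. This is an illustration under assumptions, not the talk's implementation: a path id is modeled here as a tuple of sign names, each channel is a dictionary from path id to log probability, and `combine_tokens`, `weights`, and `beam` are hypothetical names. The sign sequences and scores are made up.

```python
def combine_tokens(channel_tokens, weights=None, beam=3):
    """Combine per-channel token scores at a sign boundary.

    channel_tokens: one dict per channel, mapping path id -> log probability.
    Only path ids present in every channel are combined (consistency constraint).
    Returns up to `beam` (path id, joint log probability) pairs, best first.
    """
    if weights is None:
        weights = [1.0] * len(channel_tokens)  # assumption: equal channel weights
    common = set(channel_tokens[0])
    for tokens in channel_tokens[1:]:
        common &= set(tokens)
    joint = {pid: sum(w * tokens[pid] for w, tokens in zip(weights, channel_tokens))
             for pid in common}
    ranked = sorted(joint.items(), key=lambda item: item[1], reverse=True)
    return ranked[:beam]

# Hypothetical tokens after the second sign in two channels.
movement = {("WOMAN", "TRY"): -10.0, ("WOMAN", "LIKE"): -9.0, ("WOMAN", "READ"): -14.0}
handshape = {("WOMAN", "TRY"): -7.0, ("WOMAN", "LIKE"): -12.0, ("WOMAN", "TEACH"): -6.0}

hypotheses = combine_tokens([movement, handshape])
```

In this made-up example the best joint hypothesis is WOMAN TRY, although the movement channel alone prefers WOMAN LIKE and the handshape channel alone prefers WOMAN TEACH. Keeping only the single best token per channel would have discarded the joint winner, which is why multiple hypotheses are kept.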

Summary
- The recognition algorithm is rather complicated compared to the basic speech recognition algorithm
  - Extra bookkeeping for path ids, consistency
- Still, reasonably efficient: $O(H C N^2 T)$
  - Linear in number of hypotheses
  - Linear in channels, number of frames
  - Quadratic in number of HMM states

Data representation
- Representation of the data is just as important as the recognition algorithm
- Intuition:
  - "Move up 1 inch. Turn left 90 degrees. Move 1 inch. Turn left 90 degrees. Move 1 inch. Turn left 90 degrees. Move 1 inch."
  - "A square with a side of 1 inch"
- Both descriptions refer to the same object
- Clearly, the second one is easier to grasp

Intuition, continued
- The first example is a localized description
  - No sense of the big picture
  - We can infer the big picture, because we know the history of steps (move, turn, move, turn)
- HMMs cannot!
  - Recall that successive data frames are independent
  - So HMMs lose large parts of the history
- The second description shows the big picture

Local features
- The computational equivalent of localized descriptions ("features"):
  - 3D positions
  - 3D velocities
  - Finger joint angles
  - Abduction angles
- We have these data, but none show what happens at a global level

Global features
- Some global descriptions:
  - Trajectory along line, arc, plane, ...
  - Finger bent, extended
  - Hand open, closed
- Problem: How do we translate these qualitative descriptions into numbers?

Global trajectory representation
- Compute the 3x3 covariance matrix of 3D positions over a window of 5-30 data frames
- Compute the eigenvalues of the matrix
- The largest two eigenvalues specify proportions of covariance
  - Line: 1st eigenvalue large, 2nd small
  - Plane: both eigenvalues large
  - Etc.

Handshape representation
- Take a clue from ASL linguistics here
- Handshape can be characterized by the degree of openness of a finger
  - Open = finger is extended
  - Closed = finger is bent
- How to represent with numbers?

Handshape features
- Measure of openness of a finger
- Use height and base of quadrilateral
  - Open: small height, large base
  - Closed: the opposite

Features: Summary
- For the final recognition features: use a mix of local and global
  - 3D positions, velocities, eigenvalues
  - Degree of openness, abduction angles
- This is just the beginning
  - There is much potential for improvement in the data representation
  - Speech recognition is 20 years ahead of us in this respect
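
The global trajectory representation described above (covariance matrix of a window of 3D positions, then its eigenvalues) can be sketched in plain Python. Several choices here are assumptions rather than details from the slides: the population covariance, the closed-form trigonometric method for the eigenvalues of a symmetric 3x3 matrix, the normalization of the two largest eigenvalues by their total, and the function names.

```python
import math

def covariance_3d(points):
    """Population covariance matrix (3x3 nested lists) of a window of 3D points."""
    n = len(points)
    mean = [sum(p[k] for p in points) / n for k in range(3)]
    return [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in points) / n
             for j in range(3)] for i in range(3)]

def eigenvalues_sym3(a):
    """Eigenvalues of a symmetric 3x3 matrix, largest first (trigonometric method)."""
    p1 = a[0][1] ** 2 + a[0][2] ** 2 + a[1][2] ** 2
    if p1 == 0.0:  # already diagonal
        return sorted((a[0][0], a[1][1], a[2][2]), reverse=True)
    q = (a[0][0] + a[1][1] + a[2][2]) / 3.0
    p2 = sum((a[k][k] - q) ** 2 for k in range(3)) + 2.0 * p1
    p = math.sqrt(p2 / 6.0)
    b = [[(a[i][j] - (q if i == j else 0.0)) / p for j in range(3)] for i in range(3)]
    det_b = (b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
             - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
             + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0]))
    r = max(-1.0, min(1.0, det_b / 2.0))  # clamp against rounding error
    phi = math.acos(r) / 3.0
    e1 = q + 2.0 * p * math.cos(phi)
    e3 = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    e2 = 3.0 * q - e1 - e3
    return [e1, e2, e3]

def trajectory_features(points):
    """Share of the total variance carried by the two largest eigenvalues."""
    e1, e2, e3 = eigenvalues_sym3(covariance_3d(points))
    total = e1 + e2 + e3
    if total <= 0.0:
        return (0.0, 0.0)  # no movement inside the window
    return (e1 / total, e2 / total)

# Hypothetical windows: a straight-line movement and a circular movement in a plane.
line = [(float(t), 2.0 * t, 0.0) for t in range(10)]
circle = [(math.cos(2.0 * math.pi * k / 16), math.sin(2.0 * math.pi * k / 16), 0.0)
          for k in range(16)]
line_features = trajectory_features(line)
circle_features = trajectory_features(circle)
```

For the line, nearly all variance falls on the first eigenvalue. For the circle, the first two eigenvalues share it about equally. This matches the "line" and "plane" cases on the slide.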

Experimental validation
- 500 signed ASL sentences
- 22-sign vocabulary
- Collected with MotionStar motion capture system (both hands) and CyberGlove (right hand only)
- Continuous sentences, no pauses between signs

Feature comparison
- Do global features help? Yes!
  - Movement channel: 93% accuracy instead of 90%
  - Handshape channel: 95% accuracy instead of 83%

Effect of multiple channels
- PaHMM experiments with 1, 2, and 3 channels
- Basic question: Is modeling the channels independently from one another reasonable?
- The answer is important for large-scale systems

Multiple-channel experiments
Recognition accuracy in %:
- Baseline, 1 channel (movement strong hand): 93.27
- 2 channels (movement both hands): 94.55
- 2 channels (movement strong hand, handshape strong hand): 96.5
- 3 channels (movement both hands, handshape strong hand): 95.5

Discussion
- 2 channels: definite improvement
- 3 channels did not yield improvement
  - Possibly the data set is too small?
  - No signs where the handshape is the only distinguishing feature
- More work with a larger data set is needed

For more information
Christian.Vogler@gallaudet.edu
http://gri.gallaudet.edu/~vogler/