Requirements for Maintaining Web Access for Hearing-Impaired Individuals
Daniel M. Berry
Computer Science Department, University of Waterloo
Waterloo, Ontario N2L 3G1, Canada
Phone: None, use fax or e-mail
FAX: +1-519-746-5422
dberry@uwaterloo.ca
WSE 2001, Access for HI; 2003 Daniel M. Berry
Abstract
The current textual and graphical interfaces to computing, including the Web, are a dream come true for the hearing impaired. However, improved technology for voice and audio interfaces threatens to end this dream. Requirements are identified for continued access to computing by the hearing impaired. Consideration is also given to improving access for the sight impaired.
I am Hearing Impaired (HI)
I have been profoundly hearing impaired (HI) since birth. I understand spoken language mainly by reading lips. Most profoundly HI individuals (HIIs) use signing instead of reading lips.
My Audiogram
Figure: Daniel M. Berry's audiogram, plotting hearing loss in dB (axis marked at 0, 50, and 100) against frequency of sound in Hz (125 to 4000) for the left ear (X) and the right ear (O), with the regions of vowels and consonants marked.
Handling Hearing Situations
My situation is typical of many HIIs. Other HIIs and I watch TV and movies only with captions or subtitles. We do not use the telephone.
Handling Hearing Situations, Cont'd
For non-face-to-face communication with others, other HIIs and I use only written media, i.e., letters, TTY, fax, and especially e-mail. I, for one, do not give out a telephone number; I give only my fax number and e-mail address.
Things We Avoid
Answering machines, voice mail, and voice-directed menu selection are disasters for me and for other HIIs who can use the phone a bit, because there is no way to confirm that we have heard correctly.
More Things We Avoid
Portable wireless phones and cell phones have much poorer sound quality and more distortion than the old-fashioned, indestructible, over-engineered, crystal-clear Western Electric 500 telephones. All these modern devices disenfranchise us.
Computers and the WWW!
Computers are the greatest for me and other HIIs. Everything is written! The WWW is a dream world for me and other HIIs.
The End of the Dream is Nigh!
However, the dream appears about to end! I see research in voice recognition coming to fruition in applications that aim to make computers interact with humans by voice... voice with no lipreadable lips!
The Future is Even Worse
250 years from now, on Federation starships such as the Enterprise, communication with computers is entirely by talking to them! Our dream is in danger of ending!
Lack of Political Action
Those of us HIIs who could use the old, strong, non-distorting, highly amplified Western Electric 500 telephones did not complain as these telephones disappeared, replaced by weak, highly distorting, unamplified electronic telephones and cellular phones. We did not complain as new telephone services became less and less useful. That was a mistake!
Why We Did Not Complain
We probably did not complain because, at the same time, computers were becoming more and more ubiquitous and universal, and they began to replace the telephone in our world. Moreover, e-mail is far easier to use than even the best non-distorting amplified telephone, simply because e-mail can always be read, while lipreading is difficult over any telephone. And e-mail avoids phone tag!
Action is Needed Now!
But now we must do something to save the computer as our communication medium before it is too late. I therefore propose ways to maintain access to computers and the WWW for HIIs.
The Sight Impaired
I realize that current computers and the current WWW are abysmal for the sight impaired (SI), and that SI individuals (SIIs) hope and pray for the day on which computers and the WWW can be accessed by voice interfaces. So, unless I am careful in my recommendations, I could end up advocating that the current lack of access for SIIs be maintained.
No Need for War
There is no need for war between the HIIs and the SIIs. Thus, my recommendations also consider the needs of the SIIs.
Recommendations
Consider first, at the highest level:
- output
- input
Then consider some more details about output. That is all we have time for in this forum. Please read my paper for more detail: se.uwaterloo.ca/~dberry/ftp_site/reprints.journals.conferences/wse_paper.pdf
Output
When a computer speaks to a user, it should do so by both sound and text or picture. Voice and text should be synchronized to avoid cognitive interference.
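Synchronizing voice and text, as recommended above, amounts to showing, at each instant of playback, the text segment that the voice is currently speaking. A minimal sketch in Python, assuming the text has already been split into timed cues (start and end times in seconds, much as caption files do); the `Cue` type and the `caption_at` function are hypothetical names, not part of any particular system.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Cue:
    """One text segment, shown while the voice plays the interval [start, end)."""
    start: float  # seconds from the start of the audio
    end: float
    text: str


def caption_at(cues: List[Cue], t: float) -> Optional[str]:
    """Return the text to display at audio time t, or None if nothing is spoken.

    Assumes the cues are sorted by start time and do not overlap.
    """
    starts = [c.start for c in cues]
    i = bisect_right(starts, t) - 1  # index of the last cue starting at or before t
    if i >= 0 and t < cues[i].end:
        return cues[i].text
    return None


cues = [Cue(0.0, 1.5, "Hello."), Cue(1.5, 4.0, "Please select an option.")]
print(caption_at(cues, 2.0))  # Please select an option.
```

The player would call `caption_at` with its current playback position on every display refresh, so the text on screen never runs ahead of or behind the voice.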
Output, Cont'd
Added niceties, both synchronized with the voice:
- a lipreadable animated talking head
- animated sign language
Example of animated lip sync (Michael Comet): www.comet-cartoons.com/toons/3ddocs/lipsync/lipsync.html
The Alphabet: American Sign Language
Figure: the American Sign Language fingerspelling alphabet, A to Z (source: where.com/scott.net/asl/abc.html)
Input
When a computer accepts input from a user, it should accept both voice and text, because many HIIs do not speak well or consistently, and their speech confuses voice recognition algorithms.
Output Details
The details of the output are determined by the source of the output:
- text
- a real person's recorded voice
- live video
Text
Text can drive synthesized voice, can drive animated lipreadable lips, can drive animated hand signing, and can be displayed as captions, even if it is in a phonetic alphabet.
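Because one text source can feed every output mode, a system can treat the modes as interchangeable renderers of the same string and drive whichever ones a user has selected. A minimal sketch, assuming hypothetical renderer functions (`speak`, `animate_lips`, `animate_signing`, `show_caption`) that here only return labeled strings; a real system would call a speech synthesizer, a talking-head animator, a signing avatar, and a caption display.

```python
from typing import Callable, Dict, Iterable, List


# Hypothetical renderers: each stands in for a real output device and
# simply returns a label showing what it would present.
def speak(text: str) -> str:
    return f"[voice] {text}"


def animate_lips(text: str) -> str:
    return f"[lips] {text}"


def animate_signing(text: str) -> str:
    return f"[signing] {text}"


def show_caption(text: str) -> str:
    return f"[caption] {text}"


RENDERERS: Dict[str, Callable[[str], str]] = {
    "voice": speak,
    "lips": animate_lips,
    "signing": animate_signing,
    "caption": show_caption,
}


def render(text: str, modes: Iterable[str]) -> List[str]:
    """Drive every output mode the user has chosen from one text source."""
    return [RENDERERS[mode](text) for mode in modes]


print(render("Your order has shipped.", ["voice", "caption"]))
# ['[voice] Your order has shipped.', '[caption] Your order has shipped.']
```

Keeping the text as the single source means adding a new output mode requires only a new renderer, not new content.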
Real Person's Voice
When a real person's speech is recorded:
- make a video of the person speaking, for a lipreadable talking head;
- if the person is reading a script, display the script synchronized with the voice, as in closed-captioning of prerecorded, scripted TV shows;
- if the person is not reading a script, transcribe the speech to make captions, as in closed-captioning of previously recorded, ad-libbed live events on TV.
Live Video
For live video, use courtroom-style stenographers, as in the real-time, slightly delayed closed-captioning of live events on TV.
Options
Of course, each user would set his or her preferred modes of input and output. However, all the modes described should be available to anyone.
Not Totally Biased to HIIs
Lest you think that this textual preference is obviously and totally biased in favor of HIIs, please consider these other recommendations, offered by the W3C!
Other Recommendations
Recommendations of the W3C Web Content Accessibility Guidelines 1.0 (1999), aimed at all kinds of impairment: sight impairment, hearing impairment, and physical impairment.
Text is LCD
Text is the lowest common denominator (LCD). For every non-textual artifact, provide an equivalent textual artifact. For a picture, an equivalent textual artifact would be a description of what the user is expected to notice in the picture.
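One mechanical check for this recommendation on a Web page is to look for images that carry no text equivalent at all; WCAG 1.0 names the `alt` attribute as one way to provide it. A minimal sketch using Python's standard `html.parser` module, under simplifying assumptions: it inspects only `<img>` elements, and it treats an empty `alt=""` as a deliberate marker for a purely decorative image rather than as a missing equivalent. The class and function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class MissingAltFinder(HTMLParser):
    """Collects the src of every <img> element that has no alt attribute."""

    def __init__(self) -> None:
        super().__init__()
        self.missing: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag != "img":
            return
        attributes = dict(attrs)
        # An absent alt attribute means no text equivalent at all;
        # alt="" is conventionally used for purely decorative images.
        if "alt" not in attributes:
            self.missing.append(attributes.get("src") or "(no src)")


def images_without_text(html: str) -> List[str]:
    finder = MissingAltFinder()
    finder.feed(html)
    finder.close()
    return finder.missing


page = ('<p><img src="logo.png" alt="University of Waterloo logo">'
        '<img src="audiogram.png"><img src="spacer.gif" alt=""></p>')
print(images_without_text(page))  # ['audiogram.png']
```

Such a check can only find a missing equivalent; whether an existing description tells the user what he or she is expected to notice in the picture still needs human judgment.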
Text is Highly Versatile
Technology is available for converting text into sound for SIIs and some physically impaired individuals (PIIs). Text itself is useful for HIIs and some PIIs.
Some Examples of RE
Here are some examples of requirements engineering (RE) for technologies that are problematic for HIIs.
Picture Cell Phones
Cell phones (and telephones) with cameras and displays: the bandwidth of the cell network (and of the phone lines) is not high enough for live motion video, with a refresh rate of 24 frames per second. All devices I know of opt to degrade the picture rather than the sound when there is not enough bandwidth.
Picture Cell Phones, Cont'd
For HIIs, the opposite choice is best: degrade the sound to keep the video in live motion. Lipreading is more important than hearing.
Picture Cell Phones, Cont'd
The ideal choice for all: let the user decide what is degraded, with a slider that lets the user select a degradation between two extremes:
Sound Only <----------> Video Only
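The slider can be read as a rule for splitting scarce bandwidth between the two streams. A minimal sketch, assuming the slider position is a number from 0.0 (the Sound Only end, which protects the sound) to 1.0 (the Video Only end, which protects the video), that each stream has a known full-quality bit rate, and that bandwidth one stream cannot use is passed to the other. The function name `allocate` and the rates in the example are hypothetical.

```python
from typing import Tuple


def allocate(available: float, audio_full: float, video_full: float,
             slider: float) -> Tuple[float, float]:
    """Split `available` bandwidth (e.g., in kbit/s) into (audio, video) rates.

    slider = 0.0 gives the sound first claim on the bandwidth (Sound Only end);
    slider = 1.0 gives the video first claim (Video Only end).
    """
    if not 0.0 <= slider <= 1.0:
        raise ValueError("slider must be between 0.0 and 1.0")
    if available >= audio_full + video_full:
        return float(audio_full), float(video_full)  # enough for both: degrade nothing
    video = min(video_full, slider * available)   # video's first claim, set by the slider
    audio = min(audio_full, available - video)    # sound takes what is left, up to full
    video = min(video_full, available - audio)    # bandwidth the sound cannot use goes to video
    return float(audio), float(video)


# A 100 kbit/s channel, with hypothetical full rates of 64 kbit/s for sound
# and 384 kbit/s for video.
print(allocate(100, 64, 384, 0.0))  # (64.0, 36.0)
print(allocate(100, 64, 384, 1.0))  # (0.0, 100.0)
```

An HII who lipreads would push the slider toward Video Only; a hearing user would leave it near Sound Only, which is what current devices hard-wire.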
SpeechView Approach
SpeechView (www.speechview.com) generates animated lipreadable lips locally at the cell phone. The software on the cell phone delays the sound, analyzes it, and displays on the screen animated lipreadable lips (like Michael Comet's) synchronized with the delayed sound.
SpeechView Approach, Cont'd
As an extra feature, to make SpeechView desirable even if and when bandwidth is high enough to degrade neither sound nor video, the animated face gives extra clues to help disambiguate phonemes that share visemes, e.g., the nose is red for m, the cheeks are red for b, and the lips are red for p.
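The SpeechView pipeline (delay the sound, analyze it, show synchronized lips) together with the color cues for phonemes that share a viseme can be sketched as a lookup from recognized phonemes to face frames. This is a minimal illustration, not SpeechView's implementation: it assumes the phonemes have already been recognized with timestamps, and the viseme names, phoneme labels, `FaceFrame` type, and `frames_for` function are all hypothetical.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class FaceFrame(NamedTuple):
    time: float                # when to show this face, in seconds of delayed playback
    viseme: str                # mouth shape to draw
    red_region: Optional[str]  # extra cue separating phonemes with the same viseme


# Hypothetical table. m, b, and p look the same on the lips (the lips close),
# so each gets a different red region, as on the SpeechView face.
VISEMES: Dict[str, Tuple[str, Optional[str]]] = {
    "m": ("lips_closed", "nose"),
    "b": ("lips_closed", "cheeks"),
    "p": ("lips_closed", "lips"),
    "f": ("lip_on_teeth", None),
    "a": ("open", None),
}


def frames_for(phonemes: List[Tuple[float, str]], delay: float) -> List[FaceFrame]:
    """Map timed phonemes to face frames.

    The sound is assumed to be played `delay` seconds late, giving the analysis
    time to run; shifting every frame by the same delay keeps the lips
    synchronized with the sound the user hears.
    """
    frames = []
    for t, phoneme in phonemes:
        viseme, cue = VISEMES.get(phoneme, ("neutral", None))
        frames.append(FaceFrame(t + delay, viseme, cue))
    return frames


for frame in frames_for([(0.0, "m"), (0.1, "a"), (0.2, "p")], delay=0.25):
    print(frame.viseme, frame.red_region)
# lips_closed nose
# open None
# lips_closed lips
```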
Video Conferencing
I attempted to take Steve Easterbrook's RE course, offered by videoconferencing to several universities in Southern Ontario. The sound seemed OK for a hearing person. The video used a lossy compression that does live-action refreshing only where there is movement.
Video Conferencing, Cont'd
However, there is a delay of a few seconds after something starts moving before the algorithm notices the movement and starts the live-action refreshing. In those few seconds, I lose the lipreading of the first few words of what someone says. When only Steve is lecturing, this is not a serious problem.
Video Conferencing, Cont'd
However, when a discussion moves rapidly from one person to another, the loss amounts to a significant fraction of what is said. Either a better algorithm must be found, the bandwidth must be increased to the point that compression is not needed, or the user must be able to choose to degrade the sound instead.
Video Conferencing, Cont'd
Moreover, the camera must be moved to face the person who is talking. When a person's contribution is brief, the camera may be moved to that person only after a few words have been said, or not at all, again causing a significant loss. Perhaps each person needs a camera mounted permanently in front of his or her face, suspended from a crane mounted on his or her head.
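The course video's compression, which refreshes only the parts of the picture where there is movement, is an instance of what is commonly called conditional replenishment. A minimal sketch of its change-detection step, assuming grayscale frames stored as lists of rows, square blocks, and a hypothetical threshold on the mean absolute pixel difference; it is not the codec actually used for the course. For lipreading, the requirement is that a block containing the lips start refreshing on the very first frame in which they move.

```python
from typing import List, Set, Tuple

Frame = List[List[int]]  # grayscale pixel values, 0..255, one list per row


def changed_blocks(prev: Frame, cur: Frame, block: int,
                   threshold: float) -> Set[Tuple[int, int]]:
    """Return (block_row, block_col) of the blocks that should be refreshed.

    A block is refreshed when the mean absolute difference between its pixels
    in the previous and current frames exceeds `threshold`; all other blocks
    keep their old picture, which is what saves bandwidth.
    """
    height, width = len(cur), len(cur[0])
    changed: Set[Tuple[int, int]] = set()
    for top in range(0, height, block):
        for left in range(0, width, block):
            total = count = 0
            for y in range(top, min(top + block, height)):
                for x in range(left, min(left + block, width)):
                    total += abs(cur[y][x] - prev[y][x])
                    count += 1
            if total / count > threshold:
                changed.add((top // block, left // block))
    return changed


still = [[0] * 4 for _ in range(4)]
moved = [row[:] for row in still]
moved[3][3] = 200  # movement in the bottom-right corner only
print(changed_blocks(still, moved, block=2, threshold=10.0))  # {(1, 1)}
```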
Conclusion
Text is LCD!
- It maintains access for HIIs.
- It enables the access methods needed by SIIs.
To set a good example, a text version of this paper is available at: se.uwaterloo.ca/~dberry/ftp_site/reprints.journals.conferences/wse_paper.txt