Human Computer Interaction
ERCIM News No.46, July 2001

Improving Speech Recognition for Communication-oriented Activities

by Andrew Sears

Speech recognition (SR) can be a powerful tool, especially for individuals with physical impairments that limit their ability to use a traditional keyboard and mouse. While SR technology continues to improve, users still experience significant difficulty creating and editing documents. To address these difficulties, researchers at the Laboratory for Interactive Systems Design at the University of Maryland, Baltimore County (UMBC) and the IBM TJ Watson Research Center are investigating the processes by which users interact with SR systems, the difficulties users encounter, and techniques that will make SR more effective.

Our focus is on dictation-oriented activities (eg, writing email, letters, memos, or papers) as opposed to command-and-control activities (eg, turning lights on and off, answering the phone). Our goal is to improve the users’ experience as they interact with SR systems—allowing for greater productivity and increased satisfaction. This is accomplished by investigating how users interact with these systems, where they experience difficulties, why those difficulties occur, and the consequences they experience as a result of these difficulties.

Many researchers are investigating techniques to reduce the number of recognition errors, resulting in substantial improvements in the underlying SR algorithms. In contrast, we are interested in assisting users in correcting those recognition errors that still occur. Our initial study included individuals with high-level spinal cord injuries as well as traditional computer users with no physical impairments that hindered their ability to use a keyboard and mouse. Both groups of users completed a variety of dictation-oriented tasks using a state-of-the-art SR system. Our results confirmed that these individuals spent less than 35% of their time dictating and over 65% of their time correcting errors in the dictation. Of particular interest was the fact that these users spent over 32% of their time navigating from one location to another within the document. The results from this preliminary analysis strongly suggest that more effective navigation and error correction will yield substantial increases in productivity.

A more detailed analysis of the difficulties users experienced while navigating within the documents they created revealed important patterns. Nearly 18% of all navigation commands failed. Over 99% of these failures were caused by recognition errors, users’ issuing invalid commands, or users’ pausing in the middle of issuing a command. Further, over 99% of these failures led to one of three results: modifying the content of the document, moving the cursor to the wrong location, or no changes at all. By understanding the underlying reasons for these failures, and the consequences users experience as a result of failed commands, we can prioritize future efforts. Our initial efforts to make navigation more effective have included modifying the way commands are processed and changing the commands that are available to users. We have evaluated these changes through a longitudinal study involving fifteen participants. A preliminary analysis of the results suggests a strong connection between navigational efficiency and productivity. A detailed analysis of the results from this study is underway.
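
As a rough illustration of the kind of tally described above, the following Python sketch counts failed navigation commands by cause and by consequence. It is not the analysis tooling used in the study: the record layout, the category labels, the spoken commands, and the sample log are all hypothetical and invented for this example.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Hypothetical labels paraphrasing the causes and consequences named in the text.
CAUSES = ("recognition_error", "invalid_command", "mid_command_pause")
CONSEQUENCES = ("content_modified", "cursor_misplaced", "no_change")


@dataclass(frozen=True)
class NavCommand:
    """One spoken navigation command and what happened to it (assumed layout)."""
    spoken: str
    succeeded: bool
    cause: Optional[str] = None        # set only when the command failed
    consequence: Optional[str] = None  # set only when the command failed


def summarize(commands):
    """Return (failure rate, failures by cause, failures by consequence)."""
    failures = [c for c in commands if not c.succeeded]
    failure_rate = len(failures) / len(commands) if commands else 0.0
    by_cause = Counter(c.cause for c in failures)
    by_consequence = Counter(c.consequence for c in failures)
    return failure_rate, by_cause, by_consequence


# Made-up sample log, for illustration only (not data from the study).
log = [
    NavCommand("move to end of line", True),
    NavCommand("select previous word", False, "recognition_error", "content_modified"),
    NavCommand("go to top", True),
    NavCommand("move up two lines", False, "mid_command_pause", "cursor_misplaced"),
    NavCommand("go to bottom", True),
]

rate, by_cause, by_consequence = summarize(log)
print(f"failure rate: {rate:.0%}")
print("by cause:", dict(by_cause))
print("by consequence:", dict(by_consequence))
```

Keeping cause and consequence as separate fields reflects the distinction drawn in the text: a given cause, such as a mid-command pause, can lead to any of the three consequences, so the two are counted independently.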

While SR algorithms continue to improve, new applications will be found that involve more complex tasks, noisy environments, and users with more diverse speaking patterns. As a result, we believe that recognition errors will continue to be a significant problem for the foreseeable future. To allow users to experience the full potential of SR for dictation-oriented applications, we must improve both the underlying recognition algorithms and the processes by which users interact with SR systems. Our initial study confirmed that users experience significant difficulty correcting recognition errors when they do occur, with difficulties navigating within documents accounting for almost one-third of the users’ time. By documenting the causes and consequences of navigational difficulties, we were able to identify specific changes that eliminated some failures and changed the consequences of others. We were also able to revise the navigation commands to provide greater power while reducing the complexity of the commands. These changes have been evaluated and results will be reported shortly.

Future studies will continue to focus on the processes by which users interact with SR systems to provide additional insights into the difficulties users experience, where these difficulties occur, why they occur, and the consequences users experience.


Please contact:
Andrew Sears — University of Maryland, Baltimore County, USA
Tel: +1 410 455 3883