Content for TR 22.977 Word version: 18.0.1

1 Scope

Speech Enabled Services

Advances in Automatic Speech Recognition (ASR) technology, coupled with the rapid growth of the wireless telephony market, have created a compelling need for speech-enabled services. Voice-activated dialling has become a de facto standard in many of the mobile phones on the market today. Speech recognition technology has also been applied more recently to voice messaging and personal access services. The Voice Extensible Markup Language (VoiceXML) has been designed to bring the full power of web development and content delivery to voice response applications [11]. Voice portals that provide voice access to conventional, graphically oriented services over the Internet are now becoming popular. Forecasts show that speech-driven services will play an important role in the 3G market. Users of mobile terminals want to access information while on the move, and the small portable devices used to access this information need improved user interfaces based on speech input.

Multimodal and Multi-device Services

Speech-enabled services may utilise speech alone for input and output interaction, or may also utilise multiple input and output modalities, leading to multimodal services.

Online access to information is fast becoming a must-have. Along with this trend come new usage models for information access, particularly in mobile environments. Information appliances in cars, such as navigation systems, are already standard in high-end cars and will soon penetrate lower-end vehicles. Data access using mobile phones, though limited today and currently estimated to take three years to spread, has significant momentum that makes it certain to become widespread. In this new computing paradigm a person will expect to access information and interact with it seamlessly in many environments, be it in the office, at home or in the car, and often on several different devices. These new access methods have compelling advantages, such as mobile accessibility, low cost, ease of use and mass-market penetration. They also have their limitations: it is hard to enter and access data using small devices; speech recognition can introduce mistakes that sometimes recur and thereby block the transaction; one interaction mode does not suit all circumstances; and so on.

For example, a recent study of task performance using wireless phones, covering tasks such as reading world headlines and checking local weather, concluded that these services are currently often poorly designed, have insufficient task analysis, and abuse existing non-mobile design guidelines. The full report from the field study can be downloaded at [6]. The basic conclusion of this study is that wireless access usability fails miserably: accomplishing even the simplest of tasks takes much too long to provide any user satisfaction. For this computing paradigm to gain widespread acceptance, it is thus essential to provide an efficient and usable interface on the different device platforms that people are expected to use to access and interact with information.

We can expect, and already observe, a trend towards a new frontier of interactive services: multimodal and multi-device services.

These services exploit the fact that different interaction modes are good at different things: for example, talking is easier than typing, but reading is faster than listening. Multimodal interfaces combine multiple interaction modes, such as voice, keypad and display, to improve the user interface to services.

Several standards bodies are addressing aspects of this space, driven by a number of industry proposals: W3C (e.g. the MMI activity) [11], OMA/WAP Forum, ETSI [1], IETF [14]… In particular, the W3C MMI activity [13] aims to define a programming model for multimodal and multi-device applications.

Additional details and motivations are discussed in [2, 7, 8].

Overview

A brief overview of speech-enabled services is presented in chapter 4. The different ways of enabling speech recognition for speech-enabled services are described in chapter 5. Chapter 6 discusses multimodal services and the options for enabling multimodal and multi-device services. The scope of the report, references, definitions and abbreviations are detailed in the first few chapters.