News from W3C
W3C's Speech Interface Framework
by Dave Raggett
W3C is working to expand access to the Web so that people can interact via keypads and spoken commands, and by listening to prerecorded speech, synthetic speech and music. This will allow any of the world's 2 billion telephones to be used to access appropriately designed Web-based services, and will benefit people with visual impairments as well as anyone who needs Web access while keeping their hands and eyes free for other things. It will also allow effective interaction with display-based Web content in cases where the mouse and keyboard are missing or inconvenient.
To fulfil this goal, the W3C Voice Browser Working Group is defining a suite of markup languages covering dialog, speech synthesis, speech recognition, call control and other aspects of interactive voice response applications. Specifications such as the Speech Synthesis Markup Language (http://www.w3.org/TR/speech-synthesis/), Speech Recognition Grammar Specification (see http://www.w3.org/TR/speech-grammar/), and Call Control XML (see http://www.w3.org/TR/ccxml/) are core technologies for describing speech synthesis, recognition grammars, and call control constructs respectively. VoiceXML is a dialog markup language that leverages the other specifications for creating dialogs that feature synthesized speech, digitized audio, recognition of spoken and DTMF key (touch tone) input, recording of spoken input, telephony, and mixed initiative conversations.
These specifications bring the advantages of Web-based development and content delivery to interactive voice response applications. Further work is anticipated on enabling their use with other W3C markup languages such as XHTML, XForms (a specification of Web forms that can be used with a wide variety of platforms, see http://www.w3.org/MarkUp/Forms/) and the Synchronized Multimedia Integration Language (SMIL™, see http://www.w3.org/AudioVideo/). This will be done in conjunction with other W3C Working Groups, including those in the Multimodal Interaction Activity.
Some possible applications include:
- accessing business information, including the corporate 'front desk' asking callers who or what they want, automated telephone ordering services, support desks, order tracking, airline arrival and departure information, cinema and theater booking services, and home banking services
- accessing public information, including community information such as weather, traffic conditions, school closures, directions and events; local, national and international news; national and international stock market information; and business and e-commerce transactions
- accessing personal information, including calendars, address and telephone lists, to-do lists, shopping lists, and calorie counters
- assisting users in communicating with other people by sending and receiving voice-mail and email messages.
VoiceXML 2.0's design is based upon extensive industry experience in creating audio dialogs. For an introduction, a tutorial is available at http://www.w3.org/Voice/Guide/. Further tutorials and other resources can be found on the VoiceXML Forum Web site (http://www.voicexml.org/). W3C and the VoiceXML Forum have signed a memorandum of understanding setting out mutual goals.
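As an illustrative sketch (not taken from the specification), the minimal VoiceXML 2.0 document below prompts the caller for a choice and speaks back the recognized value. The grammar file name drinks.grxml is a hypothetical placeholder:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="order">
    <field name="drink">
      <!-- Synthesized prompt played to the caller -->
      <prompt>Would you like coffee or tea?</prompt>
      <!-- Speech grammar for this field; drinks.grxml is hypothetical -->
      <grammar src="drinks.grxml" type="application/srgs+xml"/>
      <!-- Executed once the field is filled by a recognized utterance -->
      <filled>
        <prompt>You said <value expr="drink"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>
```

A VoiceXML interpreter on a telephony platform fetches such documents over HTTP, much as a visual browser fetches HTML pages.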
Based upon a small set of widely implemented extensions to VoiceXML 2.0, we anticipate an interim version of the dialog markup language called VoiceXML 2.1. These features will help developers build even more powerful, maintainable and portable voice-activated services, with complete backwards compatibility with the VoiceXML 2.0 specification. We expect to publish VoiceXML 2.1 as a small specification that describes the extensions to 2.0. The first Working Draft is expected to be published in September 2003. Future work on dialog markup, a component of W3C's Speech Interface Framework, is described below.
The Speech Recognition Grammar specification (SRGS) covers both speech and DTMF (touch tone) input. DTMF is valuable in noisy conditions or when the social context makes it awkward to speak. Grammars can be specified in either an XML or an equivalent augmented BNF (ABNF) syntax, which some authors may find easier to deal with. Speech recognition is an inherently uncertain process. Some speech engines may be able to ignore "um's" and "aah's", and to perform partial matches. Recognizers may report confidence values. If the utterance has several possible parses, the recognizer may be able to report the most likely alternatives (n-best results).
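As a small sketch (rule names are illustrative), an SRGS grammar in the XML syntax that matches the utterances "coffee" and "tea" might look like this:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         version="1.0" mode="voice" root="drink" xml:lang="en-US">
  <!-- The root rule; scope="public" lets other grammars reference it -->
  <rule id="drink" scope="public">
    <one-of>
      <item>coffee</item>
      <item>tea</item>
    </one-of>
  </rule>
</grammar>
```

The equivalent ABNF form is more compact: public $drink = coffee | tea;. Setting mode="dtmf" instead of mode="voice" would match touch-tone input.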
The Speech Synthesis specification (SSML) defines a markup language for prompting users via a combination of prerecorded speech, synthetic speech and music. You can select voice characteristics (name, gender and age) and the speed, volume, pitch, and emphasis. There is also provision for overriding the synthesis engine's default pronunciation.
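As a sketch of these features (chime.wav is a hypothetical file, and element details may vary between drafts of the specification), an SSML prompt mixing prerecorded audio, voice selection, prosody control and emphasis might read:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xml:lang="en-US">
  <!-- Prerecorded audio, with synthetic speech as a fallback -->
  <audio src="chime.wav">Ding!</audio>
  <voice gender="female">
    Your flight departs at
    <prosody rate="slow" volume="loud">
      <emphasis>ten thirty</emphasis>
    </prosody>.
  </voice>
</speak>
```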
The Voice Browser Working Group is collaborating with the Cascading Style Sheets (CSS) Working Group to develop a CSS3 module for speech synthesis based upon SSML for use in rendering XML documents to speech. This is intended to replace the aural cascading style sheet properties in CSS2. The first Working Draft was published in May 2003.
The Semantic Interpretation for Speech Recognition specification (see http://www.w3.org/TR/semantic-interpretation/) describes annotations to grammar rules for extracting semantic results from recognition. The annotations are expressed in a syntax based upon a subset of ECMAScript, and when evaluated, yield a result represented either as XML or as a value that can be held in an ECMAScript variable. The target for the XML output is the Extensible Multimodal Annotation Markup Language (EMMA), which is being developed in the Multimodal Interaction Activity (see http://www.w3.org/TR/emma/).
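As a sketch of how such annotations work (the tag syntax changed between drafts; this follows the ECMAScript-based style of later drafts, where out holds the rule's result), the grammar below maps two different utterances to the same semantic value:

```xml
<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"
         mode="voice" root="drink" tag-format="semantics/1.0">
  <rule id="drink" scope="public">
    <one-of>
      <!-- Different surface forms, one semantic result -->
      <item>coffee <tag>out = "coffee";</tag></item>
      <item>a cup of joe <tag>out = "coffee";</tag></item>
      <item>tea <tag>out = "tea";</tag></item>
    </one-of>
  </rule>
</grammar>
```

An application then receives the normalized value "coffee" regardless of which phrasing the caller used.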
W3C is working on a markup language called CCXML to enable fine-grained control of speech (signal processing) resources and telephony resources in a VoiceXML telephony platform (http://www.w3.org/TR/ccxml/). The scope of these language features is controlling resources in a platform on the network edge, not building network-based call processing applications in a telephone switching system, or controlling an entire telecom network. These components are designed to integrate naturally with existing language elements for defining applications which run in a voice browser framework. This will enable application developers to use markup to perform call screening, whisper call waiting, call transfer, and more. Users can be offered the ability to place outbound calls, conditionally answer incoming calls, and initiate or receive other communications such as a second call.
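A hypothetical sketch of a CCXML document that accepts an incoming call and then hands the caller to a VoiceXML dialog (element names varied between working drafts, and hello.vxml is a placeholder):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<ccxml version="1.0" xmlns="http://www.w3.org/2002/09/ccxml">
  <eventprocessor>
    <!-- An incoming call is signalled; answer it -->
    <transition event="connection.alerting">
      <accept/>
    </transition>
    <!-- Once connected, start a VoiceXML dialog on the call -->
    <transition event="connection.connected">
      <dialogstart src="'hello.vxml'"/>
    </transition>
  </eventprocessor>
</ccxml>
```

Further transitions could react to events such as a disconnect or a transfer request, which is how features like call screening are built up.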
W3C's Speech Interface Framework work is ongoing, and the W3C Voice Browser Working Group is seeking participants to help develop its specifications, give feedback on public drafts, and suggest requirements and directions. A public mailing list for discussion is available at http://lists.w3.org/Archives/Public/www-multimodal/.
Home page for W3C's Voice Browser Activity: http://www.w3.org/Voice/
Published Technical Reports on Voice: http://www.w3.org/TR/tr-activity.html#VoiceBrowserActivity
Voice Browser FAQ: http://www.w3.org/Voice/#faq
VoiceXML tutorial: http://www.w3.org/Voice/Guide/
Emerging Ontology Standard OWL strengthens Semantic Web Foundations
W3C has announced the advancement of the OWL Web Ontology Language to Candidate Recommendation. OWL is a language for defining structured, Web-based ontologies, which enable richer integration and interoperability of data across application boundaries. OWL is used to publish and share sets of terms called ontologies, supporting advanced Web search, software agents and knowledge management. Early adopters of these standards include the bioinformatics and medical communities, corporate enterprises and governments. OWL enables a range of descriptive applications, including web portal management, collections management, content-based search, intelligent agents, web services and ubiquitous computing.
OWL is a Web ontology language. While earlier languages have been used to develop tools and ontologies for specific user communities (particularly in the sciences and in company-specific e-commerce applications), they were not defined to be compatible with the architecture of the World Wide Web in general, and the Semantic Web in particular. OWL rectifies this by providing a language which uses the linking provided by RDF to add the following capabilities to ontologies:
- ability to be distributed across many systems
- scalable to Web needs
- compatible with Web standards for accessibility and internationalization
- open and extensible
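As a small sketch in the spirit of the food-and-wine examples used in the OWL Guide, the RDF/XML fragment below defines a class and a property. Because the identifiers are URIs, an ontology anywhere else on the Web can refer to and extend them, which is what makes the distribution and extensibility above possible:

```xml
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <!-- Wine is declared as a subclass of PotableLiquid,
       which may be defined in this or another ontology -->
  <owl:Class rdf:ID="Wine">
    <rdfs:subClassOf rdf:resource="#PotableLiquid"/>
  </owl:Class>
  <!-- A property relating wines to the grapes they are made from -->
  <owl:ObjectProperty rdf:ID="madeFromGrape">
    <rdfs:domain rdf:resource="#Wine"/>
  </owl:ObjectProperty>
</rdf:RDF>
```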
The OWL Documents produced by W3C
OWL is part of the growing stack of W3C Recommendations related to the Semantic Web. It facilitates greater machine interpretability of Web content than that supported by XML, RDF, and RDF Schema (RDF-S) by providing additional vocabulary along with a formal semantics. OWL has three increasingly-expressive sublanguages: OWL Lite, OWL DL, and OWL Full.
OWL is specified in six documents, each aimed at a different segment of those wishing to learn, use, implement or understand the OWL language. They include:
- a presentation of the use cases and requirements that motivated OWL
- an overview document which briefly explains the features of OWL and how they can be used
- a comprehensive Guide that provides a walk-through of the features of OWL, with many examples of their use (in the domain of describing food and wine)
- a reference document that provides the details of every OWL feature
- a test case document, and test suite, providing over a hundred tests that can be used for making sure that OWL implementations are consistent with the language design
- a document presenting the semantics of OWL and details of the mapping from OWL to RDF.
Web Ontology Working Group includes Industrial and Academic Leaders, seeks Implementations
The advancement of the OWL Web Ontology Language to Candidate Recommendation is an explicit call for implementations. A large number of organizations have been exploring the use of OWL, and many tools are already available. In addition, both the US government (via DARPA and NSF) and the European Union (via the IST programme of the 5th and 6th Framework Programmes) have invested in Web ontology language development. Most of the systems currently using DAML, OIL and DAML+OIL (the predecessor languages on which OWL is based) are now migrating to OWL.
OWL Overview: http://www.w3.org/TR/2003/CR-owl-features-20030818/
OWL Guide: http://www.w3.org/TR/2003/CR-owl-guide-20030818/
OWL Reference: http://www.w3.org/TR/2003/CR-owl-ref-20030818/
OWL Test Cases: http://www.w3.org/TR/2003/CR-owl-test-20030818/
OWL Use Cases and Requirements: http://www.w3.org/TR/2003/CR-webont-req-20030818/
OWL FAQ: http://www.w3.org/2003/08/owlfaq/
Home page for W3C's Semantic Web Activity: http://www.w3.org/2001/sw/