Followers

Tuesday, November 10, 2009

Building VoiceXML Applications

Now that you have a solid understanding of the VoiceXML architecture, it is time to look more closely at the VoiceXML language itself. If you have experience with other XML-based markup languages, such as WML or XHTML, or even other tag-based languages, such as HTML or HDML, creating VoiceXML applications will not pose much difficulty for you. The main adjustment will be to the development of the user interface. Rather than sending content to a screen, it is spoken over a telephone. For this reason, it is very important to create clean, intuitive applications so users do not become frustrated and hang up. VoiceXML includes elements to help with this, as we will discuss later in this section.

Language Concepts
Before examining some VoiceXML code, let's take a quick look at the basic concepts behind a VoiceXML application.

Session
Once a user connects to the VoiceXML gateway, a session is started. This session is maintained as new VoiceXML documents are loaded and unloaded. The session ends only when requested by the user, the VoiceXML document, or the voice gateway. Each platform will have default session characteristics, many of which can be controlled by VoiceXML logic.

Dialogs
VoiceXML applications are constructed of one or many dialogs. Each dialog represents some form of conversational state with the user. After completing one dialog, you move on to another dialog, until the application is complete. There are two types of dialogs: forms and menus. A form collects user input, and a menu gives the user options to choose from. If at any point no dialog is specified, the VoiceXML application terminates automatically.

VoiceXML also has subdialogs. These are very similar to function calls, allowing the application to call out to a new dialog, then return to the original form. All of the variables, grammar, and state information is available upon returning to the calling document. Subdialogs can be used to create a set of components that may be used from several applications.

Applications
A VoiceXML application is a set of documents that share the same root document. This root document is automatically loaded any time a user interacts with any document in the application, and remains loaded until the user transitions to a document outside of the application. While it is loaded, the root document's variables are available to the other documents. It is also possible to specify grammars to be active for the duration of the application.

Grammars
Grammars make it possible to specify valid inputs from the user. Each dialog will contain at least one speech and/or DTMF grammar. In simple applications, only the dialog's grammars are active for that dialog. In the more complex, mixed-initiative applications, it is possible to have active grammars outside of the dialog being executed. Mixed-initiative refers to applications in which both the user and the gateway determine what will happen next.

Grammar creation is a very important aspect of designing intuitive, robust VoiceXML applications. It is essential to create grammars that accurately reflect the typical speech inputs from a user. If this is not achieved, either the prompts or the grammar will have to be changed. In some cases, the built-in grammars may be sufficient. VoiceXML 2.0 has the following built-in grammars: boolean, date, digits, currency, number, and time. In addition, many of the voice gateway vendors have proprietary grammars that you can use.

Events
When the normal execution of an application is interrupted, an event is thrown. In most cases, events are used when the user fails to respond to a prompt, or when the response is not suitable. They are also used when a user requests help or wants to exit the application. When an event is triggered, the element allows you to specify what the reaction should be. If there is no handler at the dialog level, the event can be caught at a higher level, since events follow an inheritance model.

Links
Links enable you to create mixed-initiative applications. They specify a grammar that is active when the user is within scope of the link. When the input matches the grammar, the user is redirected to the specified destination URI.

Scripting
If you require additional control over an application, which is not provided by standard VoiceXML elements, you can use scripting in the form of ECMAScript. This allows you to do such things as collect values of several fields in a single response.