Monthly Archives: September 2010

Putting a VoiceXML gateway behind Asterisk

I’m a big fan of both Asterisk and VoiceXML. Each has its own sweet spot. Asterisk is great for building complete telephony systems (dial plans, conference calls, queues, voicemail, etc.), while VoiceXML is the standard way to develop full-blown telephony applications for large organisations.

But what if you want to bridge the two? There are situations where that would make sense. Consider a company using Asterisk as their front PBX. Now if they want to add a speech-enabled auto-attendant or some other self-service application, they could use a VoiceXML platform to run it instead of coding it in the Asterisk dialplan language. Of course, one could do the same using the Asterisk Gateway Interface (AGI) protocol, but one would then be limited to the capabilities of the Asterisk dialplan language. (For instance, the generic speech recognizer API only returns the matched text of each N-best result, not the semantic interpretation. This can be OK for some trivial applications, but it is clearly inadequate for serious speech application development.)

The other day, I decided to test this idea and try using a VoiceXML gateway (Voxeo Prophecy in this case) from behind Asterisk. Here is how I made things work.

Machine setup

My setup consists of a laptop running Ubuntu 9.04 with Asterisk 1.4.21. Since Prophecy is only supported on CentOS and Red Hat Enterprise Linux, I decided to run Prophecy on CentOS 5.5 inside a VMware virtual machine. The guest machine is configured to use a dedicated network between the guest and the host (the Host-only network configuration):

VMware guest network configuration

Asterisk configuration

On the Ubuntu (host) machine, in /etc/asterisk/sip.conf, I added the following entry:

[prophecy]
type=friend
username=prophecy
host=dynamic
canreinvite=yes
insecure=port,invite
qualify=yes
context=proph
auth=prophecy:none@asterisk

In /etc/asterisk/extensions.conf, I created a context proph with a dialplan that redirects all incoming calls to Prophecy:

[proph]
exten => _[A-Za-z].,1,Dial(SIP/prophecy/${EXTEN})
exten => _[A-Za-z].,n,Hangup
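
To make the matching rule concrete: in an Asterisk pattern, the leading underscore marks a pattern, [A-Za-z] matches one letter, and the trailing dot matches one or more of any character. The short Python sketch below mimics that rule with a regular expression. It is only an illustration of which extensions this context forwards to Prophecy, not part of the actual setup, and the function name is made up.

```python
import re

# Regex counterpart of the Asterisk pattern "_[A-Za-z]."
#   [A-Za-z]  one letter
#   .+        one or more further characters (the Asterisk "." wildcard)
PROPH_PATTERN = re.compile(r"[A-Za-z].+")


def forwarded_to_prophecy(exten: str) -> bool:
    """Return True if the dialed extension would match the [proph] pattern."""
    return PROPH_PATTERN.fullmatch(exten) is not None


for exten in ("myapp", "1234", "a"):
    print(exten, forwarded_to_prophecy(exten))
```

In other words, named extensions of at least two characters go to Prophecy, while purely numeric extensions do not match this context.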

Configuring Prophecy

On the guest CentOS machine, in /opt/voxeo/prophecy/config/config.xml, I added the following lines in the VoIPCT category:

<category name="VoIPCT">
 ...
  <category name="Registrations">
    <category name="asterisk">
      <item name="Username">prophecy</item>
      <item name="AuthUsername">prophecy</item>
      <item name="Password">none</item>
      <item name="Domain">192.168.151.1</item>
      <item name="ContactIP">192.168.151.128:5060</item>
      <item name="ExpirationTimeout" type="int">3600</item>
      <item name="Registrar">192.168.151.1</item>
      <item name="ResolveRegistrar" type="int">0</item>
    </category>
  </category>
 ...
</category>

Here, the IP address 192.168.151.128 is the address assigned automatically by VMware to the guest, while 192.168.151.1 is the address of the host.

To call an application, I use SFLphone, an open-source softphone. One particularly appealing feature of this phone is its support for both the SIP and the IAX protocols. It is thus well suited for use with Asterisk.

Voilà! I am now able to make calls to VoiceXML applications from the comfort of my Ubuntu machine using only free/open-source solutions.

Get two NuGram IDE Pro licenses free when you purchase a grammar development course

Learn how to systematically deliver high-quality, high performance grammars by fully leveraging the features and tools available in NuGram IDE. Supported by hands-on exercises and numerous examples, Effective Grammar Development with NuGram IDE provides a breadth of knowledge, best practices, and tips and tricks that have shown their effectiveness at addressing the main challenges of grammar development and at delivering better grammars faster.

And if you order our on-site grammar development course before October 31st, you will get two licenses of NuGram IDE Professional Edition entirely free! There is only one catch: the course must be given before December 31st, 2010. Contact us for details.

Testing an Intervoice InVision app with Voxeo Prophecy

I’ve just started working on a DTMF-only VoiceXML application for one of our customers. The application is developed using Intervoice InVision Studio 3.1 (the native Windows version) and will be deployed on the Intervoice Voice Portal 5 (IVP5). The challenge in this project is three-fold:

  • Development is done on Nu Echo’s premises.
  • Nu Echo does not have IVP5 in its lab.
  • The only way to test the application is to connect to the customer’s network using VPN/pcAnywhere, deploy the application there and test using a local phone number.

Fortunately, except for all the VoiceXML code that handles attached data and transfers to the PBX, everything else can be easily tested on my own machine using only freely available tools.

The VoiceXML platform

InVision Studio is a tool that provides a graphical editor that maps an IVR call-flow to completely static, standards-compliant VoiceXML code (at least that is the case for the application I have to develop). Once the application successfully passes the validation tests, it can be exported to VoiceXML code that can then be deployed on any web server.

InVision Studio

Since the resulting code does not depend on any proprietary extension, I decided to use Voxeo Prophecy to test it. It comes with a really decent ASR engine as well as a good TTS engine, both only for US English. The application is DTMF-only, so the ASR is not needed in my case, but TTS is handy when you don’t want to record all the application prompts (with InVision Studio, you have to specify a text for every prompt you define).

After installing Prophecy, I had to use Prophecy Commander, the web-based management console, to configure the application and the route to reach the application. The route is used to associate a number to call with the application. In my case, the app is CustomerApp and the route is test-customer-app:

Routing rules in Prophecy Commander

To call the application, I simply use the SIP phone that comes with Prophecy and dial test-customer-app.

Prophecy SIP phone

The Web server

For the web server, I use Yaws, a web server written in Erlang. It could just as well have been Apache, Tomcat, Jetty, IIS, or any other web server. I chose Yaws mainly because I do some Erlang programming in my spare time and happen to know Yaws a bit better than the alternatives.

I configured Yaws to serve static files on port 8080 from the Runtime directory of my InVision project. So whenever I export the VoiceXML code for the project, I just take the SIP phone and make a call to test the application. The Yaws configuration for the virtual server is:

<server localhost>
        port = 8080
        listen = 0.0.0.0
        docroot = "C:/InvisionProjects/CustomerApp/Runtime"
</server>
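
Since any static web server would do here, the same idea can be sketched with nothing but Python's standard library. This is not what I used, just an equivalent under the assumption that Python 3.7 or later is installed; the make_server helper is a made-up name, and the docroot in the comment is the one from the Yaws configuration above.

```python
import functools
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_server(docroot, host="0.0.0.0", port=8080):
    """Build an HTTP server that serves static files from docroot."""
    # Bind the handler to the export directory instead of the current directory.
    handler = functools.partial(SimpleHTTPRequestHandler, directory=docroot)
    return ThreadingHTTPServer((host, port), handler)


# To serve the exported VoiceXML files (this call blocks until interrupted):
#   make_server("C:/InvisionProjects/CustomerApp/Runtime").serve_forever()
```

As with Yaws, nothing needs to be restarted after an export: the files are read from disk on every request.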

Extensive logging

First off, let me say that when it comes to debugging an app, the Prophecy logviewer is of tremendous help. I was at first a bit overwhelmed by the vast quantity of information logged by the various parts that compose Prophecy, but the filtering capabilities make it easy to focus on only a fraction of it. (I have seen the logs of many VoiceXML platforms, and these are certainly among the most comprehensible.)

I’m writing this because I had to use the logviewer the minute I started testing the application interactively. Why don’t I just listen to the prompts? Well, the problem is that the prompt texts are in French, while the TTS is in English. The result is simply incomprehensible, and trying to figure out where I am in the application is really painful and annoying. So I decided to add VoiceXML log elements extensively in the application, all starting with a very specific pattern: [CustomerApp].

Logging elements in application

It is then very easy to filter the logs based on this pattern and see only the progress of the application:

Prophecy Logviewer
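
The same filtering idea works outside the logviewer too, for instance on a saved log file. The Python sketch below keeps only the lines that carry the [CustomerApp] tag. The sample lines are made up for illustration and do not reflect the real Prophecy log format; the function name is hypothetical as well.

```python
TAG = "[CustomerApp]"  # prefix used in the application's log elements


def app_progress(log_lines, tag=TAG):
    """Return only the lines carrying the application's tag, without trailing newlines."""
    return [line.rstrip("\n") for line in log_lines if tag in line]


# Made-up sample lines; real Prophecy log entries look different.
sample = [
    "platform: call started\n",
    "app: [CustomerApp] entering main menu\n",
    "platform: playing prompt\n",
    "app: [CustomerApp] caller pressed 2\n",
]

for line in app_progress(sample):
    print(line)
```

Choosing a tag that is unlikely to appear anywhere else in the platform logs is what makes such a simple substring test sufficient.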

A final remark

Yes, I could use the debugger that comes with InVision Studio. But frankly, I do not find it very intuitive to use. I prefer making calls and testing the user experience at the same time.

A wishlist for VoiceXML 3.0

Over our many years working with VoiceXML 2.0/2.1, we at Nu Echo have found a number of annoyances in the specification that we would very much like to see addressed in the upcoming VoiceXML 3.0. These are not far-fetched or difficult to implement. But fixing them would certainly make it easier for us to implement some frequent customer requirements and would yield much better VUIs in the end.

And fixing these issues is not in contradiction to the new direction that VoiceXML 3.0 is taking.

(I could have entitled this post “some complaints about VoiceXML 2.1”, but I decided to turn it into one with a more positive bias.)

So here is a first attempt at a VoiceXML wishlist:

  1. Flag indicating use of DTMF termchar. Many customers ask us to enforce the use of a termination character like ‘#’ at some point in the dialog (after the PIN, for example). If we simply use the built-in grammars and specify a termchar property, it is not possible to know whether the key was pressed at all when we get the result. Of course, the application can use a custom DTMF grammar, but it will then have to explicitly strip the term char from the returned DTMF sequence. And custom DTMF grammars sometimes require a speech recognition engine to work, so one must be provisioned even for DTMF-only applications (which is nonsense). Note that this feature exists for the <record> element.
  2. DTMF nomatch or speech nomatch? When a nomatch event occurs in a form allowing either speech or DTMF input, it is not possible to know whether it is the result of a wrong DTMF sequence or of some speech not matching one of the active grammars. Such information would lead to better reprompting. For instance, if you activate some grammars for universal commands, hitting the wrong key could then lead to a prompt like “Invalid command. Please say …” instead of the more generic (and speech-specific) “I didn’t understand. Please say …”. VoiceXML 2.0 specifies that a nomatch sets the value of application.lastresult$ (a noinput, by contrast, leaves it unchanged), but the values stored there are platform-dependent.
  3. DTMF sequence entered on nomatch. When a DTMF nomatch event occurs, the application does not know what DTMF sequence was entered. Having such information would also lead to more precise reprompting. (Some platforms may already provide it through the application.lastresult$ object, but that is highly platform-specific. See the previous point.)
  4. Mark information on hangup. In VoiceXML 2.1, mark elements can be interspersed with the application prompts so the application can know during which segment the caller barged in. When callers hang up, however, the application cannot know the last mark reached. This information would improve reporting, letting us know whether people actually listen to the messages before hanging up.
  5. Better support for RESTful services in data element. VoiceXML 2.1 only mandates support for GET and POST as the HTTP method in data elements. It should also support the other HTTP verbs (like PUT and DELETE) to enable the integration of VoiceXML applications with RESTful services. (Some platforms, like Voxeo Prophecy, already offer that kind of support.) Also, it would be nice if data returned from web services could be in JSON format instead of XML. (XML is so 2009, right? Just kidding.) VoiceXML interpreters already embed an ECMAScript interpreter, and it would be much more convenient to manipulate a JSON object than an XML object.
  6. Dynamic array of grammars. In VoiceXML 2.1, it is possible to build a sophisticated list of prompts using the foreach element. It would be handy to be able to do the same with grammars. Of course, one can generate a base grammar dynamically that references those grammars, but experience showed us that, with certain ASR engines, speech recognition performs differently on such grammars compared to parallel grammars. (This one would certainly be more difficult to specify, as grammars are not part of executable code, and they can appear at different scopes – document, form, field. So it’s a bit more far-fetched.) The main use case for such a feature is the writing of AJAX-like applications in VoiceXML.
  7. Barge-in modality. In some cases, it’d be nice to control the modality of barge-in, like allowing the caller to barge in with DTMF, but not with speech. It would be as simple as specifying the barge-in as “voice”, “dtmf”, “voice dtmf”, or “none” (instead of only “true” or “false”).
  8. Flushing the prompt queue. Being able to explicitly flush the prompt queue would really be handy for prompts like “one moment please” <flush>.
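
To illustrate the workaround mentioned in item 1, here is a rough sketch of the extra step an application has to perform when it uses a custom DTMF grammar that includes the term char. It is written in Python purely for illustration (in a real VoiceXML application this logic would live in ECMAScript), and the function name split_termchar is hypothetical.

```python
def split_termchar(dtmf: str, termchar: str = "#"):
    """Split a raw DTMF sequence into (digits, terminated).

    terminated is True only if the caller actually pressed the term char.
    """
    # An empty termchar means "no termination character configured".
    if termchar and dtmf.endswith(termchar):
        return dtmf[: -len(termchar)], True
    return dtmf, False


for raw in ("1234#", "1234"):
    print(raw, split_termchar(raw))
```

A flag provided by the platform would make this step unnecessary and would also work with the built-in grammars.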

That’s it for now. We have a couple more in store, but I wanted to keep the list short.

What do you think? Are there other small issues that you would like to be addressed by the upcoming VoiceXML 3.0?

I would like to thank my colleague Jean-Philippe Gariépy for bringing most of these issues to my attention. He’s the one who has to deal the most with the VoiceXML code that our internal dialog framework generates.