September 2nd, 2010

by Dominique Boucher

A wishlist for VoiceXML 3.0

Over our many years working with VoiceXML 2.0/2.1, we at Nu Echo have found a number of annoyances in the specification that we would very much like to see addressed in the upcoming VoiceXML 3.0. None of them is far-fetched or difficult to implement, but fixing them would certainly make it easier for us to implement some frequent customer requirements and would yield much better VUIs in the end.

And fixing these issues would not contradict the new direction that VoiceXML 3.0 is taking.

(I could have titled this post “some complaints about VoiceXML 2.1”, but I decided to turn it into something with a more positive slant.)

So here is a first attempt at a VoiceXML wishlist:

  1. Flag indicating use of the DTMF termchar. Many customers ask us to enforce the use of a termination character like ‘#’ at some point in the dialog (when entering a PIN, for example). If we simply use the built-in grammars and set the termchar property, there is no way to know, when we get the result, whether the key was pressed at all. Of course, the application can use a custom DTMF grammar, but it then has to explicitly strip the termination character from the returned DTMF sequence. And custom DTMF grammars sometimes require a speech recognition engine to work, so one must be provisioned even for DTMF-only applications (which makes no sense). Note that this feature already exists for the <record> element.
  2. DTMF nomatch or speech nomatch? When a nomatch event occurs in a form allowing either speech or DTMF input, it is not possible to know whether it results from a wrong DTMF sequence or from speech that did not match any of the active grammars. Such information would lead to better reprompting. For instance, if you activate some grammars for universal commands, hitting the wrong key could lead to a prompt like “Invalid command. Please say …” instead of the more generic (and speech-specific) “I didn’t understand. Please say …”. VoiceXML 2.0 does specify that a nomatch sets application.lastresult$, but the values stored there are platform-dependent, and a noinput leaves it unchanged.
  3. DTMF sequence entered on nomatch. When a DTMF nomatch event occurs, the application does not know which DTMF sequence was entered. Having that information would also lead to more precise reprompting. (Some platforms may already provide it through the application.lastresult$ object, but that is highly platform-specific. See the point above.)
  4. Mark information on hangup. In VoiceXML 2.1, mark elements can be interspersed with the application prompts so the application can know during which segment the caller barged in. When callers hang up, however, the application cannot know the last mark reached. This information would improve reporting, letting us know whether people actually listen to messages before hanging up.
  5. Better support for RESTful services in the data element. VoiceXML 2.1 only mandates support for GET and POST as the HTTP method in data elements. It should also support the other HTTP verbs (like PUT and DELETE) to enable the integration of VoiceXML applications with RESTful services. (Some platforms, like Voxeo Prophecy, already offer that kind of support.) Also, it would be nice if data returned from web services could be in JSON format instead of XML. (XML is so 2009, right? Just kidding.) VoiceXML interpreters already embed an ECMAScript interpreter, and it would be much more convenient to manipulate a JSON object than an XML object.
  6. Dynamic array of grammars. In VoiceXML 2.1, it is possible to build a sophisticated list of prompts using the foreach element. It would be handy to be able to do the same with grammars. Of course, one can dynamically generate a base grammar that references those grammars, but experience has shown us that, with certain ASR engines, speech recognition performs differently on such grammars than on parallel grammars. (This one would certainly be more difficult to specify, as grammars are not part of executable code and can appear at different scopes: document, form, field. So it’s a bit more far-fetched.) The main use case for such a feature is the writing of AJAX-like applications in VoiceXML.
  7. Barge-in modality. In some cases, it’d be nice to control the modality of barge-in, like allowing callers to barge in with DTMF, but not with speech. It would be as simple as specifying the barge-in as “voice”, “dtmf”, “voice dtmf”, or “none” (instead of only “true” or “false”).
  8. Flushing the prompt queue. Being able to explicitly flush the prompt queue would really be handy for prompts like “one moment please” <flush>.
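
As a rough illustration of the workaround mentioned in item 1, here is the kind of post-processing an application ends up doing when a custom DTMF grammar returns the raw key sequence. This is only a sketch in Python: the function name and the default of ‘#’ are assumptions made for the example, not part of VoiceXML or of any platform API.

```python
from typing import Tuple


def split_termchar(dtmf_sequence: str, termchar: str = "#") -> Tuple[str, bool]:
    """Return (digits, termchar_pressed) for a raw DTMF sequence.

    Hypothetical helper: it assumes the platform hands back the keys exactly
    as typed, with the terminating key (if any) at the very end.
    """
    if termchar and dtmf_sequence.endswith(termchar):
        # Strip the terminating key and report that it was pressed.
        return dtmf_sequence[: -len(termchar)], True
    # No terminating key: the input probably ended on a timeout.
    return dtmf_sequence, False
```

The boolean returned alongside the digits is exactly the flag we would like the platform to expose directly when built-in grammars are used.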

That’s it for now. We have a couple more in store, but I wanted to keep the list short.

What do you think? Are there other small issues that you would like to be addressed by the upcoming VoiceXML 3.0?

I would like to thank my colleague Jean-Philippe Gariépy for bringing most of these issues to my attention. He’s the one who has to deal the most with the VoiceXML code that our internal dialog framework generates.

July 15th, 2010

by Dominique Boucher

An ABNF primer

Interestingly, a lot of hits on our NuGram web site come from people searching for “ABNF tutorial” on one of the major search engines. And although we provide great tools for working with ABNF grammars, we don’t provide any introductory text on the ABNF syntax. That’s a shame!

To remedy this situation, I just posted on Slideshare a presentation, extracted from our training material, that covers the basic concepts of ABNF grammars.

Remember that ABNF is the native syntax for many speech recognition (ASR) engines. And if your ASR doesn’t support it, let NuGram IDE handle the conversion to XML for you!

June 17th, 2010

by Dominique Boucher

Getting started with NuGram Server Dev Edition

Today we announced the availability of the free NuGram Server Developer Edition. With NuGram Server, deploying dynamic grammars is now as simple as writing JSP or PHP pages, but designing them and debugging them becomes so much easier! Let’s see how to use NuGram Server in practice in 4 easy steps.

(The steps below assume a Unix or Unix-like environment. On Windows, you can use Cygwin or MinGW. An upcoming post will show the same steps for Windows users who do not already have such an environment installed.)

What is it, exactly?

So what exactly is NuGram Server? It’s basically a set of Java servlets offering speech recognition grammar-related services. The servlets can be used standalone or deployed as part of another Java web application.

Step 1 — Download NuGram Server

Of course, the first step is to download NuGram Server and request a free license. We will ask you for your name and an email address to which we will send the information to download the license. All you have to do then is save the license to a file (typically nugram-lic.nlb in $HOME/nuecho).

Once NuGram Server is downloaded, unzip the archive in some temporary directory:

[~] cd ~/tmp
[tmp] unzip ~/Downloads/nugram-server.zip

This should create a directory nugram-server-2.2.0-sdk:

[tmp] ls
nugram-server-2.2.0-sdk
[tmp] cd nugram-server-2.2.0-sdk
[nugram-server-2.2.0-sdk] ls
bin  conf  lib  webapp

These directories provide a skeleton NuGram Server instance. The bin directory contains scripts to start the server in standalone mode (using the Jetty application server), and webapp/grammars is where the grammars go.

Step 2 — Download the sample projects

A Git repository hosted on GitHub contains sample projects to experiment with NuGram Server. It currently provides a single project, a dynamic grammar for a bill payee list. (Note that the projects can be downloaded without using Git at all. Simply go to the GitHub repository page, click on Download Source, and select Zip. You can then skip the second line below.)

On my machine, I simply do:

[~] cd ~/git
[git] git clone http://github.com/nuecho/nugram-server-samples.git
[git] cd nugram-server-samples/projects/bill-payee-list
[bill-payee-list]

Step 3 — Set up NuGram Server

The next thing to do is copy the NuGram Server main directories into the project:

[bill-payee-list] cp -R ~/tmp/nugram-server-2.2.0-sdk/* .
[bill-payee-list] ls
bin  conf  lib  README.md  src  webapp

We must now configure the license in webapp/WEB-INF/web.xml. Search for the com.nuecho.application.grammarserver.license-directory context initialization parameter and change its value to the name of the directory containing your free license (in my case /home/dboucher/nuecho):

<context-param>
 <param-name>com.nuecho.application.grammarserver.license-directory</param-name>
 <param-value>/home/dboucher/nuecho</param-value>
</context-param>
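
If you want to double-check the edit, a few lines of Python can read the value back. This is only a sketch: the helper names are made up for the example, and since your web.xml may or may not declare an XML namespace, the code compares local tag names only.

```python
import xml.etree.ElementTree as ET
from typing import Optional

LICENSE_PARAM = "com.nuecho.application.grammarserver.license-directory"


def _local(tag: str) -> str:
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def read_context_param(web_xml_text: str, name: str) -> Optional[str]:
    """Return the value of the named <context-param>, or None if absent."""
    root = ET.fromstring(web_xml_text)
    for element in root.iter():
        if _local(element.tag) != "context-param":
            continue
        # Collect the <param-name> / <param-value> children of this entry.
        fields = {_local(child.tag): (child.text or "").strip() for child in element}
        if fields.get("param-name") == name:
            return fields.get("param-value")
    return None
```

Feeding it the contents of webapp/WEB-INF/web.xml together with LICENSE_PARAM should return the directory you just configured.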

Finally, we must configure the context initializer for the dynamic grammar webapp/grammars/billpayees.abnf. (The context initializer is the piece of Java code that extracts the HTTP parameters and creates the global variables that will be available to the grammar template. More on this in an upcoming post.) We thus locate the initialization parameter com.nuecho.application.grammarserver.context-initializers for the /grammars servlet and replace it with:

<init-param>
  <param-name>com.nuecho.application.grammarserver.context-initializers</param-name>
  <param-value>
   billpayees.abnf=com.nuecho.samples.grammars.BillPayeeList
  </param-value>
</init-param>
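
The parameter value pairs a grammar file with a Java class name, in the form grammar=class. To make that format concrete, here is a small Python sketch that reads such a value into a dictionary. It is not NuGram Server’s actual parsing code, and treating whitespace as the separator between several entries is an assumption on my part.

```python
from typing import Dict


def parse_initializers(param_value: str) -> Dict[str, str]:
    """Map each grammar name to its initializer class name.

    Assumption: entries look like 'grammar=class' and are separated by
    whitespace (spaces or newlines), as in the web.xml fragment above.
    """
    mapping: Dict[str, str] = {}
    for entry in param_value.split():
        grammar, separator, class_name = entry.partition("=")
        if not separator or not grammar or not class_name:
            raise ValueError(f"expected grammar=class, got {entry!r}")
        mapping[grammar] = class_name
    return mapping
```

A check like this catches the most common slip, a missing or misplaced ‘=’, before the server is even started.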

Step 4 — Test your setup

To test that everything works fine, you just need to start the server in standalone mode:

[bill-payee-list] sh bin/server.sh
2010-06-16 13:51:37.735::INFO:  Logging to STDERR via org.mortbay.log.StdErrLog
2010-06-16 13:51:37.823::WARN:  Deprecated configuration used for ...
2010-06-16 13:51:37.937::INFO:  jetty-6.1.3
2010-06-16 13:51:38.397::INFO:  NO JSP Support for /webapp, ...
[NuGram Server] ----------------------------------------------
[NuGram Server] NuGram Server v2.2.0
[NuGram Server] ----------------------------------------------
2010-06-16 13:51:39.573::INFO:  NO JSP Support for /lib, ...
2010-06-16 13:51:39.704::INFO:  NO JSP Support for /conf, ...
2010-06-16 13:51:39.826::INFO:  NO JSP Support for /bin, ...
2010-06-16 13:51:39.861::INFO:  Started SocketConnector @ 0.0.0.0:8765

You then use a program like Curl or Wget to instantiate the dynamic grammar template using URLs like:

Could it be any simpler?

What next?

You are now ready to experiment with your own dynamic grammars. If you’ve not already done so, download NuGram IDE to get a complete development environment with which you will be able to design and test your grammars without even having to start NuGram Server. You can even test your Java context initializers directly within it.

You can also consult the NuGram dynamic grammar language reference on Slideshare, as well as the reference manual.

My upcoming posts will explain in greater detail how to develop Java context initializers, how NuGram IDE supports them, and how to make efficient use of the caching features of NuGram Server. Stay tuned!

And please, share your experience with dynamic grammars with us!

June 16th, 2010

by Dominique Boucher

NuGram IDE 2.2 available now!

Together with the introduction of NuGram Server Free Developer Edition, the Nu Echo team is pleased to announce the release of NuGram IDE 2.2. With this new release, designing and testing dynamic grammars has never been easier.

The most important feature introduced in this release is support for Java to populate dynamic grammars. When using NuGram IDE, you test and tune your grammar with the exact same code that will run in production, but without the long deployment cycle associated with stopping, deploying and restarting a Java web application. And deploying your grammars in NuGram Server is as simple as deploying JSP pages.

The free Basic Edition can be downloaded directly from within your Eclipse environment. Simply follow the download instructions. Or contact us for the Professional Edition.

The Nu Echo team is pleased to announce the immediate availability of NuGram Server Free Developer Edition, which will finally enable developers to download a completely free version of NuGram Server and immediately take advantage of its complete set of advanced capabilities.

For over a year, hundreds of speech application developers worldwide have taken advantage of NuGram IDE’s powerful features in order to develop better grammars faster. In particular:

  • The grammar editor’s advanced features (syntax coloring, on-the-fly validation, content-assist, sophisticated refactoring tools, etc.) greatly accelerate development and increase quality by detecting a wide range of grammar errors and problems on-the-fly.
  • Its integrated suite of analysis, testing, and debugging tools makes it easy to find problems early – and fix them.
  • Its coverage tool helps ensure grammar integrity and makes sure that no problem is ever accidentally introduced during development or maintenance.
  • The use of a single development environment regardless of the target speech engine minimizes the learning curve and enhances portability.

One of the revolutionary features of the NuGram Platform is the ability to develop dynamic grammars just as easily as static grammars, using the same powerful environment and set of tools, and to deploy them as simply as JSP pages. This means that there is no longer any need for the traditionally complex, error-prone, and difficult-to-test approaches to developing dynamic grammars. Until now, however, developers could not easily experiment with the dynamic grammar features of the NuGram Platform because doing so required purchasing a NuGram Server license. With the introduction of a Free Developer Edition, this is no longer the case.

Download NuGram Server Developer Edition now!

And make sure to check out our repository of sample dynamic grammars on GitHub.