Monthly Archives: June 2010

Getting started with NuGram Server Dev Edition

Today we announced the availability of the free NuGram Server Developer Edition. With NuGram Server, deploying dynamic grammars is now as simple as writing JSP or PHP pages, but designing and debugging them becomes so much easier! Let’s see how to use NuGram Server in practice in four easy steps.

(The steps below assume a Unix or Unix-like environment. On Windows, you can use Cygwin or MinGW. An upcoming post will show the same steps for Windows users who do not already have such an environment installed.)

What is it, exactly?

So what exactly is NuGram Server? It’s basically a set of Java servlets offering speech recognition grammar-related services. The servlets can be used standalone or deployed as part of another Java web application.

Step 1 — Download NuGram Server

Of course, the first step is to download NuGram Server and request a free license. We will ask you for your name and an email address to which we will send the information to download the license. All you have to do then is save the license to a file (typically nugram-lic.nlb in $HOME/nuecho).

Once NuGram Server is downloaded, unzip the archive in some temporary directory:

[~] cd ~/tmp
[tmp] unzip ~/Downloads/nugram-server.zip

This should create a directory nugram-server-2.2.0-sdk:

[tmp] ls
nugram-server-2.2.0-sdk
[tmp] cd nugram-server-2.2.0-sdk
[nugram-server-2.2.0-sdk] ls
bin  conf  lib  webapp

These directories provide a skeleton NuGram Server instance. The bin directory contains scripts to start the server in standalone mode (using the Jetty application server), and webapp/grammars is where the grammars go.

Step 2 — Download the sample projects

A Git repository hosted on GitHub contains sample projects to experiment with NuGram Server. It currently provides a single project, a dynamic grammar for a bill payee list. (Note that the projects can be downloaded without using Git at all. Simply go to the GitHub repository page, click on Download Source, and select Zip. You can then skip the second line below.)

On my machine, I simply do:

[~] cd ~/git
[git] git clone http://github.com/nuecho/nugram-server-samples.git
[git] cd nugram-server-samples/projects/bill-payee-list
[bill-payee-list]

Step 3 — Set up NuGram Server

The next thing to do is copy the NuGram Server main directories into the project:

[bill-payee-list] cp -R ~/tmp/nugram-server-2.2.0-sdk/* .
[bill-payee-list] ls
bin  conf  lib  README.md  src  webapp

We must now configure the license in webapp/WEB-INF/web.xml. Search for the com.nuecho.application.grammarserver.license-directory context initialization parameter and change its value to the name of the directory containing your free license (in my case /home/dboucher/nuecho):

<context-param>
 <param-name>com.nuecho.application.grammarserver.license-directory</param-name>
 <param-value>/home/dboucher/nuecho</param-value>
</context-param>

Finally, we must configure the context initializer for the dynamic grammar webapp/grammars/billpayees.abnf. (The context initializer is the piece of Java code that extracts the HTTP parameters and creates the global variables that will be available to the grammar template. More on this in an upcoming post.) We thus locate the initialization parameter com.nuecho.application.grammarserver.context-initializers for the /grammars servlet and replace it with:

<init-param>
  <param-name>com.nuecho.application.grammarserver.context-initializers</param-name>
  <param-value>
   billpayees.abnf=com.nuecho.samples.grammars.BillPayeeList
  </param-value>
</init-param>
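To give a feel for what a context initializer does conceptually, here is a small Python sketch. It is not the NuGram Server API (real context initializers are Java classes such as the BillPayeeList class configured above); the `payee` parameter name, the `payees` variable name, and the sample values are all made up for illustration.

```python
from urllib.parse import parse_qs


def build_context(query_string):
    """Turn HTTP query parameters into variables for a grammar template.

    Hypothetical stand-in for a context initializer: it reads every
    'payee' parameter and exposes the values as a 'payees' list.
    """
    params = parse_qs(query_string)
    return {"payees": params.get("payee", [])}


# Made-up request parameters, for illustration only.
context = build_context("payee=Hydro&payee=Visa")
print(context)
```

The grammar template would then refer to the variables in the returned context rather than to the raw HTTP parameters.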

Step 4 — Test your setup

To test that everything works fine, you just need to start the server in standalone mode:

[bill-payee-list] sh bin/server.sh
2010-06-16 13:51:37.735::INFO:  Logging to STDERR via org.mortbay.log.StdErrLog
2010-06-16 13:51:37.823::WARN:  Deprecated configuration used for ...
2010-06-16 13:51:37.937::INFO:  jetty-6.1.3
2010-06-16 13:51:38.397::INFO:  NO JSP Support for /webapp, ...
[NuGram Server] ----------------------------------------------
[NuGram Server] NuGram Server v2.2.0
[NuGram Server] ----------------------------------------------
2010-06-16 13:51:39.573::INFO:  NO JSP Support for /lib, ...
2010-06-16 13:51:39.704::INFO:  NO JSP Support for /conf, ...
2010-06-16 13:51:39.826::INFO:  NO JSP Support for /bin, ...
2010-06-16 13:51:39.861::INFO:  Started SocketConnector @ 0.0.0.0:8765

You then use a program like Curl or Wget to instantiate the dynamic grammar template using URLs like:

Can that be simpler?

What next?

You are now ready to experiment with your own dynamic grammars. If you’ve not already done so, download NuGram IDE to get a complete development environment with which you will be able to design and test your grammars without even having to start NuGram Server. You can even test your Java context initializers directly within it.

You can also consult the NuGram dynamic grammar language reference on Slideshare, as well as the reference manual.

My upcoming posts will explain in greater detail how to develop Java context initializers, how NuGram IDE supports them, and how to make efficient use of the caching features of NuGram Server. Stay tuned!

And please, share your experience with dynamic grammars with us!

NuGram IDE 2.2 available now!

Together with the introduction of NuGram Server Free Developer Edition, the Nu Echo team is pleased to announce the release of NuGram IDE 2.2. With this new release, designing and testing dynamic grammars has never been easier.

The most important feature introduced in this release is support for using Java to populate dynamic grammars. When using NuGram IDE, you test and tune your grammar with the exact same code that will run in production, but without the long deployment cycle associated with stopping, deploying and restarting a Java web application. And deploying your grammars in NuGram Server is as simple as deploying JSP pages.

The free Basic Edition can be downloaded directly from within your Eclipse environment. Simply follow the download instructions. Or contact us for the Professional Edition.

Introducing the NuGram Server Free Developer Edition

The Nu Echo team is pleased to announce the immediate availability of NuGram Server Free Developer Edition, which will finally enable developers to download a completely free version of NuGram Server and immediately take advantage of its complete set of advanced capabilities.

For over a year, hundreds of speech application developers worldwide have taken advantage of NuGram IDE’s powerful features in order to develop better grammars faster. In particular:

  • The grammar editor’s advanced features (syntax coloring, on-the-fly validation, content-assist, sophisticated refactoring tools, etc.) greatly accelerate development and increase quality by detecting a wide range of grammar errors and problems on-the-fly.
  • Its integrated suite of analysis, testing, and debugging tools makes it easy to find problems early and fix them.
  • Its coverage tool helps ensure grammar integrity and makes sure that no problem is accidentally introduced during development or maintenance.
  • The use of a single development environment regardless of the target speech engine minimizes the learning curve and enhances portability.

One of the revolutionary features of the NuGram Platform is the ability to develop dynamic grammars just as easily as static grammars, using the same powerful environment and set of tools, and to deploy them as simply as JSP pages. This means that there is no longer any need for the traditionally complex, error-prone, and difficult-to-test approaches to developing dynamic grammars. Until now, however, developers could not easily experiment with the dynamic grammar features of the NuGram Platform because doing so required purchasing a NuGram Server license. With the introduction of a Free Developer Edition, this is no longer the case.

Download NuGram Server Developer Edition now!

And make sure to check our repository of sample dynamic grammars on GitHub.

How a great speech application may appear to perform poorly

One of our products, a Canadian address capture VoiceXML module, has been deployed with great success by several of our customers. One of these deployments was done in the context of a change of address application, where the module has to capture the new address, the date when the new address becomes effective, and the new telephone number. Note that all of this information is obtained entirely through speech recognition.

In this deployment, the contract specified that the application had to achieve a minimum success rate. In order to track performance, two success metrics were jointly defined with the customer:

  • The Raw Success Rate. This is simply the number of calls for which the change of address was successfully completed (with all collected information confirmed by the caller), divided by the total number of calls for which the change of address module was used.
  • The Real Success Rate. This is calculated the same way, except that certain calls are excluded from consideration, namely calls where the caller provided no input whatsoever and calls where the caller hung up within the first two interactions.
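As a rough illustration of how the two metrics differ, here is a small Python sketch. The `Call` record and its field names are hypothetical (they are not taken from the actual application logs), the sample calls are made up, and "hung up within the first two interactions" is interpreted here as a hang-up after at most two interactions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Call:
    completed: bool                     # change of address completed and confirmed
    had_input: bool                     # caller provided at least one input
    hangup_after: Optional[int] = None  # interactions before hang-up, None if none


def success_ratio(calls: List[Call]) -> float:
    """Share of calls that completed the change of address (0.0 if no calls)."""
    if not calls:
        return 0.0
    return sum(1 for c in calls if c.completed) / len(calls)


def is_excluded(call: Call) -> bool:
    """True for calls that the Real Success Rate leaves out."""
    if not call.had_input:
        return True
    return call.hangup_after is not None and call.hangup_after <= 2


def raw_success_rate(calls: List[Call]) -> float:
    return success_ratio(calls)


def real_success_rate(calls: List[Call]) -> float:
    return success_ratio([c for c in calls if not is_excluded(c)])


# Made-up sample: 6 successes, 2 silent callers, 1 early hang-up, 1 real failure.
sample_calls = (
    [Call(completed=True, had_input=True)] * 6
    + [Call(completed=False, had_input=False)] * 2
    + [Call(completed=False, had_input=True, hangup_after=1)]
    + [Call(completed=False, had_input=True)]
)

print(f"Raw:  {raw_success_rate(sample_calls):.1%}")
print(f"Real: {real_success_rate(sample_calls):.1%}")
```

With this made-up sample, the raw rate is 6 out of 10 calls while the real rate is 6 out of 7, because the three excluded calls never count against the application.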

The customer specified that the application had to achieve a Real Success Rate of 75% or more. The rationale for the Real Success Rate is to exclude callers who either don’t want to use the application (for instance because they ended up in it by mistake) or don’t have the requested information. As a matter of fact, after the initial deployment revealed a fairly high hang-up rate early in the change of address call flow, the customer contacted a number of those callers to find out why they had hung up. Most of them admitted that they had no intention of changing their address; they had simply selected this option in the hope of getting connected to an agent faster.

It’s nonetheless interesting to track both metrics since a large difference between them can indicate problems that occurred earlier in the call (that is, before going into the change of address application).

For instance, at the end of 2008, the customer made some changes in the front menus, which significantly increased the number of callers that incorrectly found themselves in the change of address application. As shown in the graph below, this created a big drop in the Raw Success Rate while the Real Success Rate remained relatively constant. The customer implemented various changes to the front menu throughout 2009 (while the change of address application remained unchanged), with the result that the Raw Success Rate was finally stabilized at around 75% (and the Real Success Rate at 85%).

This shows that, when trying to evaluate the performance of an application, it’s important to focus on the correct metrics. Otherwise, we may end up not only with an incorrect assessment of its real performance, but also with wild variations that have nothing to do with the application itself.

Grammar tips & tricks #2 – return key/value pairs whenever possible

Tip #2: In SISR semantic tags, return key/value pairs whenever possible.

Strings all over the place

It is fairly common for new SRGS grammar writers to write SISR semantic tags that only return string values to calling rules or to the voice application, even when the data has some structure. For example, a dollar amount rule could return a string like this (in ABNF):

public $amount =
  $dollars {out = rules.dollars + ".00";}
  [and $cents {out = out.substring(0, out.length - 3)
                        + "." + rules.cents; }]
;

...

One obvious disadvantage of this approach is that the application has to extract the dollars and the cents from the returned string. Of course, a simple string-to-number conversion can be done. But because of possible rounding errors, it is best to extract both values separately and convert the two substrings to integers. This may not be that bad; machines are so fast these days.
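To make the extra work on the application side concrete, here is a minimal Python sketch of the integer-based extraction. The function names are hypothetical, and the sketch assumes the grammar always returns the cents as a whole number of cents after the decimal point (as in "19.99").

```python
from typing import Tuple


def split_amount(amount: str) -> Tuple[int, int]:
    """Split a 'dollars.cents' string into two integers without using floats."""
    dollars_text, _, cents_text = amount.partition(".")
    cents = int(cents_text) if cents_text else 0
    return int(dollars_text), cents


def to_cents(amount: str) -> int:
    """Total amount in cents, computed with integer arithmetic only."""
    dollars, cents = split_amount(amount)
    return dollars * 100 + cents


print(split_amount("19.99"))
print(to_cents("19.99"))

# By contrast, converting through a binary float is risky: most decimal
# fractions have no exact binary representation, so truncating the result
# of float(amount) * 100 can end up one cent short.
```

None of this parsing is needed when the grammar returns the two values under separate keys.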

A less obvious reason why this is not recommended relates to the fact that the computations made by the semantic tags can only begin once the engine has finished recognizing the utterance. In other words, the corresponding computation time directly adds to the application’s response time. The ECMAScript interpreter typically compiles the script (the semantic tag) to an intermediate representation before executing it. Unless the ASR properly caches the result of this compilation process, the script is compiled again and again. The more complicated the script is, the more processing power it takes to parse it, compile it, and execute it.

On top of that, string concatenation and substring extraction create many unnecessary temporary objects, putting a bigger burden on the garbage collector (or whatever other memory management algorithm the ECMAScript interpreter employs).

Finally, since semantic tags are compiled and executed for every hypothesis in the N-best list, the computation time and the number of objects created grow in proportion to the number of hypotheses requested by the application. Adding all this up, we end up with a grammar that demands unnecessary processing power from the ASR engine, which can cause significant delays in the recognition process. This may even result in noticeable latency at the application level (i.e., some dead air).

Use semantic keys instead

A better way to write the above grammar would be:

public $amount =
  $number {out.dollars = rules.number;
           out.cents   = 0; }
  [and $cents {out.cents = rules.cents; }]
;

...

Using explicit semantic keys has many advantages:

  • Documentation. This self-documents the type/purpose of the returned values.
  • Maintenance/evolution. The scripts are much simpler, and thus easier for someone reading the grammar to understand. It is also easier to add other keys later if need be.
  • Analytics. The presence of distinct semantic keys facilitates the analysis of field data. For example, we may want to run a recognition performance test on only a subset of our collected utterances, i.e. those utterances whose value for the cents semantic key is 0.
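The analytics point can be sketched in a few lines of Python. The record layout and the sample utterances below are made up for illustration; real field data would come from the recognition logs.

```python
def select_by_key(records, key, value):
    """Keep only the records whose semantic key has the given value."""
    return [r for r in records if r.get(key) == value]


# Hypothetical recognition results, one dict of semantic keys per utterance.
results = [
    {"utterance": "twenty dollars", "dollars": 20, "cents": 0},
    {"utterance": "twelve dollars and fifty cents", "dollars": 12, "cents": 50},
    {"utterance": "five dollars", "dollars": 5, "cents": 0},
]

whole_dollar_results = select_by_key(results, "cents", 0)
for record in whole_dollar_results:
    print(record["utterance"])
```

With a single string result such as "20.00", the same selection would require parsing every value first.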