Archive for the ‘NuGram’ Category

July 15th, 2010 1 Comment

by Dominique Boucher

An ABNF primer

Interestingly, a lot of hits on our NuGram web site come from people looking for the words ABNF tutorial on one of the major search engines. And although we provide great tools for working with ABNF grammars, we don’t provide any introductory text on the ABNF syntax. That’s a shame!

To remedy this situation, I just put on Slideshare a presentation extracted from our training material that covers the basic concepts of ABNF grammars.

Remember that ABNF is the native syntax for many speech recognition (ASR) engines. And if your ASR doesn’t support it, let NuGram IDE handle the conversion to XML for you!

June 17th, 2010 No Comments

by Dominique Boucher

Getting started with NuGram Server Dev Edition

Today we announced the availability of the free NuGram Server Developer Edition. With NuGram Server, deploying dynamic grammars is now as simple as writing JSP or PHP pages, but designing them and debugging them becomes so much easier! Let’s see how to use NuGram Server in practice in 4 easy steps.

(The steps below assume the use of Unix or Unix-like environment. On Windows, you can use Cygwin or Mingw. An upcoming post will show the same steps for Windows users not having such an environment already installed.)

What is it, exactly?

So what exactly is NuGram Server? It’s basically a set of Java servlets offering speech recognition grammar-related services. The servlets can be used standalone or deployed as part of another Java web application.

Step 1 — Download NuGram Server

Of course, the first step is to download NuGram Server and request a free license. We will ask you for your name and an email address to which we will send the information to download the license. All you have to do then is save the license to a file (typically nugram-lic.nlb in $HOME/nuecho).

Once NuGram Server is downloaded, unzip the archive in some temporary directory:

[~] cd ~/tmp
[tmp] unzip ~/Downloads/nugram-server.zip

This should create a directory nugram-server-2.2.0-sdk:

[tmp] ls
nugram-server-2.2.0-sdk
[tmp] cd nugram-server-2.2.0-sdk
[nugram-server-2.2.0-sdk] ls
bin  conf  lib  webapp

These directories provide a skeleton NuGram Server instance. The bin directory contains some scripts to start the server in standalone mode (using the Jetty application server), and the webapp/grammars is where the grammars are put.

Step 2 — Download the sample projects

A Git repository hosted on Github contains sample projects to experiment with NuGram Server. It currently provides a single project, a dynamic grammar for a bill payee list. (Note that the projects can be downloaded without having to use Git at all. Simply go to the Github repository page and click on Download Source, and select Zip. You can then skip the second line below.)

On my machine, I simply do:

[~] cd ~/git
[git] git clone http://github.com/nuecho/nugram-server-samples.git
[git] cd nugram-server-samples/projects/bill-payee-list
[bill-payee-list]

Step 3 — Setup NuGram Server

The next thing to do is copy the NuGram Server main directories in the project:

[bill-payee-list] cp -R ~/tmp/nugram-server-2.2.0-sdk/* .
[bill-payee-list] ls
bin  conf  lib  README.md  src  webapp

We must now configure the license in webapp/WEB-INF/web.xml. Search for the com.nuecho.application.grammarserver.license-directory context initialization parameter and change its value to the name of the directory containing your free license (in my case /home/dboucher/nuecho):

<context-param>
 <param-name>com.nuecho.application.grammarserver.license-directory</param-name>
 <param-value>/home/dboucher/nuecho</param-value>
</context-param>

Finally, we must configure the context initializer for the dynamic grammar webapp/grammars/billpayees.abnf. (The context initializer is the piece of Java code that extracts the HTTP parameters and creates the global variables that will be available to the grammar template. More on this in an upcoming post.) We thus locate the initialization parameter com.nuecho.application.grammarserver.context-initializers for the /grammars servlet and replace it with:

<init-param>
  <param-name>com.nuecho.application.grammarserver.context-initializers</param-name>
  <param-value>
   billpayees.abnf=com.nuecho.samples.grammars.BillPayeeList
  </param-value>
</init-param>

Step 4 — Test your setup

To test that everything works fine, you just need to start the server in standalone mode:

[bill-payee-list] sh bin/server.sh
2010-06-16 13:51:37.735::INFO:  Logging to STDERR via org.mortbay.log.StdErrLog
2010-06-16 13:51:37.823::WARN:  Deprecated configuration used for ...
2010-06-16 13:51:37.937::INFO:  jetty-6.1.3
2010-06-16 13:51:38.397::INFO:  NO JSP Support for /webapp, ...
[NuGram Server] ----------------------------------------------
[NuGram Server] NuGram Server v2.2.0
[NuGram Server] ----------------------------------------------
2010-06-16 13:51:39.573::INFO:  NO JSP Support for /lib, ...
2010-06-16 13:51:39.704::INFO:  NO JSP Support for /conf, ...
2010-06-16 13:51:39.826::INFO:  NO JSP Support for /bin, ...
2010-06-16 13:51:39.861::INFO:  Started SocketConnector @ 0.0.0.0:8765

You then use a program like Curl or Wget to instantiate the dynamic grammar template using URLs like:

Can that be simpler?

What next?

You are now ready to experiment with your own dynamic grammars. If you’ve not already done so, download NuGram IDE to get a complete development environment with which you will be able to design and test your grammars without even having to start NuGram Server. You can even test your Java context initializers directly within it.

You can also consult the NuGram dynamic grammar language reference on Slideshare, as well as the reference manual.

My upcoming posts will explain in greater details how to develop Java context initializers, NuGram IDE’s support for them, and how to make efficient use of the caching features of NuGram Server. Stay tuned!

And please, share your dynamic grammars experience with us!

The Nu Echo team is pleased to announce the immediate availability of NuGram Server Free Developer Edition, which will finally enable developers to download a completely free version of NuGram Server and immediately take advantage of its complete set of advanced capabilities.

For over a year, hundreds of speech application developers worldwide have taken advantage of NuGram IDE’s powerful features in order to develop better grammars faster. In particular:

  • The grammar editor’s advanced features (syntax coloring, on-the-fly validation, content-assist, sophisticated refactoring tools, etc.) greatly accelerate development and increase quality by detecting a wide range of grammar errors and problems on-the-fly.
  • Its integrated suite of analysis, testing, and debugging tools make it easy to find problems early – and fix them.
  • Its coverage tool helps insuring grammar integrity and making sure that no problem is ever accidentally introduced during development or maintenance.
  • The use of a single development environment regardless of the target speech engine minimizes the learning curve and enhances portability.

One of the revolutionary features of the NuGram Platform is the ability to develop dynamic grammars just as easily as static grammars, using the same powerful environment and set of tools, and to deploy them as simply as JSP pages. This means that there is no longer any need for the traditionally complex, error prone, and difficult to test approaches for developing dynamic grammars. Until now, however, developers could not easily experiment with the dynamic grammar features of the NuGram Platform since, in order to do so, they were required to purchase a license of NuGram Server. With the introduction of a Free Developer Edition, this is no longer the case.

Download NuGram Server Developer Edition now!

And make sure to check our repository of sample dynamic grammars on Github.

Tip #2: In SISR semantic tags, return key/value pairs whenever possible.

Strings all over the place

It is fairly common for new SRGS grammar writers to write SISR semantic tags that only return string values to calling rules or to the voice application, even when the data has some structure. For example, a dollar amount rule could return a string like this (in ABNF):

public $amount =
  $dollars {out = rules.dollars + ".00";}
  [and $cents {out = out.substring(0, out.length - 3)
                        + "." + rules.cents; }]
;

...

One obvious disadvantage of this approach is that the application has to extract the dollars and the cents from the returned string. Of course, a simple string to number conversion can be done. But due to possible rounding errors, it is best to extract both values separately and converting the two substrings to integers. This may not be that bad, machines are so fast these days.

A less obvious reason why this is not recommended relates to the fact that the computations made by the semantic tags can only begin once the engine has finished recognizing the utterance. In other words, the corresponding computation time directly adds to the application’s response time. The ECMAScript interpreter typically compiles the script (the semantic tag) to an intermediate representation before executing it. Unless the ASR properly caches the result of this compilation process, the script is compiled again and again. The more complicated the script is, the more processing power it takes to parse it, compile it, and execute it.

We also have to add to that the fact that string concatenation/substring extraction creates a lot of unnecessary temporary objects, thus putting a bigger burden on the garbage collector (or any other memory management algorithm employed by the ECMAScript interpreter).

Finally, since semantic tags are compiled and executed for every hypothesis in the N-best list, the computation time and the number of objects created grows proportionately with the number of hypotheses requested by the application. If we sum all this, we end up with a grammar that requires unnecessary processing power from the ASR engine, which can cause significant delays in the recognition process. This may even result in noticeable latency at the application level (i.e. some dead-air).

Use semantic keys instead

A better way to write the above grammar would be:

public $amount =
  $number {out.dollars = rules.number;
           out.cents   = 0; }
  [and $cents {out.cents = rules.cents; }]
;

...

Using explicit semantic keys has many advantages:

  • Documentation. This self-documents the type/purpose of the returned values.
  • Maintenance/evolution. The scripts are much simpler, thus easier to understand for someone trying to understand the grammar. It is also easier to add other keys later if need be.
  • Analytics. The presence of distinct semantic keys facilitates the analysis of field data. For example, we can be interested in performing a recognition performance test for only a subset of our collected utterances, i.e. those utterances whose value for the cents semantic key is 0.
Related posts:
June 1st, 2010 No Comments

by Dominique Boucher

Converting GSL tags to SISR - conflicting goals

NuGram IDE provides a tool to convert Nuance GSL grammars to SRGS ABNF, which can then be converted to XML form. But the tool does not convert the semantic tags. So lately we’ve started working on the conversion of GSL semantic tags to SISR, and what initially seemed like a simple project provoked a heated debate internally (well, I may be exaggerating a bit… ;-). I soon realized that this was because there are really two competing forces driving the design of such a tool:

  • Correctness.The set of SISR tags generated automatically faithfully implement the behaviour of the corresponding Nuance GSL tags. In other words, the resulting grammar needs no manual intervention and the semantic results obtained using the generated grammar are always the same as if the original GSL grammar was used.
  • Maintenance.The set of SISR tags generated automatically are easy to understand, and thus to modify. They are close to what an SISR developer would have written from the start.

To see why these two goals conflict, simply consider how calls to predefined GSL functions can be translated. The GSL tags language provides predefined functions for things as simple as arithmetic operations: $add, $sub, etc. Converting a GSL tag of the form:

{return (add($n $m))

could generate a SISR tag like

{out = $add(n,m);}

if we want to preserve correctness. Here $add would be a function defined in a generated grammar header tag that implements an ECMAScript equivalent of the GSL add function, with proper handling of undefined values:

{!{
  function $add(x, y) {
    if (x == undefined || typeof x != "number") x = 0;
    if (y == undefined || typeof y != "number") y = 0;
    return x + y;
  }
}!};

But if the converter inlines the call and adds some code to check for undefined values, it could produce something like:

{out = ((n == undefined || typeof n != "number") ? 0 : n)
        + ((m == undefined || typeof m != "number") ? 0 : m);}

when one would have simply written:

{out = n + m;}

Unfortunately, this last version may produce unexpected results at runtime. If n is undefined and m is 3, the sum will be NaN (not a number) instead of 3 as would have produced the original GSL tag. In that case, it is essential to complement the tool with a rigorous testing process, one that can ideally ensure that each and every semantic tag in the grammar will be executed at least once.

So which one is better?

The answer is: that depends. In some cases correctness is preferable, especially if the grammar requires little to no maintenance at all. That may be true of simple grammars. But most of the time, grammars change over time. New sentence patterns are added, rules are extracted, etc. So maybe it’s best to leave the choice to the developer by offering a flexible tool.

And you, what would you prefer: a conversion tool that ensures full correctness at all costs, or a tool that sometimes produces grammars that are potentially not equivalent to the original one but are more maintainable?