Category Archives: NuGram

Grammar conversion : lessons learned

Lately, I have been involved in a number of grammar conversion projects. This has been a great opportunity to put our process and  tools to the test once again. And since every project has its peculiarities, we learn constantly.

The process we outlined about a year ago omitted  a number of small details. That was OK for small scale conversion projects. But when you have to deal with much larger projects (with thousands of grammars to convert), these details add up significantly. Let me share some of the issues we face daily.

It’s not just semantic tags

When you have tools to automatically convert semantics tags from one format to another, grammar conversion can seem to be a no-brainer. But reality is not that simple. Grammars are not written for an abstract specification, they are written for a very specific recognition engine. They often contain:

  • Words (tokens) that map to very specific pronunciations or that try to model some disfluencies (like hesitations, for instance), but for which the SRGS $GARBAGE rule is more appropriate.
  • Multiword duplicates, with one sequence of space-separated words, and a similar sequence of underscore-separated words to allow cross-word phonetization (like “thirty one” and “thirty_one”).
  • Words that map to very specific, tuned pronunciations. Such words often have an unusual orthography to make sure they are not confused with real words.

All this means that there are a number of transformations either to the original grammar or to the converted grammars that must be applied. This can be by means of regular expression search&replace, or manually inspecting grammars.

Generation of coverage sets

When dealing with hundreds (if not thousands) of grammars, it is not feasible to create initial coverage test sets manually. This is way too time consuming. That means you have to find a way to generate those initial coverage test sets automatically in batch. But how do you do that?

Fortunately, NuGram IDE already provides sophisticated tools to analyze grammars and generate sentences from them. We just built on this foundation a tool to automatically generate coverage tests sets for a set of ABNF grammars. The tool also reports problems found in the grammars, like the use of digits in voice grammars, or words in DTMF grammars.

The coverage set generation tool uses a combination of  configuration and sophisticated analyses to determine how to generate sentences and how many sentences to generate. For example, it’s not possible to generate all sentences from a grammar that covers an infinite number of sentences. When that’s the case (or when the number of sentences covered by the grammar is above a certain threshold), the tool reverts to other generation strategies.

Recognition tests as part of the QA process

Finally, even a syntactically valid grammar may fail to load in the ASR for a variety of reasons, the most common one being a limitation or constraint from the ASR  itself. For this reason, we got to the conclusion that doing recognition tests (ideally benchmarking of the converted grammars) is a very useful addition to the QA process. Of course, simply compiling grammars may catch a number of problems. But doing a “before and after” comparison can detect conversion problems that were not caught by the coverage tests when they are not exhaustive.

Another benefit of doing recognition tests is the ability to check the performance of the converted grammars to identify those needing additional work. Some converted grammars may have words that prove difficult to recognize with the new engine because they are not properly phonetized, thus calling for application-specific (or even grammar-specific) phonetic dictionaries.

What about DTMF?

In the specific case of converting GSL grammars to GrXML or ABNF,  a complication arises with the presence, in the same grammar, of both DTMF sequences and words. I will discuss this issue in a separate post.

NuGram IDE new licensing scheme

Last week, we released a new version of NuGram IDE. In addition to supporting UTF-16 and UTF-8 with byte-order mark (BOM), the free Basic Edition also comes with a new licensing scheme.

Was does this mean for you? Well, simply that you will have to request a new license file every 90 days. The installation process is fairly simple:

  1. You install NuGram IDE as before, by adding http://nugram.nuecho.com/update-site to your Eclipse update sites.
  2. You request a new license file from our web site. You will then receive an email containing a link to the license file.
  3. You follow the link and save the downloaded document to a file in the $HOME/nuecho directory (this can be changed in the Eclipse preferences).

It’s as simple as that. And we have an automatic process that will remind you by email to renew your license (steps 2 and 3) just a few days before its expiration.

The rationale

You may wonder why we decided to change the licensing scheme. Downloading and installing the Basic Edition was much more straightforward before. And as a user of free software myself, I don’t like complicated registration processes. I’m usually turned off when I need to enter lots of personal information, I often simply go away.

Then why? NuGram IDE is a relatively specialized tool, but we get downloads at a surprising rate. However, we don’t really know how many people or organizations use it on a regular basic. Do they simply take a look at it and uninstall it? Do they use it only once a year, for a very punctual task? The problem lies in the way the software is obtained (by means of an Eclipse update site). By asking our users to request a new license at a regular interval, we hope to better know our user base. We do think that the free edition of NuGram IDE is a software with real value and it’s not too much asking in return.

Of course, if you don’t like the idea of updating your license every 90 days, you can still buy the Professional Edition and not be annoyed anymore…

NuGram IDE 2.3.0 is out!

The Nu Echo team is pleased to announce the availability of NuGram IDE 2.3.0. This new release features full support for UTF-16 and UTF-8 with byte-order mark (BOM) and fixes a number of problems with non-European languages.

The free Basic Edition is available at the usual location.

What’s the point of having 98% in-grammar accuracy if 40% of user utterances are out-of-grammar?

How many times have you heard people say that they “achieve 95% speech recognition accuracy” (or more)? That sounds really impressive, doesn’t it?

It shouldn’t. What they don’t tell you is that they actually measure “in-grammar accuracy”, which means that accuracy is measured only on utterances that are perfectly covered by the grammar. For instance, for a date grammar, an utterance such as “well, uh, january fourth” would be considered out-of-grammar (and therefore ignored from the accuracy calculation) if “well, uh” is not covered by the grammar.

Unfortunately, in the real world there’s no way to force users to stick to in-grammar utterances. In fact, users usually have no way of even knowing what the grammar covers other than through hints provided by the prompts. Even well-behaved users can hesitate, correct themselves, or use an unexpected formulation (which sounds perfectly natural to them), all of which are likely to be out-of-grammar. They can even say things that they believe will help the machine understand them (for instance using “victor” instead of “v” when spelling).

As a result, it’s not unusual to have between 30% and 50% of user utterances that are considered out-of-grammar, many of which are perfectly legitimate responses to the application prompt. So what’s the point of reporting in-grammar accuracy if this ignores a large chunk of legitimate user utterances? You tell me.

Just to illustrate, you want to know one of the most effective ways of improving in-grammar accuracy? Just reduce grammar coverage. Sure, your out-of-grammar rate will increase but, hey, you’ll improve in-grammar accuracy! Isn’t that great? This tells you how useless in-grammar accuracy is at telling you whether you improved the grammar.

This is why we always report accuracy by considering every legitimate user utterance (i.e., the ones that contains a valid response to the prompt, regardless of wording or extraneous speech). This way, we make sure that we don’t conveniently ignore the utterances that happen to be the more challenging and we get results that accurately represent the real recognition performance (not some imaginary performance calculated on an idealized set of clean utterances).

But the best reason for doing it our way is that it enables us to truly measure improvements when we tune grammars. The reason is simple. Changing the coverage of a grammar always involves a trade-off. We can improve accuracy by covering more user utterances, but this can reduce overall accuracy if the new grammar paths introduce new speech recognition errors. The only way we can measure improvement is if we measure accuracy on a fixed set of valid utterances that doesn’t depend on the actual grammar coverage.

Testing dynamic grammars

In my post on NuGram and CouchDB, I neglected to mention how the dynamic grammar was authored and, most importantly, tested. Having a repeatable process for testing grammars is very important when developing a speech application, as most grammars change and get more complex over time.

Of course, the grammar was authored with NuGram IDE. NuGram IDE has some great features to test grammars, and especially dynamic grammars. Dynamic grammars (like the streets grammar) have always been more difficult to debug than static grammars. They can be very easy to write for small applications or prototypes (or blog posts…), but in real applications their coverage tests are often (and should!) run in batch as part of an automated build process. But this is often too cumbersome in practice. For instance, a dynamic grammar implemented as a JSP page requires a web application server to run and if the JSP page makes queries to a database, the DB must be running somewhere too. This greatly complicates the setup to make batch coverage tests. Moreover, writing and testing the dynamic grammar requires some programming skills that speech scientists don’t always have (at least not in large organizations).

With NuGram’s template language, a dynamic grammar can be tested in NuGram IDE Basic Edition in two different ways:

  • Using predefined data encoded as a JSON object (a JSON context), or
  • Using some custom Java code (a Java context).

Both ways require the creation of an instantiation context. It’s simply a mapping between variable names and values. An instantiation context must provide a value for each and every variable used in the grammar template. The values are used to populate the template and produce the resulting (ABNF or XML) grammar. The way the instantiation context is created depends on the type of context. For a JSON context, the instantiation context is the JSON document itself. For the Java context, some Java code populates a map from strings to objects.

The following video shows how to create a JSON context for the street grammar:

This one shows the steps required to create and use a Java context:

Note: there was a subtle (uncovered) bug in the previous version of NuGram IDE. If you want to create Java contexts like in the video above, please make sure to download the latest version.

The whole project used in the videos is available on github. The Java context initializers use the following open-source libraries:

In the next post, I will show how to use the Java context initializer to deploy the streets grammar on the Java-based version of NuGram Server.

And you, how do you test your dynamic grammars?