Tag Archives: nugram

A proven yet simple grammar conversion process

Grammar Conversion Process

As old speech recognition engines are being replaced by newer ones, we see more and more organizations having to convert their old grammars to standard formats. Given the right process and set of tools, converting grammars from one engine to another should be a straightforward task with mostly no risk of breaking the associated IVR application.

The issues

There are several issues associated with the conversion of grammars:

  • Syntax. First, there is the syntax of the grammar itself. If we are converting a grammar bewteen two engines that support GrXML or ABNF, then there’s not much else to say. But if we are converting from Nuance GSL to GrXML or ABNF, that’s a different story. GSL has very different operator precedences than ABNF, for instance. We have to be careful.
  • Semantics. The second issue is the language used inside the semantic tags. Again, if both engines support SISR, we have nothing to do. But if we convert from GSL to ABNF+SISR, we may have a harder time. For example, SISR does not support the concept of a top-level slot that can be assigned from anywhere in the grammar (using the <slot value> syntax).
  • Pronunciation lexicons. Almost all speech engines use a different format for lexicons. Not to mention that even different versions of the same engine sometimes support different phonetic alphabets.

A proven process

If you follow a rigorous process, the first two issues above can be easily mitigated. Here is one that has proven very effective:

  1. A coverage test set is produced from the original grammar. The test set should ideally ensure that all semantic tags are executed at least once (this is not always sufficient if the semantic tags contain conditional code, but that’s a good starting point.)
  2. The grammar is converted to the new format.
  3. The converted grammar is tested against the coverage test set of the original grammar (and problems are fixed, if any, until all tests pass).

Some tools

Some ASR engines already provide tools to convert grammars from old proprietary formats to the new standard ones. For instance, Nuance ships a tool to automatically convert GSL grammars to GrXML + SISR. It does not support all features of GSL as some of them have no equivalence in GrXML and SISR. And one of the problems with this converter is that the semantic tags produced are not easily maintainable.

NuGram IDE also provides some tools to help with the above process. In particular, it offers:

  • Great support for creating and running coverage tests.
  • A sophisticated sentence generation tool. The tags coverage strategy, for instance, is very effective when converting grammars as it helps generating sentences that will cover all semantic tags.
  • Support for all major semantic tags formats (GSL, Nuance OSR extensions, IBM and Microsoft, etc.).

Of course, to use NuGram effectively, your grammars will need to be converted to ABNF first. No problem! NuGram provides GSL and GrXML to ABNF converters to help you, as well as converters from ABNF to GSL or GrXML. That means all you have to worry about is really the conversion of the semantic tags. In this case, the whole process now becomes:

  1. Grammars are first imported in ABNF.
  2. A coverage test set is produced from the original grammar.
  3. Semantic tags are converted.
  4. The converted grammars are checked for errors by running the coverage tests of the original grammars. In case of errors, they are fixed and all tests are re-run.
  5. Convert the grammars to the desired target format.

What about pronunciation lexicons?

Unfortunately, converting phonetic dictionaries is still a manual and error-prone process, for which there are no good solutions as of this writing. And this task is more part of the tuning process that follows the grammar conversion process anyway. In most cases, a grammar’s pronunciation lexicon is used to fix incorrect or missing pronunciations in the ASR engine’s own dictionary for very specific words. The phonetic dictionary of the target ASR engine may not have the same limitations or deficiencies. At best, the original grammar’s pronunciation lexicon can act as an inspiration for the creation of the new pronunciation lexicon.

Get two NuGram IDE Pro licenses free when you purchase a grammar development course

Learn how to systematically deliver high-quality, high performance grammars by fully leveraging the features and tools available in NuGram IDE. Supported by hands-on exercises and numerous examples, Effective Grammar Development with NuGram IDE provides a breadth of knowledge, best practices, and tips and tricks that have shown their effectiveness at addressing the main challenges of grammar development and at delivering better grammars faster.

And if you order our on-site grammar development course before October 31st, you will get two licenses of NuGram IDE Professional Edition entirely free! There is only one catch: course must be given before December 31st, 2010. Contact us for details.

An ABNF primer

Interestingly, a lot of hits on our NuGram web site come from people looking for the words ABNF tutorial on one of the major search engines. And although we provide great tools for working with ABNF grammars, we don’t provide any introductory text on the ABNF syntax. That’s a shame!

To remedy this situation, I just put on Slideshare a presentation extracted from our training material that covers the basic concepts of ABNF grammars.

Remember that ABNF is the native syntax for many speech recognition (ASR) engines. And if your ASR doesn’t support it, let NuGram IDE handle the conversion to XML for you!

NuGram IDE 2.2 available now!

Together with the introduction of NuGram Server Free Developer Edition, the Nu Echo team is pleased to announce that it also releases NuGram IDE 2.2. With this new release, designing and testing dynamic grammars has never been easier.

The most important feature introduced in this release is the support for Java to populate dynamic grammars. When using NuGram IDE, you use the exact same code to test and tune your grammar that will run in production, but without the long deployment cycle associated with stopping, deploying and restarting a Java web application. And deploying your grammars in NuGram Server is as simple as deploying JSP pages.

The free Basic Edition can be downloaded directly from within your Eclipse environment. Simply follow the download instructions. Or contact us for the Professional Edition.

Comparing different speech recognition engines

We’re sometimes asked to compare the performance of different speech recognition engines on an identical task (same grammar, same set of test utterances). To do so in an effective way, we rely on three important features of NuGram Server (on-the-fly conversion of grammars to any format, semantic interpretation of textual sentences, and a NuGram-specific meta value that removes all semantic tags from generated grammars), which we use extensively in our tuning environment.

One powerful aspect about our tuning environment is that, no matter what recognition engine we use, there is no difference in the way we perform speech recognition experiments and then score and analyze results. It’s all completely transparent. This makes it easy to run the exact same experiment using different recognition engines and then compare results using metrics, graphs, and other tools that are used consistently across all engines.

A big challenge when comparing different engines is that we usually can’t use the same grammar since different engines often use incompatible grammar tag formats. For instance, let’s say we have a recognition grammar for the Loquendo LASR speech recognition engine and we would like to compare the performance we get with this grammar using three different engines: Loquendo LASR, OSR 3.0, and Nuance 8.5. In that case, we have three different tag formats: Loquendo uses SISR, OSR 3.0 uses swi-semantics and Nuance 8.5 uses the Nuance GSL proprietary tag format. So in principle, we would need to convert the grammars for each recognition engine, which can be a significant effort for complex grammars.

No need for manual grammar conversion

It is, however, possible to compare different recognition engines without having to manually convert the grammars. The approach we use is quite simple: With each engine that is not compatible with the original grammar’s tag format, we perform speech recognition using a grammar from which semantic tags have been removed and we then add semantic information back to the recognition result as a post-processing step.

This is all done using NuGram Server, as follows.

We start with the original grammar in ABNF format (here credit-card.abnf), which we use for the recognition test using Loquendo ASR. Then, we add a special-purpose NuGram meta directive to the grammar, which tells NuGram Server to omit the semantic tags when generating the grammar:

#ABNF 1.0 ISO-8859-1;
language en-US;
mode voice;
tag-format <semantics/1.0>

meta "com.nuecho.generation.omit-tags" is "true";

root $main;

When we perform the recognition test with OSR, we tell NuGram Server that we want to use credit-card.grxml (note the extension). NuGram Server then automatically converts credit-card.abnf to the SRGS XML format, while omitting the semantic tags from the grammar. Recognition then proceeds without a hitch, but results are returned without any semantic slots.

Similarly, when we perform the recognition test with Nuance 8.5, we tell NuGram Server that we want to use credit-card.gsl, which tells NuGram Server to automatically convert credit-card.abnf into a GSL grammar (still without semantic tags). Recognition once again proceeds without a hitch and results are returned without semantic slots.

Finally, in order to get recognition results with semantic slots, we simply send the original credit-card.abnf grammar and the recognition results to NuGram Server in order to add semantic slots to the recognition results. In other words, semantic interpretation is done as a post-processing step by NuGram Server based on the SISR tags in the original grammar.

Note that if the original grammar had been a GSL grammar or an OSR grammar, NuGram Server could still have computed the semantic interpretation based on the semantic tags in the original grammar (NuGram understands many different tag formats).

Dealing with engine-proprietary features

Some engine-proprietary features might make results more difficult to compare. For instance, OSR and Nuance 9 provide the special-purpose SWI_disallow key, which can be used to remove hypotheses from the N-best list of recognition hypotheses returned by the engine. This could for instance be used to remove credit card numbers that don’t have a valid checksum, therefore improving recognition accuracy as a result.

This useful feature could make recognition results difficult to compare if some engines have it and others don’t (in which case an equivalent result could be obtained by removing invalid hypotheses in the application). Fortunately, in our recognition tests we have the ability to tell NuGram Server to remove, from the N-best list, those hypotheses that match a specified slot pattern (e.g., SWI_disallow=1). This once again makes it possible to make fair and accurate comparisons between OSR or Nuance 9 and other engines.