Monthly Archives: October 2008

SISR support by leading ASR engine vendors

One of our NuGram IDE users recently asked us how well SISR, the W3C’s specification for semantic tags, is supported by current speech recognition platforms. For the benefit of all, here is the current status for the major players in the field:

  • IBM WVS: SISR (April 2003 working draft)
  • Loquendo: SISR 1.0 compliant (1)
  • LumenVox: SISR 1.0 compliant (although the tag-format header is not standard)
  • Microsoft OCS 2007 Speech Server: SISR 1.0 compliant (1)
  • Nuance OSR: proprietary semantic language based on ECMAScript
  • Nuance 8.5: GSL + proprietary semantic language
  • Nuance v9: SISR 1.0 compliant, with proprietary extensions (SWI objects)
  • Telisma: SISR 1.0 compliant (1)
  • Voxeo ASR: SISR 1.0 compliant

(1) Based on information from the company’s website; we have not tested it yet.

As we can see, SISR is now prevalent in the latest offerings from the major ASR vendors. This, of course, doesn’t mean that the engine you have to use will support SISR. It’s going to be a while before the current installed base upgrades to SISR-compliant engines.

However, if the engine you need to use happens to give you a choice (e.g., for backward compatibility reasons, Nuance v9 supports both SISR and SWI_semantics), it makes sense to seriously consider using SISR. Your grammars will be much more portable across engines (to a certain extent, of course) and the time taken to master it will be a good investment in the long term.

We should point out that NuGram IDE supports all leading semantic tag formats. What this means is that, for any supported tag format, the tool can compute the semantic interpretation in the exact same way the ASR engine does. So, whether or not you use SISR makes no difference: You can still use NuGram IDE to develop, debug, and test your grammars.

NuGram IDE Beta-20081010 just released

The Nu Echo team is pleased to announce the release of a new beta version of NuGram IDE.

Highlights

In addition to a number of small bug fixes, the following features have been added, in response to specific requests from users:

  • Unified editor. The ABNF and coverage editors have been merged into a single multi-tab editor.
  • Improved refactoring tools. The refactoring tools have been enhanced to better support semantic tags. For example, semantic slots can now be renamed. Also, the rule extraction refactoring properly adjusts semantic tags.
  • GSL Importer. Nuance GSL grammars can now be translated to ABNF.
  • Better encoding detection. The environment now uses the proper Eclipse mechanism to detect the encoding of an ABNF file.
  • Comment preservation for imported grammars. When converting grammars from XML form or GSL to ABNF, comments are preserved.
  • Project/folder publishing. Whole grammar hierarchies can be uploaded to a NuGram Server at once. See the online documentation for more details.

NuGram Hosted Server improvements

We also recently upgraded the NuGram Hosted Server with support for the following features:

  • Complete HTTPS support. The login and registration process is now done via secured pages to help protect privacy. Also, the NuGram Server HTTP API now fully supports HTTPS.
  • Grammar content viewing. Grammars published on the server can be previewed from the grammar browsing page. Just click on a grammar name to see the grammar source code!
  • Account settings. A new Account page lets you manage your account settings.

Download now!

We strongly encourage everyone to download this new version as soon as possible, as it contains many important new features and bug fixes (besides, the previous version expires on November 1st, while the new release will expire on April 1st, 2009). As usual, we welcome your feedback to help improve our product and better support your grammar development process.

Use cases for dynamic grammars (part 2)

In the previous post, I talked about the main motivations for using dynamic grammars and described the most common usage scenarios. Now, let me make all of this somewhat more concrete by providing a bunch of examples (most of which we’ve used in applications we’ve built over the years).

Let’s start with a few examples of grammars that will likely need to be re-generated for every single call:

  • Address capture — In order to capture the address of a caller, an application might first ask for the caller’s postal or zip code and then ask for the address using an address recognition grammar dynamically built from the list of address records associated with the recognized postal or zip code.
  • Voice dialing — A voice dialing application could use a recognition grammar dynamically generated from the data in a user’s address book. The grammar could support sentences such as “Call John Smith”, “John Smith at home”, “Call John Smith’s cellular”, etc.
  • Personalized bill payee list — In a banking bill payment application, the payee list grammar is dynamically generated based on the list of payees that has been set up by the user.
  • Personalized menu options — There is a growing trend towards applications that are increasingly personalized for each user. In that vein, an application’s main menus could be personalized for each user based either on the user’s past usage patterns or on personalization actually done by the user on the company’s web site.
  • Identity validation — Many applications use security questions to validate the identity of the caller. Based on an identity claim (e.g., a social security number or a telephone number), the application asks the caller to answer security questions based on information contained in the caller’s profile, for instance a mother’s maiden name, the name of a pet, a secret word, etc. In this case, because the range of possible responses would often be too large, some of the recognition grammars need to be dynamically built based on the expected responses.
  • One-step correction — Suppose an address recognition N-best list contains the following hypotheses: “four fifty main street”, “four sixty main street”, and “four fifty-one main street”, and that the caller actually spoke the third one. Suppose also that, when confirming the first hypothesis with the caller, we use a confirmation grammar that covers the corrections the caller is likely to make when offered an incorrect choice (e.g., “no, four fifty-one”). In other words, the confirmation grammar is built from the hypotheses found in the recognition result. This makes it possible to recognize a possible correction and act on it, thereby avoiding unnecessary interactions with the caller and improving both the user experience and the success rate.
  • Choose from a user-specific list of reservations/orders/transactions/accounts — For instance, let’s say a client calls in order to cancel a flight reservation. The application retrieves all reservations corresponding to the client and asks the caller to say the departure date or the destination in order to identify the correct reservation. The recognition grammar would, of course, be dynamically built based on information obtained from the retrieved reservations. Another example is someone who calls regarding his electricity bill. If the caller has more than one account (e.g., a condo in the city and a second home by a lake), the application could identify the correct account by asking for the address associated with the bill. In this case, the grammar would be built from the addresses associated with all the caller’s accounts.
  • List navigation — Let’s say a flight reservation application has retrieved a number of flights corresponding to the caller’s criteria and then lists all such flights, followed by the question: “Which flight would you like?”, to which the caller could respond “The 10:35 flight”. The recognition grammar could, once again, be dynamically built based on information contained in the proposed list of flights.
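Most of these examples boil down to turning call-specific data into grammar text. As an illustration of the one-step correction case, here is a minimal Python sketch that builds a confirmation grammar from an N-best list, emitted as SRGS ABNF with SISR 1.0 tags. The function names, the hyphen-to-space normalization, and the decision to let the caller omit the words shared with the proposed hypothesis (e.g., saying just “no, four fifty-one”) are illustrative assumptions, not a description of how NuGram or any particular engine does it.

```python
"""Sketch: build a one-step correction grammar from an N-best list.

Helper names are hypothetical; the output is SRGS ABNF with SISR 1.0 tags.
"""


def normalize(hypothesis: str) -> list[str]:
    # ABNF tokens are whitespace-separated, so "fifty-one" becomes "fifty one".
    return hypothesis.replace("-", " ").lower().split()


def split_common_suffix(words: list[str], reference: list[str]) -> tuple[list[str], list[str]]:
    """Split `words` into (distinctive prefix, trailing words shared with `reference`)."""
    n = 0
    while n < min(len(words), len(reference)) and words[-1 - n] == reference[-1 - n]:
        n += 1
    cut = len(words) - n
    return words[:cut], words[cut:]


def correction_grammar(nbest: list[str]) -> str:
    """Confirm nbest[0] while also accepting "no <other hypothesis>"."""
    if not nbest:
        raise ValueError("empty N-best list")
    proposed = normalize(nbest[0])
    seen = {tuple(proposed)}
    corrections = []
    for hypothesis in nbest[1:]:
        words = normalize(hypothesis)
        if tuple(words) in seen:
            continue  # duplicate, or identical to the proposed value
        seen.add(tuple(words))
        core, shared = split_common_suffix(words, proposed)
        if not core:
            continue  # nothing the caller could say to single it out
        expansion = " ".join(core)
        if shared:
            # Let the caller omit the words shared with the proposed value.
            expansion += " [" + " ".join(shared) + "]"
        value = " ".join(words)
        corrections.append(f'{expansion} {{out = "{value}";}}')

    alternatives = ['yes {out = "yes";}', 'no {out = "no";}']
    rules = []
    if corrections:
        alternatives.append("no $correction {out = rules.correction;}")
        rules.append("$correction = " + "\n            | ".join(corrections) + ";")
    header = [
        "#ABNF 1.0 UTF-8;",
        "language en-US;",
        "mode voice;",
        "tag-format <semantics/1.0>;",
        "root $confirm;",
        "",
    ]
    confirm = "public $confirm = " + "\n                | ".join(alternatives) + ";"
    return "\n".join(header + [confirm] + rules) + "\n"


if __name__ == "__main__":
    print(correction_grammar([
        "four fifty main street",
        "four sixty main street",
        "four fifty-one main street",
    ]))
```

How the generated text is then handed to the ASR engine (inline grammar, URL served by a web application, etc.) is platform-specific and is left out of the sketch.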

Note that in some of these cases (e.g., voice dialing, personalized bill payee list, or personalized menu options) the new grammars could also have been generated – and possibly compiled – offline, either as soon as the relevant information was changed by the user or as part of a scheduled maintenance process. This would help reduce latency during calls.

Here are examples of dynamic grammars based on data that change slowly over time:

  • Dates — Most date grammars would benefit from being dependent on the current date. For instance, in a travel reservation application, a departure date only occurs in the future and the return date should be later than the departure date. Similarly, a birth date normally occurs in the past. Making date grammars a function of the current date eliminates maintenance problems while maximizing accuracy.
  • Telephone numbers — Telephone number recognition accuracy is significantly higher when the area codes allowed by the grammar are limited to those that actually exist. Unfortunately, the list of area codes continuously evolves. To keep recognition accuracy as high as possible while making sure that all required phone numbers are supported, the telephone number grammar could be dynamically generated from a continuously updated list of area codes.
  • Postal or zip codes — Many applications ask for the caller’s postal or zip code. For instance, a citizen calling City Hall in order to inquire about the garbage collection schedule might be asked for his/her postal code in order to appropriately locate the house or apartment. If the recognition grammar is designed to only support valid postal codes, it should be updated periodically in order to account for changes in the list of postal codes.
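For the telephone number example, only the area code rule really needs to be regenerated when the list changes. The following Python sketch assumes ten-digit North American numbers spoken digit by digit and rebuilds the grammar (SRGS ABNF with SISR 1.0 tags) from whatever list of area codes is current; the helper names and the digit-only pronunciation (no “oh”, no “double”) are simplifying assumptions.

```python
"""Sketch: regenerate a phone number grammar from the current list of area codes."""

DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def spoken(digits: str) -> str:
    # "514" -> "five one four"; assumes digit-by-digit pronunciation.
    return " ".join(DIGIT_WORDS[int(d)] for d in digits)


def area_code_rule(area_codes: list[str]) -> str:
    """Build the $area_code rule, one alternative per distinct, valid code."""
    codes = sorted(set(area_codes))
    if not codes:
        raise ValueError("no area codes supplied")
    for code in codes:
        if len(code) != 3 or not all(c in "0123456789" for c in code):
            raise ValueError(f"invalid area code: {code!r}")
    alternatives = [f'{spoken(code)} {{out = "{code}";}}' for code in codes]
    return "$area_code = " + "\n           | ".join(alternatives) + ";"


def phone_grammar(area_codes: list[str]) -> str:
    # Static part: one digit rule, seven-digit local number.
    digit = "$digit = " + " | ".join(
        f'{word} {{out = "{i}";}}' for i, word in enumerate(DIGIT_WORDS)) + ";"
    return "\n".join([
        "#ABNF 1.0 UTF-8;",
        "language en-US;",
        "mode voice;",
        "tag-format <semantics/1.0>;",
        "root $phone;",
        "",
        "public $phone = $area_code $local {out = rules.area_code + rules.local;};",
        area_code_rule(area_codes),  # the only data-dependent rule
        "$local = $digit {out = rules.digit;} ($digit {out += rules.digit;})<6>;",
        digit,
    ]) + "\n"


if __name__ == "__main__":
    print(phone_grammar(["514", "450", "438"]))
```

Keeping the static and dynamic parts separate like this means a scheduled job only has to refresh one rule, while the rest of the grammar stays under normal version control.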

Finally, here are examples of dynamic grammars that could be used as part of a regular application maintenance process:

  • Bill payee list management — Banks continuously update the list of companies, utilities, municipalities, school boards, etc., available for bill payment through their telebanking application. If a bank wants to let its customers add new payees to their own personal bill payee list using the IVR application, the application needs to use a grammar containing all supported payees.
  • Stock quotes — The companies listed on any stock exchange change continuously as new companies are added and existing companies become delisted. As a result, most stock quote applications come with a regular grammar maintenance service to make sure that the recognition grammars are as current as possible.
  • Mutual funds — Same as stock quotes.
  • Branch location — Possible dynamic grammars used for branch location purposes include city-specific street intersection grammars and city-specific address grammars.

It’s of course easy to come up with many more examples that are similar to those listed above. If you have used dynamic grammars that you think are interesting or markedly different from those listed above, we’d certainly like to hear about them. And, naturally, if you have used dynamic grammars in the past, we’d really like you to try re-developing some of them with NuGram IDE and tell us what you think.

Use cases for dynamic grammars

We often use dynamic grammars in our applications. In fact, most of our applications use some form of dynamic grammar. This is why we long ago came to the conclusion that a complete grammar development and deployment solution had to be able to support both static and dynamic grammars.

From many interesting discussions we’ve had lately (in particular since the introduction of the NuGram Platform at SpeechTEK 2008), however, we have come to realize that people who develop grammars (VUI designers, speech scientists, application developers) do not always fully leverage dynamic grammars. For this reason, we thought it could be interesting to share our thoughts on use cases for dynamic grammars.

In this article, we will focus on motivations and usage scenarios. The next article will focus on describing a number of specific examples of dynamic grammars commonly – and perhaps not so commonly – used in speech applications. So let’s start with motivations. The main ones we see are the following:

  • The grammar content is only known at run-time — This, of course, is the obvious case. Many situations require grammars to be generated on-the-fly based on information obtained during the call, either from an outside source (e.g., through a web service or a database query) or directly from the user (e.g., from the recognition results of a previous interaction).
  • To improve recognition accuracy — Dynamic grammars can significantly improve recognition accuracy by making it possible to constrain the recognition grammar based on information available at run-time. It’s important to emphasize that this is often a much better solution than applying the same constraints while post-processing the recognition result (e.g., using a combination of SWI_vars and SWI_disallow with OSR or Nuance 9). Indeed, constraining the grammar before recognizing the utterance will almost always provide faster recognition and, more importantly, more accurate results than removing “disallowed” hypotheses in a post-processing step. It’s easy to understand why. An insufficiently constrained grammar not only wastes computational resources searching, during recognition, hypotheses that will be thrown away during post-processing; its unnecessary alternatives will also often cause the correct hypothesis to be pruned from the N-best list, thereby reducing accuracy.
  • To avoid using proprietary engine features — For instance, although SWI_vars and SWI_disallow may sometimes offer an acceptable alternative to using dynamic grammars, one should not forget that they restrict the application to specific recognition engines. Dynamic grammars provide a much more portable solution, while also being more accurate.
  • To solve maintenance problems — Let’s say that, for accuracy reasons, we want the date grammar used by an application to be constrained to the current year or the next. This would be the case, for instance, for a travel application asking for a departure date. If the grammar is static, someone will have to modify it once a year, a dangerous proposition: if for any reason this update doesn’t get done, the application may at some point stop working altogether. A better solution is to use a dynamic grammar that is always based on the current year. This completely solves the maintenance problem while ensuring that the grammar always provides optimal performance.
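The year constraint just described can be kept current without anyone touching the grammar. This Python sketch regenerates a `$year` rule (SRGS ABNF with SISR tags) from the current date. It is limited to 2000-2099 and to two common spoken forms, and `year_rule` and its `years_ahead` parameter are hypothetical names chosen for the example.

```python
"""Sketch: regenerate a $year rule covering the current year and the next."""
from datetime import date
from typing import Optional

ONES = ["", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]
TEENS = ["ten", "eleven", "twelve", "thirteen", "fourteen",
         "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def number_words(n: int) -> str:
    """Spell out 1..99 in English."""
    if n < 10:
        return ONES[n]
    if n < 20:
        return TEENS[n - 10]
    tens, ones = divmod(n, 10)
    return TENS[tens] + (" " + ONES[ones] if ones else "")


def spoken_years(year: int) -> list[str]:
    """Two common spoken forms, e.g. 2008 -> 'two thousand eight', 'twenty oh eight'."""
    if not 2000 <= year <= 2099:
        raise ValueError("this sketch only handles 2000-2099")
    rest = year - 2000
    if rest == 0:
        return ["two thousand"]
    short = "oh " + ONES[rest] if rest < 10 else number_words(rest)
    return [f"two thousand {number_words(rest)}", f"twenty {short}"]


def year_rule(today: Optional[date] = None, years_ahead: int = 1) -> str:
    """Build a $year rule from `today`'s year up to `years_ahead` years later."""
    today = date.today() if today is None else today
    alternatives = []
    for year in range(today.year, today.year + years_ahead + 1):
        forms = " | ".join(spoken_years(year))
        alternatives.append(f"({forms}) {{out = {year};}}")
    return "$year = " + "\n      | ".join(alternatives) + ";"


if __name__ == "__main__":
    print(year_rule(date(2008, 10, 10)))
```

Because the rule is computed from the date at generation time, serving it on the fly (or regenerating it nightly) is enough to roll the accepted years forward automatically.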

We should point out, however, that it’s sometimes much easier to implement complex constraints with a combination of ECMAScript, SWI_vars, and SWI_disallow (when possible) than to dynamically generate a grammar with the same constraints built in. For instance, dynamically generating a grammar that only supports numbers between arbitrary lower and upper bounds is far from trivial, while enforcing those bounds with ECMAScript is straightforward. In some cases, the best solution is a combination of both techniques.
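For comparison, the post-processing approach to numeric bounds amounts to a simple filter over the N-best list. This Python sketch illustrates the idea in a portable way; it does not reproduce the SWI_vars/SWI_disallow API, and the `Hypothesis` type and its fields are assumptions made for the example.

```python
"""Sketch: enforce numeric bounds by post-processing an N-best list."""
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Hypothesis:
    text: str          # recognized words
    value: int         # semantic interpretation
    confidence: float  # engine confidence score


def first_in_range(nbest: Sequence[Hypothesis], low: int, high: int) -> Optional[Hypothesis]:
    """Return the best-ranked hypothesis whose value lies in [low, high], if any."""
    for hyp in nbest:  # the N-best list is assumed to be sorted best-first
        if low <= hyp.value <= high:
            return hyp
    return None


if __name__ == "__main__":
    nbest = [Hypothesis("fifteen", 15, 0.61), Hypothesis("fifty", 50, 0.58)]
    print(first_in_range(nbest, 20, 60))
```

Such a filter can only choose among the hypotheses the engine actually returned: if the correct one was pruned from the N-best list, no amount of post-processing will bring it back, which is why constraining the grammar itself is usually preferable.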

Now let’s discuss usage scenarios. There are actually many ways in which dynamic grammars can be used. For instance:

  • On-the-fly — Dynamic grammars can be used on-the-fly to generate grammars that are based on data specific to a given call. This is the most dynamic situation, in which almost every call ends up using different instances of the same dynamic grammar. In this case, a new grammar instance must be generated, loaded, and compiled for every single call, which may introduce latency if grammars are large.
  • Offline (triggered) — The generation of a new grammar can be triggered by an event occurring outside of the IVR application. For instance, the generation of a new speech attendant grammar could be triggered by a change in the company’s corporate directory.
  • Offline (scheduled) — Dynamic grammars can be used offline, as part of a regularly scheduled grammar maintenance process. For instance, dynamic grammars could be used in order to provide a biweekly stock quote grammar maintenance service in which new (static) grammars are delivered every other week based on an updated list of companies.
  • Offline (build time) — Dynamic grammars can also be used as an integral part of the application build process, where some of the grammars are generated based on company-specific data. For instance, a grammar used to recognize branch names and addresses would need to be produced based on branch data provided by the company. In this case it’s probably necessary to also have a scheduled maintenance process in order to make sure that the application remains up-to-date with changes at the company.
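For the offline (triggered) scenario, a common pattern is to fingerprint the source data and regenerate the grammar only when the fingerprint changes. The Python sketch below assumes the data is a list of strings (e.g., names exported from a corporate directory); the file names and the `directory_grammar` builder are hypothetical, and pre-compiling the result is left to the target engine’s own tools.

```python
"""Sketch: regenerate a grammar offline only when its source data has changed."""
import hashlib
import json
from pathlib import Path
from typing import Callable, Sequence


def fingerprint(records: Sequence[str]) -> str:
    # Sort so that a reordered export of the same data does not trigger a rebuild.
    canonical = json.dumps(sorted(records), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def regenerate_if_changed(records: Sequence[str],
                          build: Callable[[Sequence[str]], str],
                          out_dir: Path) -> bool:
    """Rebuild out_dir/grammar.abnf if `records` changed; return True if rebuilt."""
    digest = fingerprint(records)
    stamp = out_dir / "grammar.sha256"
    if stamp.exists() and stamp.read_text(encoding="ascii") == digest:
        return False
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / "grammar.abnf").write_text(build(records), encoding="utf-8")
    # Write the stamp last: if the build fails, the next run retries it.
    stamp.write_text(digest, encoding="ascii")
    return True


def directory_grammar(names: Sequence[str]) -> str:
    """Hypothetical speech attendant grammar built from directory names."""
    alternatives = [f'{name} {{out = "{name}";}}' for name in sorted(set(names))]
    if not alternatives:
        raise ValueError("empty directory")
    return ("#ABNF 1.0 UTF-8;\nlanguage en-US;\nmode voice;\n"
            "tag-format <semantics/1.0>;\nroot $name;\n\n"
            "public $name = " + "\n             | ".join(alternatives) + ";\n")
```

The same function works for the scheduled scenario: a periodic job simply calls it, and nothing is rebuilt (or re-deployed) unless the underlying data has actually changed.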

Note that in order to avoid an undesirable delay caused by the compilation of the grammar during a call, grammars generated offline could also be pre-compiled before they are used by the application. This is particularly important, if not mandatory, for very large grammars, some of which might take several seconds – if not minutes – to compile. Note also that any change in a grammar used by an application often implies that other portions of the application be updated as well. For instance, an updated grammar may imply the need for new confirmation prompts.

Next post: A bunch of dynamic grammar examples. If you have any examples to suggest, let us know.