October 9th, 2008

by Yves Normandin

Use cases for dynamic grammars (part 2)

In the previous post, I talked about the main motivations for using dynamic grammars and described the most common usage scenarios. Now, let me make all of this somewhat more concrete by providing a bunch of examples (most of which we’ve used in applications we’ve built over the years).

Let’s start with a few examples of grammars that will likely need to be re-generated for every single call:

  • Address capture — In order to capture the address of a caller, an application might first ask for the caller’s postal or zip code and then ask for the address using an address recognition grammar dynamically built based on a list of address records associated to the recognized postal or zip code.
  • Voice dialing — A voice dialing application could use a recognition grammar dynamically generated from the data in a user’s address book. The grammar could support sentences such as “Call John Smith”, “John Smith at home”, “Call John Smith’s cellular”, etc.
  • Personalized bill payee list — In a banking bill payment application, the payee list grammar is dynamically generated based on the list of payees that has been set up by the user.
  • Personalized menu options — There is a growing trend towards applications that are increasingly personalized for each user. In that vein, an application’s main menus could be personalized for each user based either on the user’s past usage patterns or on personalization actually done by the user on the company’s web site.
  • Identity validation — Many applications use security questions to validate the identity of the caller. Based on an identity claim (e.g., a social security number or a telephone number), the application asks the caller to answer security questions based on information contained in the caller’s profile, for instance a mother’s maiden name, the name of a pet, a secret word, etc. In this case, because the range of possible responses would often be too large, some of the recognition grammars need to be dynamically built based on the expected responses.
  • One-step correction — Suppose an address recognition N-best list contains the hypotheses “four fifty main street”, “four sixty main street”, and “four fifty-one main street”, and that the caller actually spoke the third one. Suppose also that, when confirming the first hypothesis with the caller, we use a confirmation grammar that covers the corrections the caller is likely to make when offered an incorrect choice (e.g., “no, four fifty-one”). In other words, the confirmation grammar is built from the hypotheses found in the recognition result. This makes it possible to recognize the correction and act on it immediately, avoiding unnecessary interactions with the caller and thereby improving both user experience and success rate.
  • Choose from a user-specific list of reservations/orders/transactions/accounts — For instance, let’s say a client calls in order to cancel a flight reservation. The application retrieves all reservations corresponding to the client and asks the caller to say the departure date or the destination in order to identify the correct reservation. The recognition grammar would, of course, be dynamically built based on information obtained from the retrieved reservations. Another example is someone who calls regarding his electricity bill. If the caller has more than one account (e.g., a condo in the city and a second home by a lake), the application could identify the correct account by asking for the address associated with the bill. In this case, the grammar would be built from the addresses associated with all the caller’s accounts.
  • List navigation — Let’s say a flight reservation application has retrieved a number of flights corresponding to the caller’s criteria and then lists all such flights, followed by the question: “Which flight would you like?”, to which the caller could respond “The 10:35 flight”. The recognition grammar could, once again, be dynamically built based on information contained in the proposed list of flights.

Note that in some of these cases (e.g., voice dialing, personalized bill payee list, or personalized menu options) the new grammars could also have been generated – and possibly compiled – offline, either as soon as the relevant information was changed by the user or as part of a scheduled maintenance process. This would help reduce latency during calls.
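To make the one-step correction idea a bit more tangible, here is a minimal sketch in ECMAScript. It is not taken from any of our applications: the function name `buildConfirmationGrammar` and the shape of the generated rules are assumptions chosen for illustration, semantic tags are left out, and a real grammar would also accept partial corrections such as just the street number.

```javascript
// Hypothetical helper (illustration only): builds an ABNF confirmation
// grammar whose correction rule covers the N-best hypotheses that were
// NOT proposed to the caller.
function buildConfirmationGrammar(nbest, proposed) {
  // Keep every distinct hypothesis except the one being confirmed.
  var corrections = nbest.filter(function (hyp, i) {
    return hyp !== proposed && nbest.indexOf(hyp) === i;
  });

  var lines = [
    '#ABNF 1.0 UTF-8;',
    'language en-US;',
    'root $confirm;'
  ];

  if (corrections.length === 0) {
    // Nothing to correct to: fall back to a plain yes/no confirmation.
    lines.push('public $confirm = yes | no;');
  } else {
    lines.push('public $confirm = yes | no [$correction] | $correction;');
    lines.push('$correction = ' + corrections.join(' | ') + ';');
  }
  return lines.join('\n');
}

var confirmationGrammar = buildConfirmationGrammar(
  ['four fifty main street',
   'four sixty main street',
   'four fifty-one main street'],
  'four fifty main street');
console.log(confirmationGrammar);
```

With the N-best list from the example above, the correction rule ends up listing only the two hypotheses that were not proposed, so "no, four fifty-one main street" is covered by the confirmation grammar.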

Here are examples of dynamic grammars based on data that change slowly over time:

  • Dates — Most date grammars would gain from being dependent on the current date. For instance, in a travel reservation application, a departure date can only be in the future and the return date must be later than the departure date. Similarly, a birth date normally occurs in the past. Making date grammars a function of the current date eliminates maintenance problems while maximizing accuracy.
  • Telephone numbers — Telephone number recognition accuracy is significantly higher when the area codes allowed by the grammar are limited to those that actually exist. Unfortunately, the list of area codes continuously evolves. In order to keep recognition accuracy as high as possible while making sure that all required phone numbers are supported, the telephone number grammar could be dynamically generated based on a continuously updated list of area codes.
  • Postal or zip codes — Many applications ask for the caller’s postal or zip code. For instance, a citizen calling City Hall in order to inquire about the garbage collection schedule might be asked for his/her postal code in order to appropriately locate the house or apartment. If the recognition grammar is designed to only support valid postal codes, it should be updated periodically in order to account for changes in the list of postal codes.
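As a rough illustration of the telephone number case, the sketch below turns a list of valid area codes into a single ABNF rule. The function names, the rule name `$areaCode`, and the choice to return the digits in a semantic tag are all assumptions made for this example. It also ignores spoken variants such as "oh" for zero.

```javascript
// Spoken form of each digit (simplification: "oh" for zero is not covered).
var DIGIT_WORDS = ['zero', 'one', 'two', 'three', 'four',
                   'five', 'six', 'seven', 'eight', 'nine'];

// Converts "514" into "five one four".
function areaCodeToWords(code) {
  if (!/^\d{3}$/.test(code)) {
    throw new Error('Invalid area code: ' + code);
  }
  return code.split('').map(function (digit) {
    return DIGIT_WORDS[Number(digit)];
  }).join(' ');
}

// Hypothetical generator: one alternative per valid area code, each
// returning the digit string as its semantic value.
function buildAreaCodeRule(areaCodes) {
  var alternatives = areaCodes.map(function (code) {
    return areaCodeToWords(code) + ' {out = "' + code + '";}';
  });
  return '$areaCode = ' + alternatives.join(' | ') + ';';
}

console.log(buildAreaCodeRule(['514', '450']));
// $areaCode = five one four {out = "514";} | four five zero {out = "450";};
```

Regenerating this rule whenever the list of area codes is updated keeps the grammar tight without anyone having to edit it by hand.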

Finally, here are examples of dynamic grammars that could be used as part of a regular application maintenance process:

  • Bill payee list management — Banks continuously update the list of companies, utilities, municipalities, school boards, etc., available for bill payment through their telebanking application. If the bank wants to let their customers add new payees to their own personal bill payee list using the IVR application, the application needs to use a grammar containing all supported payees.
  • Stock quotes — The companies listed on any stock exchange change continuously as new companies are added and existing companies become delisted. As a result, most stock quote applications come with a regular grammar maintenance service to make sure that the recognition grammars are as current as possible.
  • Mutual funds — Same as stock quotes.
  • Branch location — Possible dynamic grammars used for branch location purposes include city-specific street intersection grammars and city-specific address grammars.

It’s of course easy to come up with many more examples that are similar to those listed above. If you have used dynamic grammars that you think are interesting or markedly different from those listed above, we’d certainly like to hear about them. And, naturally, if you have used dynamic grammars in the past, we’d really like you to try re-developing some of them with NuGram IDE and tell us what you think.

October 6th, 2008

by Yves Normandin

Use cases for dynamic grammars

We often use dynamic grammars in our applications. In fact, most of our applications use some form of dynamic grammar. This is why we long ago came to the conclusion that a complete grammar development and deployment solution had to be able to support both static and dynamic grammars.

From many interesting discussions we’ve had lately (in particular since the introduction of the NuGram Platform at SpeechTEK 2008), however, we have come to realize that people who develop grammars (VUI designers, speech scientists, application developers) do not always fully leverage dynamic grammars. For this reason, we thought it could be interesting to share our thoughts on use cases for dynamic grammars.

In this article, we will focus on motivations and usage scenarios. The next article will focus on describing a number of specific examples of dynamic grammars commonly - and perhaps not so commonly - used in speech applications. So let’s start with motivations. The main ones we see are the following:

  • The grammar content is only known at run-time — This, of course, is the obvious case. Many situations require grammars to be generated on-the-fly based on information obtained during the call, either from an outside source (e.g., through a web service or a database query) or directly from the user (e.g., from the recognition results of a previous interaction).
  • To improve recognition accuracy — Dynamic grammars can significantly improve recognition accuracy by making it possible to constrain the recognition grammar based on information available at run-time. It’s important to emphasize that this is often a much better solution than applying the same constraints while post-processing the recognition result (e.g., using a combination of SWI_vars and SWI_disallow with OSR or Nuance 9). Indeed, constraining the grammar prior to recognizing the utterance will almost always provide faster recognition and, more importantly, more accurate results than removing “disallowed” hypotheses as a post-processing step. It’s easy to understand why. Not sufficiently constraining the recognition grammar not only results in unnecessarily searching, during recognition, hypotheses that will get thrown away during post-processing (therefore wasting computational resources), but the presence of unnecessary alternatives in the grammar will often cause the correct hypothesis to be pruned away from the N-best list, therefore reducing accuracy.
  • To avoid using proprietary engine features — For instance, although SWI_vars and SWI_disallow may sometimes offer an acceptable alternative to using dynamic grammars, one should not forget that this implies restricting the application to only work on specific recognition engines. The use of dynamic grammars provides a much more portable solution, while being more accurate.
  • To solve maintenance problems — Let’s say that, for accuracy reasons, we want the date grammar used by an application to be constrained to the current year or the next. This, for instance, would be the case for a travel application asking about a departure date. If the grammar is static, someone has to modify it once a year, a dangerous proposition: if for any reason the update doesn’t get done, the application may stop working completely at some point. A better solution is to use a dynamic grammar that is always based on the current year. This completely solves the maintenance problem while making sure that the grammar used always provides optimal performance.
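The date case from the last item can be sketched in a few lines of ECMAScript. The idea is simply to compute the allowed years at run time and hand them to the grammar generator. The function name `buildDateContext` and the `years` key are hypothetical, picked only for this illustration.

```javascript
// Hypothetical helper: computes the years a departure-date grammar
// should accept, based on the date at which the grammar is generated.
function buildDateContext(now) {
  var year = now.getFullYear();
  // Current year or the next, as in the travel example above.
  return { years: [year, year + 1] };
}

var dateContext = buildDateContext(new Date());
console.log(JSON.stringify(dateContext));
```

Because the years are derived from the clock rather than typed into the grammar, nobody has to remember the yearly update.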

We should point out, however, that it’s sometimes much easier to implement complex constraints with a combination of ECMAScript, SWI_vars, and SWI_disallow (when possible) than to dynamically generate a grammar that has the same constraints built in. For instance, dynamically generating a grammar that only supports numbers between arbitrary lower and upper bounds is not a trivial matter, while checking the bounds with ECMAScript is rather trivial. In some cases, the best solution is a combination of both techniques.
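To show how little code the ECMAScript side of the number-range example needs, here is a minimal sketch. It assumes each N-best hypothesis is an object with a numeric `value` property. That shape, like the function name `filterByRange`, is an assumption for illustration and not the format of any particular recognition engine.

```javascript
// Hypothetical post-processing step: keeps only the hypotheses whose
// numeric value lies between the lower and upper bounds (inclusive).
function filterByRange(nbest, lower, upper) {
  return nbest.filter(function (hyp) {
    return hyp.value >= lower && hyp.value <= upper;
  });
}

var hypotheses = [
  { utterance: 'fifteen', value: 15 },
  { utterance: 'fifty', value: 50 },
  { utterance: 'five hundred', value: 500 }
];
console.log(filterByRange(hypotheses, 10, 100).map(function (hyp) {
  return hyp.utterance;
}));
```

Keep in mind the trade-off discussed earlier: filtering after recognition is easy to write, but it cannot bring back a correct hypothesis that was already pruned from the N-best list.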

Now let’s discuss usage scenarios. There are actually many ways in which dynamic grammars can be used. For instance:

  • On-the-fly — Dynamic grammars can be used on-the-fly to generate grammars that are based on data specific to a given call. This is the most dynamic situation, in which almost every call ends up using different instances of the same dynamic grammar. In this case, a new grammar instance must be generated, loaded, and compiled for every single call, which may introduce latency if grammars are large.
  • Offline (triggered) — The generation of a new grammar can be triggered by an event occurring outside of the IVR application. For instance, the generation of a new speech attendant grammar could be triggered by a change in the company’s corporate directory.
  • Offline (scheduled) — Dynamic grammars can be used offline, as part of a regularly scheduled grammar maintenance process. For instance, dynamic grammars could be used in order to provide a biweekly stock quote grammar maintenance service in which new (static) grammars are delivered every other week based on an updated list of companies.
  • Offline (build time) — Dynamic grammars can also be used as an integral part of the application build process, where some of the grammars are generated based on company-specific data. For instance, a grammar used to recognize branch names and addresses would need to be produced based on branch data provided by the company. In this case it’s probably necessary to also have a scheduled maintenance process in order to make sure that the application remains up-to-date with changes at the company.

Note that in order to avoid an undesirable delay caused by the compilation of the grammar during a call, grammars generated offline could also be pre-compiled before they are used by the application. This is particularly important, if not mandatory, for very large grammars, some of which might take several seconds – if not minutes – to compile. Note also that any change in a grammar used by an application often implies that other portions of the application be updated as well. For instance, an updated grammar may imply the need for new confirmation prompts.

Next post: A bunch of dynamic grammar examples. If you have any examples to suggest, let us know.

There are many free hosted VoiceXML platforms out there to try out new ideas, prototype applications, etc. I use one of them on a regular basis. Unfortunately, each time I need dynamically generated grammars in my application, I’m stuck. I have to roll my own solution (typically by launching a Web server on my machine, opening a temporary port in our firewall …). Ouch!

All of this is no longer necessary, thanks to our new NuGram Hosted Server, which we launched two weeks ago at SpeechTEK. In this post, I will show how to add dynamic grammars to a standard, VoiceXML 2.1 compliant application. You won’t need to install or deploy any Web server technology. All you’ll need is:

  • Eclipse 3.2 or higher with NuGram IDE installed;
  • an account on grammarserver.com;
  • an account on Evolution Developer Portal to deploy and test the VoiceXML application. (You can use any VoiceXML 2.1 platform, of course, but the example uses some non-standard objects exposed in ECMAScript by the Evolution VoiceXML interpreter.)

The sample application

I will illustrate the whole process of adding dynamic grammars to a VoiceXML application by developing a very simple-minded voice-activated auto-attendant-like application. The application will simply ask for a name and tell you the associated extension number.

Step 1 - Edit your grammar

You first need to create a new file in NuGram IDE to edit the grammar. We’ll call it name.abnf. (The actual name and location of the file in your workspace doesn’t really matter as we will be able to choose a different name when publishing it on the grammar server.) The file should have the following content:

#ABNF 1.0 ISO-8859-1;

language en-US;
tag-format <semantics/1.0>;
root $name;

public $name =
  [$pre_filler] $directoryEntry [$post_filler]
  {out.extension = rules.directoryEntry.extension;}
;

$directoryEntry =
  @alt
      @for (entry : entries)
        ( [ @word entry.firstname ]
          @word entry.lastname
          @tag "out.extension = '" entry.extension "';" @end
        )
      @end
  @end
;

$post_filler = please;
$pre_filler  =  I would like to speak with  | can I talk to;

As you can see, this is mainly ABNF with some extensions for the dynamic parts of the grammar.
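If you are wondering what the dynamic part boils down to, the ECMAScript sketch below mimics the expansion by hand: one alternative per directory entry, with an optional first name, a last name, and a tag carrying the extension. This is only my approximation for illustration. The function `expandDirectoryEntry` is hypothetical, and the exact text NuGram Server generates may well differ.

```javascript
// Hypothetical emulation of the template's loop: produces one ABNF
// alternative per entry in the instantiation context.
function expandDirectoryEntry(entries) {
  if (entries.length === 0) {
    throw new Error('At least one directory entry is required');
  }
  var alternatives = entries.map(function (entry) {
    return '( [ ' + entry.firstname + ' ] ' + entry.lastname +
           " {out.extension = '" + entry.extension + "';} )";
  });
  return '$directoryEntry =\n  ' + alternatives.join('\n  | ') + '\n;';
}

console.log(expandDirectoryEntry([
  { firstname: 'dominique', lastname: 'boucher', extension: '4231' },
  { firstname: 'yves', lastname: 'normandin', extension: '4225' }
]));
// $directoryEntry =
//   ( [ dominique ] boucher {out.extension = '4231';} )
//   | ( [ yves ] normandin {out.extension = '4225';} )
// ;
```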

Step 2 - Publish your grammar

In the ABNF editor, press Alt-Ctrl-Shift-P or right-click in the editor and select the Publish menu item in the contextual menu. This will open a dialog box in which you enter the grammar name on NuGram Server. (Of course, you first need to configure the publishing feature appropriately in the Eclipse Preferences. You’ll need to specify the server address, which is http://www.grammarserver.com:8082, your user name, and password.) Since this is an English grammar, we’ll call it en/name.abnf.

That’s it! We are now ready to write our VoiceXML application.

Step 3 - Add the grammar to your VoiceXML application

Dynamic grammars are instantiated by sending instantiation contexts to NuGram Server, together with the name of the grammar. An instantiation context is simply a set of key/value pairs encoded as a JSON object. The context is passed to NuGram Server using a very simple HTTP-based interface. In VoiceXML, we’ll use the data element for this. Once the dynamic grammar is instantiated, the URI of the generated grammar is returned to the VoiceXML application for use in a grammar element.
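To give a feel for what "a set of key/value pairs encoded as a JSON object" sent over HTTP can look like, here is a small ECMAScript sketch that serializes a context and form-encodes it, which is how a VoiceXML data element submits its variables with a POST by default. The field names `resource` and `context` are assumptions on my part for this illustration, and so is the helper `encodeForm`. The actual request format expected by NuGram Server is not spelled out in this post.

```javascript
// Hypothetical helper: encodes a flat object as an
// application/x-www-form-urlencoded request body.
function encodeForm(fields) {
  return Object.keys(fields).map(function (name) {
    return encodeURIComponent(name) + '=' + encodeURIComponent(fields[name]);
  }).join('&');
}

// The instantiation context is a plain object serialized to JSON.
var instantiationContext = {
  entries: [
    { firstname: 'yves', lastname: 'normandin', extension: '4225' }
  ]
};

var requestBody = encodeForm({
  resource: 'en/name.abnf',                      // assumed: grammar name
  context: JSON.stringify(instantiationContext)  // assumed: JSON context
});
console.log(requestBody);
```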

To simplify the application code, I wrote a few ECMAScript helper functions. You can get them here. They must be put in a file named gsapi.js in the same folder as the VoiceXML application itself. Note that some of these functions rely on global objects provided by the Voxeo VoiceXML interpreter.

Now let’s start writing the VoiceXML document. We must begin with the usual XML header and the root element and a script element to include the ECMAScript helper functions:

<?xml version="1.0"?>
<vxml xmlns="http://www.w3.org/2001/vxml"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3.org/2001/vxml
                          http://www.w3.org/TR/voicexml21/vxml.xsd"
      version="2.1">

  <script src="gsapi.js"/>

The next step is to set up the connection with NuGram Server:

  <script>
    var grammarUri = null;
    setupGrammarServer('www.grammarserver.com:8082', 'UserName', 'Password');
  </script>

This only assigns values to a few variables. No magic here. The interesting part follows. We must now create a session on NuGram Server and instantiate the dynamic grammar. We will do this inside a form element:

  <form id="start">
   <block>
    <script>
      initiateSessionCreation();
    </script>
    <data name="createSessionResponse" srcexpr="serverUrl()"
          method="post" namelist="account password operation resource"/>
    <script>
      setupSessionId(createSessionResponse);
    </script>

The first script element sets up a number of variables, while the second one extracts the session ID from the response to the data element.

The instantiation context is then sent to NuGram Server in the same way:

    <script><![CDATA[
      initiateInstantiation('en/name.abnf',
                            {"entries":[{"firstname":"dominique",
                                         "lastname":"boucher",
                                         "extension":"4231"},
                                        {"firstname":"yves",
                                         "lastname":"normandin",
                                         "extension":"4225"}]});

    ]]></script>
    <data name="createGrammarResponse" srcexpr="serverUrl()"
          method="post" namelist="account password operation resource context"/>
    <script>
      grammarUri = getGrammarUri(createGrammarResponse);
    </script>
    <goto next="#ask"/>
   </block>
  </form>

Of course, the context is hard-coded here. In a real application, it would probably be the result of a request to a database or a web service.
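As a sketch of that real-application case, the function below maps rows such as a directory query might return onto the context structure used above. The row field names (`first_name`, `last_name`, `ext`) and the function `buildDirectoryContext` are made up for this example. Adapt them to whatever your data source actually returns.

```javascript
// Hypothetical mapping from database-style rows to the instantiation
// context expected by the name grammar. Incomplete rows are skipped.
function buildDirectoryContext(rows) {
  var entries = rows.filter(function (row) {
    return row.first_name && row.last_name && row.ext;
  }).map(function (row) {
    return {
      firstname: String(row.first_name).toLowerCase(),
      lastname: String(row.last_name).toLowerCase(),
      extension: String(row.ext)
    };
  });
  return { entries: entries };
}

var directoryContext = buildDirectoryContext([
  { first_name: 'Dominique', last_name: 'Boucher', ext: 4231 },
  { first_name: 'Yves', last_name: 'Normandin', ext: 4225 },
  { first_name: 'Incomplete', last_name: '', ext: 4000 }
]);
console.log(JSON.stringify(directoryContext));
```

The resulting object could then be passed as the second argument to initiateInstantiation in place of the hard-coded literal.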

The initiateInstantiation function sets a few variables. In particular, the context variable is set to a JSON representation of the second argument to initiateInstantiation. (The Voxeo VoiceXML interpreter provides the JSON object, which can be used to serialize and deserialize JSON strings.)

The XML document returned by the data element will contain, upon successful completion, the URI of the generated grammar. The getGrammarUri function simply extracts this URI. We can now use this URI in a grammar element:

  <form id="ask">
    <field name="name">
      <prompt>Please say the name of the person you would like to reach.</prompt>
      <grammar srcexpr="grammarUrl(grammarUri)  "/>
      <filled>
       <prompt>
         The extension is
         <value expr="application.lastresult$.interpretation.extension"/>.
       </prompt>
       <goto next="#end"/>
      </filled>
      <catch event="connection.disconnect.hangup">
         <goto next="#end"/>
      </catch>
      <catch event=".">
        Sorry. I did not understand.
        <goto next="#end"/>
      </catch>
    </field>
  </form>

The final step is to release the session on NuGram Server:

  <form id="end">
    <block>
      <script>
       initiateSessionDestroy();
      </script>
      <data name="deleteSessionResponse" srcexpr="serverUrl()"
            method="post" namelist="account password operation resource"/>
      <prompt>Bye Bye!</prompt>
      <disconnect/>
    </block>
  </form>
</vxml>

This is it! Plain VoiceXML 2.1 compliant code, no web application to deploy! You are ready to test the application.

Advantages

The advantages of this approach are manifold. They are explained in more depth in our latest whitepaper, but let me summarize them:

  • No web server to deploy, which means shorter development times;
  • Dynamic grammars can be tested and debugged using the same, very sophisticated IDE used for static grammars;
  • Static grammars can seamlessly evolve to dynamic grammars without sacrificing debugging and tuning capabilities;
  • Generated grammars can be output in various formats (ABNF, GrXML, Nuance GSL). You thus have a technology that is engine-agnostic (NuGram IDE fully supports the most popular semantic interpretation tags, like SISR, Nuance OSR, and Nuance 8.5).

What do you think? Let us know! Our NuGram Beta Program is an opportunity for you to help us enhance our offering and make sure that your needs will be fulfilled.

September 1st, 2008

by Yves Normandin

Why a grammar platform

On effective grammar tools

Why are there so many VoiceXML “Service Creation Environments” (also called “dialog designers” or “dialog builders”) available - some of them actually quite good - but no decent Grammar Development Environment? Over the past several years, we’ve often asked ourselves that question.

Indeed, we’ve always seen our grammar development tools not only as an essential component of our speech practice, but also as a key competitive differentiator. This is why we’ve invested so much effort constantly improving them based on the feedback of the most demanding grammar developers: our own!

Judging from all the requests we got over the years regarding the availability of our grammar tools, it looks like a large number of people have also asked themselves the very same question.

One obvious reason, of course, is that you can’t make much money selling grammar tools, so why bother? Another, perhaps not-so-obvious reason is that it’s really not trivial to build tools that truly and effectively support the grammar development process. For instance, graphical grammar editing tools may at first glance appear appealing, but in practice they just make grammars more cumbersome and difficult to manipulate without really addressing the most difficult challenges faced by grammar developers.

What grammar developers really need are tools that:

  • Really help accelerate the grammar authoring process - with an editor that provides all the advanced features developers should expect;
  • Can test grammar coverage and semantic interpretation correctness - to make sure grammars give the expected result (and that we don’t accidentally break them); and
  • Provide powerful grammar analysis, visualization, and debugging capabilities - to help pinpoint and fix problems in the grammar.

The dynamic grammar challenge

This, in fact, is what our grammar development tools have provided for a long time. There was, however, one important problem: We very often have to build applications that require grammars to be dynamically generated at run time based on input data. Although there are many ways of doing this, the bottom line was that we had a very sophisticated grammar development environment that we just couldn’t use for dynamic grammars. To us, this just made no sense. What’s the point of having great tools if you can only use them for half your grammars?

The fact that grammar development/tuning and dialog implementation require very different skill sets only made this situation worse. A great Java developer is not necessarily a great grammar developer (and vice versa). But the traditional approaches to dynamic grammars typically mean that the grammar developer ends up developing only static grammars, while the dynamic grammars have to be developed by whoever implements the application. Again, this makes no sense.

A complete grammar solution

Clearly, a complete grammar solution needs to effectively deal with all grammars - static and dynamic. This is why we created the NuGram Platform, whose key foundations are:

  1. The ABNF Template Language. This is essentially the ABNF format, as specified in the W3C Speech Recognition Grammar Specification, with the addition of dynamic grammar extensions, used to add dynamic content to grammars.
  2. NuGram IDE, an integrated grammar development environment that supports the development of static and dynamic grammars in a uniform and consistent way.
  3. A set of Grammar Services, used for instance to instantiate a grammar (based on a grammar template and an instantiation context), to generate the grammar in the required format (e.g., GrXML, GSL, ABNF), or to parse a text string using the grammar.
  4. NuGram Server, the dynamic grammar run-time component of the platform, designed to be easily integrated with any speech application or service creation environment.

Why do we believe this is a very significant step forward for speech application developers? For several reasons. For instance:

  • All grammars required by an application can now be developed, debugged, and tested using a unique, consistent development environment.
  • The ability to use a single grammar authoring language regardless of the target recognition engine eliminates the need to learn about new grammar tools when switching to a new ASR engine and makes grammars much more portable.
  • The ability to develop dynamic grammars in a way that is independent from any application runtime infrastructure also makes them much more portable and reusable.

What’s the catch?

So, if this is so valuable, why do we make it available for free? Simply because we believe this will create business opportunities for Nu Echo. Frankly, we’re convinced that grammar developers who start using NuGram IDE will just not want to go back to their old tools. If that’s the case, then, at some point, if they need outside help to develop or tune their grammars, we hope they’ll think of us.

Also, while the NuGram IDE Basic Edition will remain free, we plan to offer a Professional Edition with more advanced features. While many developers will undoubtedly be quite happy with the Basic Edition, we hope that some users will want to pay for the more advanced features. And, of course, there’s a catch: There will be a runtime license associated with NuGram Server for deploying dynamic grammars. Of course, if you don’t use dynamic grammars, then this has no impact. But we think you will at some point and, when that time comes, you’ll decide that you really want to use one of our runtime solutions.

Give us feedback

You can get yourself a free copy of NuGram IDE at http://www.grammarserver.com. Over the next several months, this blog will discuss a number of topics related to grammar development and the NuGram Platform. We certainly hope you’ll give us some feedback.