Archive for September, 2008

September 25th, 2008 2 Comments

by Dominique Boucher

4 (not so good) reasons to author grammars in XML

At SpeechTEK University this summer, Judi Halperin from Avaya and Jenni McKienzie from Travelocity gave a very good introduction to grammar writing. The slides are definitely worth reading. They did a good job of addressing the most common sources of problems with speech recognition grammars.

However, two things struck me in their presentation: (1) They use the SRGS XML Form as the authoring language for speech recognition grammars, and (2) They mention JSP or ASP pages as the most common way of dynamically generating grammars. I’ll keep the latter point for another post, but let me address the first point here.

Having long ago abandoned the XML Form in favor of ABNF in our own practice, we’re always intrigued by the fact that a large number of grammar developers, including expert developers like Judi and Jenni, continue using the XML Form. (In the case of Judi and Jenni’s presentation, I can see why they would choose GRXML for the examples in a teaching situation with time constraints: more people are familiar with that format, and those who aren’t can read it easily. Their choice was certainly a conscious decision.) Indeed, there is no question in our minds that ABNF, being much more compact, more readable, and easier to manipulate than the XML Form, is by far the better choice.
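To make the compactness claim concrete, here is a small ECMAScript sketch that renders the same list of alternatives in both SRGS forms. The rule name and word list are made up for illustration, and the helper functions are not part of any tool mentioned in this post. It also assumes the words contain no characters that would need XML escaping.

```javascript
// Render one public rule made of simple alternatives in both SRGS forms.
// Rule name and words are illustrative only.

function toAbnf(ruleName, words, lang) {
  return [
    '#ABNF 1.0 UTF-8;',
    'language ' + lang + ';',
    'root $' + ruleName + ';',
    'public $' + ruleName + ' = ' + words.join(' | ') + ';'
  ].join('\n');
}

function toGrxml(ruleName, words, lang) {
  // Assumes plain words: no XML escaping is performed here.
  var items = words.map(function (w) {
    return '      <item>' + w + '</item>';
  });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"',
    '         xml:lang="' + lang + '" root="' + ruleName + '">',
    '  <rule id="' + ruleName + '" scope="public">',
    '    <one-of>'
  ].concat(items, [
    '    </one-of>',
    '  </rule>',
    '</grammar>'
  ]).join('\n');
}

var words = ['red', 'green', 'blue'];
var abnf = toAbnf('color', words, 'en-US');
var grxml = toGrxml('color', words, 'en-US');

console.log(abnf);
console.log(grxml);
console.log('ABNF lines: ' + abnf.split('\n').length +
            ', XML lines: ' + grxml.split('\n').length);
```

Even on a three-word rule the XML rendering is several times longer, and the gap grows with nesting and optional parts.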

I therefore tried to put myself in the shoes of those developers using the XML Form and understand their motivations. So here’s what I came up with:

  1. XML is the native format for the ASR engine. It’s true that some ASR engines - Nuance’s OSR and Nuance 9 in particular - only support the XML Form. It’s also true that support for the SRGS XML format is required by the specification, while support for ABNF is only optional. But there are format converters out there, so even on these platforms, the ABNF format can be used to author the grammar.
  2. It’s painful having to convert from ABNF to XML all the time. That’s a good point. Many testing tools provided with ASR engines (e.g., parseTool) will require you to convert the grammar to the XML form, which can indeed be painful. This is especially true if conversion tools are not well integrated with the environment in which grammars are being edited.
  3. XML is the format for all documents in the project. I heard this a few times. Some hard-core developers like XML. But that implies that the VUI designer, the speech scientist, or whoever authors the grammars, actually is a software developer. Quite often, that’s not the case.
  4. There is no good ABNF editor. I think this is the crux of the problem. Kind of a chicken and egg situation. No one uses ABNF because there is no good editor and no one provides a good ABNF editor because there is no demand for it. At least, with a decent XML editor, you get syntax coloring, code assist based on the document schema, etc. Unfortunately, an XML editor doesn’t know anything about grammars and therefore cannot provide advanced features like syntax checking of semantic tags, or refactoring capabilities (expansion extraction, rule renaming, semantic slot renaming, etc.).

However valid these points might have been at some point, there is now a complete environment for developing, testing, and debugging recognition grammars in ABNF format (and exporting them to any target ASR engine), so I don’t think there is any remaining reason not to switch to ABNF. Like, immediately.

Am I missing something? Are there other more fundamental reasons I did not see? Let me know!

I am deeply convinced that once you try authoring your grammars in ABNF using NuGram IDE, you won’t want to go back to your old habits of coding grammars in the XML Form. Give it a try! It’s free. And, by the way, remember that more and more speech recognition engines support ABNF natively.

September 14th, 2008 No Comments

by Dominique Boucher

The best time to migrate to NuGram IDE is NOW

You are at the start of a new VoiceXML project. Or you’ve just completed a project and you are slowly entering maintenance mode. Better yet, you’re in the middle of a large project involving speech recognition grammars. Whatever situation you’re in, now is the best time to migrate to NuGram IDE. You may find that this is one of the best moves you’ve made in a long time. Here is why:

  1. It’s easy. If you haven’t already done so, downloading and installing NuGram IDE takes only a few minutes. Then, converting existing grammars to ABNF (assuming that you don’t already use the ABNF format) is a matter of seconds. In the Navigator view, simply right-click on a .grxml file to open the contextual menu, and select “Grammar Tools > Convert to ABNF”. It’s as simple as that. You’re using GSL grammars? Don’t despair! The next release, due real soon, will provide a GSL to ABNF converter.
  2. You’ll increase productivity. Yes, installing NuGram IDE and converting grammars will cost you a few minutes of your time. But you will rapidly recover this investment many times over through increased productivity:
    • NuGram IDE provides many powerful tools to help you edit, debug and maintain your grammars in the same environment as your preferred Eclipse-based service creation environment, be it VoiceObjects Desktop, Cisco CVP Studio, etc.
    • NuGram IDE provides a “builder” that automatically converts ABNF grammars to the format of your choice as soon as you save them. No need to manually convert each grammar one at a time.
  3. You’ll increase quality. NuGram IDE was designed to maximize grammar quality by:
    • Helping you find grammar problems quickly and fix them easily. For instance, the grammar editor instantaneously flags syntax errors with meaningful diagnostic information and the coverage tool enables you to make sure that the grammar hasn’t been accidentally broken.
    • Providing powerful transformation and refactoring tools that always preserve the integrity of the grammar, therefore avoiding tedious and error-prone manipulations. This directly results from the fact that all NuGram IDE tools truly understand the underlying grammar structure since they work on an abstract representation level, not on the textual level.
  4. It’s free. We provide the beta version completely free of charge. And once we reach GA, the Basic Edition will remain free. You just need to register to be able to download new versions of NuGram IDE and be notified of new releases.
  5. There’s no risk. You don’t like using NuGram IDE? Easy. Just export the grammars to your preferred format and go back to using your old tools. But frankly, we don’t believe you’ll ever want to do that.

So why wait? Register and download NuGram IDE now! Start using it and give us feedback. Help us provide you with the best tools ever for grammar development.

September 5th, 2008 No Comments

by Dominique Boucher

Archived Jam Session

For those of you who missed yesterday’s VoiceObjects Jam Session, it has been archived and can now be fetched from VoiceObjects’ developer site. In short, Tobias Goebel did a fantastic job explaining the various features offered by NuGram IDE. Check it out!

There are many free hosted VoiceXML platforms out there to try out new ideas, prototype applications, etc. I use one of them on a regular basis. Unfortunately, each time I need dynamically generated grammars in my application, I’m stuck. I have to roll my own solution (typically by launching a Web server on my machine, opening a temporary port in our firewall …). Ouch!

All of this is no longer necessary, thanks to our new NuGram Hosted Server, which we launched two weeks ago at SpeechTEK. In this post, I will show how to add dynamic grammars to a standard, VoiceXML 2.1 compliant application. You won’t need to install or deploy any Web server technology. All you’ll need is:

  • Eclipse 3.2 or higher with NuGram IDE installed;
  • an account on grammarserver.com;
  • an account on Evolution Developer Portal to deploy and test the VoiceXML application. (You can use any VoiceXML 2.1 platform, of course, but the example uses some non-standard objects exposed in ECMAScript by the Evolution VoiceXML interpreter.)

The sample application

I will illustrate the whole process of adding dynamic grammars to a VoiceXML application by developing a very simple-minded voice-activated auto-attendant-like application. The application will simply ask for a name and tell you the associated extension number.

Step 1 - Edit your grammar

You first need to create a new file in NuGram IDE to edit the grammar. We’ll call it name.abnf. (The actual name and location of the file in your workspace doesn’t really matter as we will be able to choose a different name when publishing it on the grammar server.) The file should have the following content:

#ABNF 1.0 ISO-8859-1;

language en-US;
tag-format <semantics/1.0>;
root $name;

public $name =
  [$pre_filler] $directoryEntry [$post_filler]
  {out.extension = rules.directoryEntry.extension;}
;

$directoryEntry =
  @alt
      @for (entry : entries)
        ( [ @word entry.firstname ]
          @word entry.lastname
          @tag "out.extension = '" entry.extension "';" @end
        )
      @end
  @end
;

$post_filler = please;
$pre_filler  =  I would like to speak with  | can I talk to;

As you can see, this is mainly ABNF with some extensions for the dynamic parts of the grammar.
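To give an idea of what the dynamic part turns into, here is a rough ECMAScript sketch that expands the $directoryEntry rule by hand for the two directory entries used later in this post. It only mimics the idea behind the @alt/@for construct; it is not how NuGram Server is implemented, and the exact text the server generates may well differ.

```javascript
// Approximate the result of instantiating the @alt/@for template.
// Illustration only: not NuGram Server's actual algorithm or output.

function expandDirectoryEntry(context) {
  var alternatives = context.entries.map(function (entry) {
    // One alternative per entry: [first name] last name {semantic tag}
    return '( [ ' + entry.firstname + ' ] ' + entry.lastname +
           " {out.extension = '" + entry.extension + "';} )";
  });
  return '$directoryEntry =\n    ' + alternatives.join('\n  | ') + '\n;';
}

var context = {
  entries: [
    { firstname: 'dominique', lastname: 'boucher', extension: '4231' },
    { firstname: 'yves', lastname: 'normandin', extension: '4225' }
  ]
};

var expanded = expandDirectoryEntry(context);
console.log(expanded);
```

Each entry becomes one alternative with an optional first name and a tag that records the extension, which is what the root rule then copies into out.extension.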

Step 2 - Publish your grammar

In the ABNF editor, press Alt-Ctrl-Shift-P or right-click in the editor and select the Publish menu item in the contextual menu. This will open a dialog box in which you enter the grammar name on NuGram Server. (Of course, you first need to configure the publishing feature appropriately in the Eclipse Preferences. You’ll need to specify the server address, which is http://www.grammarserver.com:8082, your user name, and your password.) Since this is an English grammar, we’ll call it en/name.abnf.

That’s it! We are now ready to write our VoiceXML application.

Step 3 - Add the grammar to your VoiceXML application

Dynamic grammars are instantiated by sending instantiation contexts to NuGram Server, together with the name of the grammar. An instantiation context is simply a set of key/value pairs encoded as a JSON object. The context is passed to NuGram Server using a very simple HTTP-based interface. In VoiceXML, we’ll use the data element for this. Once the dynamic grammar is instantiated, the URI of the generated grammar is returned to the VoiceXML application for use in a grammar element.
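As a rough picture of what travels over the wire, the sketch below serializes a context to JSON and form-encodes it along with the other request fields, which is what a data element does by default with its namelist on a POST. The field names match the namelist used further down in this post; the 'instantiate' operation value is only a placeholder I picked for the example, not a documented value, and encodeForm is a helper written just for this illustration.

```javascript
// Build a form-encoded body carrying an instantiation context.
// The 'instantiate' value is a placeholder, not a documented operation name.

function encodeForm(fields) {
  return Object.keys(fields).map(function (key) {
    return encodeURIComponent(key) + '=' + encodeURIComponent(fields[key]);
  }).join('&');
}

var context = {
  entries: [
    { firstname: 'yves', lastname: 'normandin', extension: '4225' }
  ]
};

var body = encodeForm({
  account: 'UserName',
  password: 'Password',
  operation: 'instantiate',           // placeholder value
  resource: 'en/name.abnf',           // grammar name chosen when publishing
  context: JSON.stringify(context)    // the key/value pairs, as JSON text
});

console.log(body);
```

The point is simply that the context is ordinary JSON text sent as one more request parameter, so nothing beyond standard VoiceXML 2.1 is needed on the client side.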

To simplify the application code, I wrote a few ECMAScript helper functions. You can get them here. They must be put in a file named gsapi.js in the same folder as the VoiceXML application itself. Note that some of these functions rely on global objects provided by the Voxeo VoiceXML interpreter.

Now let’s start writing the VoiceXML document. We must begin with the usual XML header and the root element and a script element to include the ECMAScript helper functions:

<?xml version="1.0"?>
<vxml xmlns="http://www.w3.org/2001/vxml"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3.org/2001/vxml
                          http://www.w3.org/TR/voicexml21/vxml.xsd"
      version="2.1">

  <script src="gsapi.js"/>

The next step is to set up the connection with NuGram Server:

  <script>
    var grammarUri = null;
    setupGrammarServer('www.grammarserver.com:8082', 'UserName', 'Password');
  </script>

This only assigns values to a few variables. No magic here. The interesting part follows. We must now create a session on NuGram Server and instantiate the dynamic grammar. We will do this inside a form element:

  <form id="start">
   <block>
    <script>
      initiateSessionCreation();
    </script>
    <data name="createSessionResponse" srcexpr="serverUrl()"
          method="post" namelist="account password operation resource"/>
    <script>
      setupSessionId(createSessionResponse);
    </script>

The first script element sets up a number of variables, while the second one extracts the session ID from the response to the data element.
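For readers who don’t have gsapi.js at hand, helpers of this kind could look roughly like the sketch below. These are hypothetical stand-ins written for this explanation, not the contents of the real file. The 'createSession' operation value and the URL path are assumptions; only the variable names account, password, operation, and resource come from the namelist above.

```javascript
// Hypothetical stand-ins for some gsapi.js helpers; the real file may differ.
// The data element's namelist reads plain variables, so the helpers
// only need to assign them before the request is made.

var serverAddress = null;
var account = null;
var password = null;
var operation = null;
var resource = null;

function setupGrammarServer(address, userName, userPassword) {
  serverAddress = address;
  account = userName;
  password = userPassword;
}

function initiateSessionCreation() {
  operation = 'createSession';   // placeholder value, not a documented one
  resource = '';                 // no grammar is involved at this stage
}

function serverUrl() {
  return 'http://' + serverAddress + '/';   // the path is an assumption
}

setupGrammarServer('www.grammarserver.com:8082', 'UserName', 'Password');
initiateSessionCreation();
console.log(serverUrl(), account, operation);
```

Keeping the request parameters in ordinary variables is what lets the same data element pattern be reused for every call to the server.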

The instantiation context is then sent to NuGram Server in the same way:

    <script><![CDATA[
      initiateInstantiation('en/name.abnf',
                            {"entries":[{"firstname":"dominique",
                                         "lastname":"boucher",
                                         "extension":"4231"},
                                        {"firstname":"yves",
                                         "lastname":"normandin",
                                         "extension":"4225"}]});

    ]]></script>
    <data name="createGrammarResponse" srcexpr="serverUrl()"
          method="post" namelist="account password operation resource context"/>
    <script>
      grammarUri = getGrammarUri(createGrammarResponse);
    </script>
    <goto next="#ask"/>
   </block>
  </form>

Of course, the context is hard-coded here. In a real application, it would probably be the result of a request to a database or a web service.
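As a sketch of that idea, the function below turns a list of directory records into a context with the shape the grammar expects. The record fields (first, last, ext) are invented for the example and stand in for whatever your database or web service returns.

```javascript
// Build an instantiation context from directory records.
// The record shape (first, last, ext) is invented for illustration.

function buildContext(records) {
  return {
    entries: records.map(function (r) {
      return {
        firstname: r.first.toLowerCase(),
        lastname: r.last.toLowerCase(),
        extension: String(r.ext)   // the grammar tag embeds it as a string
      };
    })
  };
}

var records = [
  { first: 'Dominique', last: 'Boucher', ext: 4231 },
  { first: 'Yves', last: 'Normandin', ext: 4225 }
];

var builtContext = buildContext(records);
console.log(JSON.stringify(builtContext));
```

The resulting object can be passed as the second argument to initiateInstantiation in place of the literal used above.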

The initiateInstantiation function sets a few variables. In particular, the context variable is set to a JSON representation of the second argument to initiateInstantiation. (The Voxeo VoiceXML interpreter provides the JSON object, which can be used to serialize and deserialize JSON strings.)

The XML document returned by the data element will contain, upon successful completion, the URI of the generated grammar. The getGrammarUri function simply extracts this URI. We can now use this URI in a grammar element:

  <form id="ask">
    <field name="name">
      <prompt>Please say the name of the person you would like to reach.</prompt>
      <grammar srcexpr="grammarUrl(grammarUri)"/>
      <filled>
       <prompt>
         The extension is
         <value expr="application.lastresult$.interpretation.extension"/>.
       </prompt>
       <goto next="#end"/>
      </filled>
      <catch event="connection.disconnect.hangup">
         <goto next="#end"/>
      </catch>
      <catch event=".">
        Sorry. I did not understand.
        <goto next="#end"/>
      </catch>
    </field>
  </form>

The final step is to release the session on NuGram Server:

  <form id="end">
    <block>
      <script>
       initiateSessionDestroy();
      </script>
      <data name="deleteSessionResponse" srcexpr="serverUrl()"
            method="post" namelist="account password operation resource"/>
      <prompt>Bye Bye!</prompt>
      <disconnect/>
    </block>
  </form>
</vxml>

This is it! Plain VoiceXML 2.1 compliant code, no web application to deploy! You are ready to test the application.

Advantages

The advantages of this approach are manifold. They are explained in more depth in our latest whitepaper, but let me summarize them:

  • No web server to deploy, which means shorter development times;
  • Dynamic grammars can be tested and debugged using the same, very sophisticated IDE used for static grammars;
  • Static grammars can seamlessly evolve to dynamic grammars without sacrificing debugging and tuning capabilities;
  • Generated grammars can be output in various formats (ABNF, GrXML, Nuance GSL). You thus have a technology that is engine-agnostic (NuGram IDE fully supports the most popular semantic interpretation tags, like SISR, Nuance OSR, and Nuance 8.5).

What do you think? Let us know! Our NuGram Beta Program is an opportunity for you to help us enhance our offering and make sure that your needs will be fulfilled.

September 2nd, 2008 No Comments

by Dominique Boucher

VoiceObjects Jam Session on NuGram IDE

I know this is short notice, but VoiceObjects (one of our partners) will hold a Jam Session entitled “Effective Grammar Development & Testing - Using a Comprehensive IDE”, tomorrow at 11 AM EDT (8 AM PDT). They will showcase NuGram IDE and explain the benefits of using a complete, highly integrated development environment for high-performance voice recognition grammars.