Loquendo and Nu Echo announce partnership to expand PS offering

Loquendo and Nu Echo are proud to announce today the launch of a new partnership to provide a complete offering for the North American market, from speech technologies to professional services. This new partnership enables the companies to offer North American customers speech application development, dialog design, porting, and tuning services that complement Loquendo’s existing portfolio of highest-quality multilingual technologies, for a simpler yet more effective integration of speech.

Read the full press release.

Testing dynamic grammars

In my post on NuGram and CouchDB, I neglected to mention how the dynamic grammar was authored and, most importantly, tested. Having a repeatable process for testing grammars is very important when developing a speech application, as most grammars change and get more complex over time.

Of course, the grammar was authored with NuGram IDE, which has some great features for testing grammars, especially dynamic grammars. Dynamic grammars (like the streets grammar) have always been more difficult to debug than static grammars. They can be very easy to write for small applications or prototypes (or blog posts…), but in real applications their coverage tests should be (and often are) run in batch as part of an automated build process. In practice, however, this is often too cumbersome. For instance, a dynamic grammar implemented as a JSP page requires a web application server to run, and if the JSP page queries a database, the database must be running somewhere too. This greatly complicates the setup for batch coverage tests. Moreover, writing and testing a dynamic grammar requires programming skills that speech scientists don’t always have (at least not in large organizations).

With NuGram’s template language, a dynamic grammar can be tested in NuGram IDE Basic Edition in two different ways:

  • Using predefined data encoded as a JSON object (a JSON context), or
  • Using some custom Java code (a Java context).

Both ways require the creation of an instantiation context, which is simply a mapping between variable names and values. An instantiation context must provide a value for every variable used in the grammar template; these values populate the template to produce the resulting (ABNF or XML) grammar. How the instantiation context is created depends on the type of context: for a JSON context, the instantiation context is the JSON document itself; for a Java context, some Java code populates a map from strings to objects.
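To make the rule concrete, here is a minimal sketch in JavaScript (runnable with Node.js) of what a JSON context looks like and of the requirement that it define every template variable. Since the streets grammar's variables are not shown here, it borrows the `contacts` data from the voicedialer example in the Ruby gem post further down this page. The `missingVariables` helper is hypothetical and not part of NuGram; the list of required names is passed in by hand rather than extracted from the template.

```javascript
// Hypothetical helper (not part of NuGram): lists the template variables
// that a JSON instantiation context fails to define.
function missingVariables(contextJson, requiredNames) {
  const context = JSON.parse(contextJson);
  return requiredNames.filter(
    (name) => !Object.prototype.hasOwnProperty.call(context, name)
  );
}

// A JSON context is just a JSON object mapping variable names to values.
// This one supplies the "contacts" variable used by the voicedialer template.
const voicedialerContext = `{
  "contacts": [
    { "firstname": "John", "lastname": "Doe",   "number": "1234" },
    { "firstname": "Bill", "lastname": "Smith", "number": "4321" }
  ]
}`;

console.log(missingVariables(voicedialerContext, ["contacts"])); // []
console.log(missingVariables("{}", ["contacts"]));                // [ 'contacts' ]
```

An empty result means the context can populate the template; any name returned would leave a template variable without a value.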

The following video shows how to create a JSON context for the streets grammar:

This one shows the steps required to create and use a Java context:

Note: there was a subtle, previously undetected bug in the previous version of NuGram IDE. If you want to create Java contexts as in the video above, please make sure to download the latest version.

The whole project used in the videos is available on github. The Java context initializers use the following open-source libraries:

In the next post, I will show how to use the Java context initializer to deploy the streets grammar on the Java-based version of NuGram Server.

And you, how do you test your dynamic grammars?

Don’t let NuGram choose session IDs for you

NuGram Session Viewer

Just after hitting the “Publish” button last week for my previous post on NuGram and CouchDB, I realized the code I wrote had a serious flaw.

Remember it? Here’s an excerpt:

#...
post '/ask_street.json' do
  session = GrammarServer.new.create_session "username", "password"

  tropo_event = Tropo::Generator.parse request.env["rack.input"].read
  zipcode = tropo_event.result.actions.zip_code.value

  grammar = session.instantiate "streets.abnf",
                                :zipcode => database.get(zipcode)

  tropo = Tropo::Generator.new do
    on :event => 'continue', :next => '/say_street.json'
    on :event => 'hangup', :next => '/hangup.json'
    ask({ :name => 'street',
          :bargein => 'true' }) do
      say :value => "What is your street address, beginning with your street number?"
      choices :value => grammar.get_url("grxml")
    end
  end
  tropo.response
end
#...

There are basically two problems with this code:

  1. In the first line of the handler, the application creates a new session on NuGram Hosted Server, and the server returns the unique session ID to the application. But there is no code to free the session. We cannot free the session at the end of the handler (just before tropo.response) because the speech recognition engine will still have to fetch a resource (the speech grammar) associated with the session.
  2. The other handlers of the application cannot reuse the same session, making it impossible to reuse resources (like grammars) from one request to the other.

Problem #1 is not a very serious one. The session will be automatically reclaimed by the server after 15 minutes of inactivity on the session.

But problem #2 is more serious, as it can lead to wasted resources on the server side. If an application wants to ask the same question from two different handlers using the same dynamic grammar, it cannot do so with the approach above: two or more sessions will have to be created, and the grammar will have to be instantiated at least twice.

Providing a session ID explicitly

In fact, NuGram’s RESTful API was not the culprit. The Ruby API was simply missing a little something (I have since fixed it on github and the Ruby gem will soon be updated to reflect the change): a way to specify the session ID explicitly instead of letting NuGram forge a new one. This way, each handler can use a shared session ID when accessing NuGram.

(The Sinatra/Rack web framework provides a session management API to let the application store data that survives a single request. This would solve our problem too. But, as will be explained below, the explicit session ID is a superior solution.)

The updated Ruby API can now specify a session ID explicitly:

  session = GrammarServer.new.session "username", "password", sessionid

The question is now: what do we use as the session ID?

Every communication platform I know (VoiceXML platforms, cloud telephony platforms, Asterisk, etc.) associates a unique identifier with each call, usually referred to as the call ID or the session ID. This is certainly one of the best IDs for your NuGram session, because it greatly simplifies debugging your application: if a problem occurs, it becomes very easy to correlate the communication platform’s logs with NuGram’s logs. Another option is the application’s own session ID (the jsessionid in the case of a Java web application). The idea is to use an ID that uniquely identifies your session across all the servers: communication platform, application server, grammar server, etc.

For instance, Tropo provides a session ID with each response. Here is how to use it in our previous example:

#...
post '/ask_street.json' do
  tropo_event = Tropo::Generator.parse request.env["rack.input"].read

  zipcode = tropo_event.result.actions.zip_code.value

  sessionId = tropo_event.result.sessionId
  session = GrammarServer.new.session "username", "password", sessionId
  grammar = session.instantiate "streets.abnf",
                                :zipcode => database.get(zipcode)
  #...

Ruby NuGram Server API now available as a gem

Jason Goecke, VP Innovation at Voxeo Labs and one of the two founders of Adhearsion, is at RHoK #2 (Random Hacks of Kindness) in San Francisco this weekend. In an effort to further simplify the use of our NuGram Hosted Server API, he turned the Ruby API into a full-fledged Ruby gem. The code is available from github.

To install the Ruby API, just enter the following at the command prompt:

> gem install nugramserver-ruby

You can now start using the NuGram Hosted Server API by adding the following two lines of code to your Ruby application:

require "rubygems"
require "nugramserver-ruby"

And here is a complete example:

require "rubygems"
require "nugramserver-ruby"

# Definition of the grammar template
template = "#ABNF 1.0 ISO-8859-1;

language en-US;
mode voice;
tag-format <semantics/1.0>;

root $voicedialer;

public $voicedialer =
    [$politeness] $contacts [please]
;

private $contacts =
  @alt
    @for (contact : contacts)
      ([@word contact.firstname] @word contact.lastname
       @tag \"out.number  = '\" contact.number \"';\" @end)
    @end
  @end
;

private $politeness =
      ((I (would like | want) | I'd like) to (talk to | speak with))
    | (give me | gimme | get me)
;"

# Create a connection to the server
server = GrammarServer.new()

# Initiate a new session
session = server.create_session("username", "password")

# Upload the grammar template (this only needs to be done the first time)
session.upload("voicedialer.abnf", template)

# Push some data to instantiate the template.
grammar = session.instantiate("voicedialer.abnf",
                              {'contacts' =>
                                [{ 'firstname' => "John", 'lastname' => 'Doe',
                                   'number' => '1234' },
                                 { 'firstname' => "Bill", 'lastname' => 'Smith',
                                   'number' => '4321' }]})

# Retrieve the URL of the resulting grammar in GrXML form
puts "grammar url = #{grammar.get_url('grxml')}"

# Retrieve the content of the resulting grammar in ABNF form
puts grammar.get_content("abnf")

# Terminate the session
session.disconnect

Happy hacking! (And many thanks to Jason for contributing this gem!)

Related posts:

Grammar tips & tricks #3 – Use of global tags

Tip #3: In SRGS grammars, use global semantic interpretation (SI) tags to simplify SI tags in rule expansions.

It is fairly common in SRGS grammars to put some form of computation in semantic tags. For example, checksum algorithms (like the Luhn algorithm) are commonly used in credit card number grammars.

When grammars contain these kinds of computations, it is good practice to put function and constant definitions in global SI tags. These SI tags are declared before the definition of the first rule, alongside the other grammar headers, and must be followed by a semicolon. In ABNF form, this looks like:

#ABNF 1.0;

mode voice;
root $rootRule;

{
  // header tag
};

$rootRule =
 ...
;
...

The use of global tags has several advantages:

  • Functions are more easily testable. The functions declared in the global tags can be developed and tested outside of the grammar file (using a JavaScript interpreter like SpiderMonkey, Rhino, or V8), and later copied into the global tag.
  • It avoids code duplication. Using functions usually reduces code duplication, which lowers the risk of fixing a problem in one place and missing another copy of the same code.
  • Semantic interpretation is less CPU-intensive. Another side effect of using functions is that SI tags usually get smaller, which reduces the time needed to parse and interpret them and leads to better response times. (Semantic interpretation tags are usually executed after the last word has been uttered, so it is sometimes important to optimize them.)

What about GrXML?

In the XML form, you simply put tag elements before the first rule element. But you don’t really need to know that, right? NuGram IDE can convert ABNF grammars to their XML counterpart so easily!

A concrete example

Let’s illustrate this by considering a simple grammar for a 12-digit account number using the Luhn algorithm to validate the number. Here is a first version of the grammar:

#ABNF 1.0 UTF-8;

language en-US;
mode voice;
tag-format <semantics/1.0>;

root $accountNumber;

public $accountNumber =
    { out.number = ""; var checksum = 0; }
    ( $digit {!{
                 out.number += rules.digit;
                 var digit = parseInt(rules.digit);
                 var doubledigit = digit * 2;
                 if (doubledigit > 9)
                    checksum += (doubledigit % 10) + 1;
                 else
                    checksum += doubledigit;
              }!}
      $digit {
                 out.number += rules.digit;
                 checksum += parseInt(rules.digit);
             }) <6>
    { out.valid = (checksum % 10) == 0;}
;

private $digit =
    one    {out = "1"} | two         {out = "2"}
  | three  {out = "3"} | four        {out = "4"}
  | five   {out = "5"} | six         {out = "6"}
  | seven  {out = "7"} | eight       {out = "8"}
  | nine   {out = "9"} | (zero | oh) {out = "0"}
;

The code to calculate the checksum is mixed with the rule references to collect the digits. This makes the grammar look much more complex than it really is. And its performance is much worse than it could be.

If we move the checksum computation into a header tag, we obtain the following grammar:

#ABNF 1.0 UTF-8;

language en-US;
mode voice;
tag-format <semantics/1.0>;

root $accountNumber;

{!{
function luhnCheck(digits) {
  var checksum = 0;
  for (var i = 0; i < 12; i++) {
    var digit = parseInt(digits.charAt(i));
    if (i % 2 == 0) {
      var doubledigit = digit * 2;
      if (doubledigit > 9)
        checksum += (doubledigit % 10) + 1;
      else
        checksum += doubledigit;
    }
    else
      checksum += digit;
  }
  return (checksum % 10) == 0;
}
}!};

public $accountNumber =
    { out.number = "";}
    ( $digit { out.number += rules.digit; }) <12>
    { out.valid = luhnCheck(out.number); }
;

private $digit =
    one    {out = "1"} | two         {out = "2"}
  | three  {out = "3"} | four        {out = "4"}
  | five   {out = "5"} | six         {out = "6"}
  | seven  {out = "7"} | eight       {out = "8"}
  | nine   {out = "9"} | (zero | oh) {out = "0"}
;

Now the accountNumber rule is much simpler and it is clear that it only accepts 12 digits. Moreover, the validation function can be tested independently. If the code is copied to a file named checksum.js, I can launch the SpiderMonkey interpreter and test the function like this:

[tmp] js
js> load("checksum.js")
js> luhnCheck("123456789012")
false
js> luhnCheck("123456789015")
true
js> ^D
[tmp]

In fact, these test cases can be put in the source file along with the code. But you get the idea.
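Here is one way that could look: a minimal checksum.js sketch, runnable with Node.js, that keeps the `luhnCheck` function from the header tag and appends a small table of test cases. The `luhnCases` array and the checking loop are hypothetical additions; they should stay out of the copy pasted into the grammar, so that the SI engine only sees the function. The all-zeros input is an extra case chosen for illustration: its checksum is 0, so it passes.

```javascript
// checksum.js: the luhnCheck function from the grammar's header tag.
function luhnCheck(digits) {
  var checksum = 0;
  for (var i = 0; i < 12; i++) {
    var digit = parseInt(digits.charAt(i));
    if (i % 2 == 0) {
      // Double every other digit, starting with the leftmost one.
      var doubledigit = digit * 2;
      if (doubledigit > 9)
        checksum += (doubledigit % 10) + 1;
      else
        checksum += doubledigit;
    }
    else
      checksum += digit;
  }
  return (checksum % 10) == 0;
}

// Inline test cases (hypothetical harness): [input, expected result].
// Keep this part out of the copy pasted into the grammar.
var luhnCases = [
  ["123456789015", true],
  ["123456789012", false],
  ["000000000000", true]
];

for (var t = 0; t < luhnCases.length; t++) {
  var input = luhnCases[t][0];
  var expected = luhnCases[t][1];
  if (luhnCheck(input) !== expected) {
    throw new Error("luhnCheck(\"" + input + "\") should return " + expected);
  }
}
console.log(luhnCases.length + " luhnCheck test cases passed");
```

Running `node checksum.js` either throws on the first failing case or prints a one-line summary, which makes the file easy to hook into an automated build.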

Global scope is read-only

Beware, when writing your SI tags, that the global scope is read-only for SI tags in rule expansions, while it is writable from global (header) tags. This means a variable cannot be declared in a global SI tag and then modified in a normal SI tag. For example, the following grammar

#ABNF 1.0;

mode voice;
root $rootRule;

{
  var globalVar = 1;
};

$rootRule =
  { globalVar = 2; } some words { out = globalVar; }
;

would raise an exception when “some words” is uttered. That’s because the first SI tag in $rootRule ({ globalVar = 2; }) tries to modify a read-only variable (globalVar).

There is, of course, a way to bypass this limitation: declare a global variable, say GLOBAL, that holds an object whose properties represent the variables you would have liked to be global. This works because only the variable GLOBAL itself is read-only; the properties of the object it refers to can still be modified. To illustrate, here is how the previous grammar would be modified:

#ABNF 1.0;

mode voice;
root $rootRule;

{
  var GLOBAL = new Object();
  GLOBAL.globalVar = 1;
};

$rootRule =
  { GLOBAL.globalVar = 2; } some words { out = GLOBAL.globalVar; }
;

This time, the grammar will return 2 when “some words” is uttered.
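The workaround hinges on the difference between reassigning a variable and modifying the object it refers to. Plain JavaScript draws the same line with `const`, so the following Node.js sketch uses `const` purely as an analogy for how rule-level SI tags see variables declared in a global tag. It is not how any speech engine actually implements SISR scoping, and the function names are hypothetical.

```javascript
// Analogy only: const bindings stand in for variables declared in a global
// (header) tag, which rule-level SI tags see as read-only.
const globalVar = 1;
const GLOBAL = { globalVar: 1 };

// Like { globalVar = 2; } in an SI tag: reassigning a read-only binding
// throws a TypeError at run time.
function reassignBinding() {
  try {
    globalVar = 2;
    return "assigned";
  } catch (e) {
    return e.name;
  }
}

// Like { GLOBAL.globalVar = 2; }: the binding itself is untouched, only a
// property of the object it refers to changes, so this succeeds.
function updateProperty() {
  GLOBAL.globalVar = 2;
  return GLOBAL.globalVar;
}

console.log(reassignBinding()); // TypeError
console.log(updateProperty());  // 2
```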

It should be noted that the IBM engine, which supports an old version of the SISR specification, does allow global variables to be modified in SI tags. It is very important to be aware of that when converting grammars initially written for the IBM engine to another engine supporting the latest SISR specification (like, for instance, Loquendo or Nuance 9).