Category Archives: NuGram

Don’t let NuGram choose session IDs for you

NuGram Session Viewer

Just after hitting the “Publish” button last week for my previous post on NuGram and CouchDB, I realized the code I wrote had a serious flaw.

Remember it? Here’s an excerpt:

#...
post '/ask_street.json' do
  session = GrammarServer.new.create_session "username", "password"

  tropo_event = Tropo::Generator.parse request.env["rack.input"].read
  zipcode = tropo_event.result.actions.zip_code.value

  grammar = session.instantiate "streets.abnf",
                                :zipcode => database.get(zipcode)

  tropo = Tropo::Generator.new do
    on :event => 'continue', :next => '/say_street.json'
    on :event => 'hangup', :next => '/hangup.json'
    ask({ :name => 'street',
          :bargein => 'true' }) do
      say :value => "What is your street address, beginning with your street number?"
      choices :value => grammar.get_url("grxml")
    end
  end
  tropo.response
end
#...

There are basically two problems with this code:

  1. On line 3, the application creates a new session on NuGram Hosted Server and the unique session ID is returned to the application. But there is no code to free the session. We cannot free the session on line 20 because the speech recognition engine will have to fetch a resource (the speech grammar) associated with the session.
  2. The other handlers of the application cannot reuse the same session, making it impossible to reuse resources (like grammars) from one request to the other.

Problem #1 is not a very serious one. The session will be automatically reclaimed by the server after 15 minutes of inactivity on the session.

But problem #2 is more serious, as it can lead to wasted resources on the server side. If an application wants to ask the same question from two different handlers, using the same dynamic grammar, it cannot do so with the approach above. Two or more sessions have to be created, and the grammar has to be instantiated at least twice.

Providing a session ID explicitly

In fact, NuGram’s RESTful API was not the culprit. The Ruby API was simply missing a little something (I have since fixed it on github and the Ruby gem will soon be updated to reflect the change): a way to specify the session ID explicitly instead of letting NuGram forge a new one. This way, each handler can use a shared session ID when accessing NuGram.

(The Sinatra/Rack web framework provides a session management API to let the application store data that survives a single request. This would solve our problem too. But, as will be explained below, the explicit session ID is a superior solution.)

The updated Ruby API can now specify a session ID explicitly:

  session = GrammarServer.new.session "username", "password", sessionid

The question is now: what do we use as the session ID?

Every communication platform I know (VoiceXML platforms, cloud telephony platforms, Asterisk, etc.) associates a unique identifier with each call. This is usually referred to as the call ID or the session ID. This is certainly one of the best IDs for your NuGram session. Why? Because this greatly simplifies debugging your application. If a problem occurs, it becomes very easy to correlate the communication platform’s logs and NuGram’s logs. But it can also be the application’s session ID (the jsessionid in the case of a Java web application). The idea is to use an ID that uniquely identifies your session across all the servers: communication platform, application server, grammar server, etc.

For instance, Tropo includes a session ID in each result it posts to the application. Here is how to use it in our previous example:

#...
post '/ask_street.json' do
  tropo_event = Tropo::Generator.parse request.env["rack.input"].read

  zipcode = tropo_event.result.actions.zip_code.value

  sessionId = tropo_event.result.sessionId
  session = GrammarServer.new.session "username", "password", sessionId
  grammar = session.instantiate "streets.abnf",
                                :zipcode => database.get(zipcode)
  #...
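
Since every handler needs to derive the same ID, it can help to isolate that step. The following is a minimal sketch using only the Ruby standard library. The helper name shared_session_id is hypothetical, and the fallback to a "session" => "id" field for the very first payload of a call is an assumption about Tropo's JSON, not something shown above.

```ruby
require "json"

# Hypothetical helper (not part of any gem): extract the platform's session
# ID from a raw Tropo JSON payload so that all handlers use the same NuGram
# session ID for a given call.
def shared_session_id(raw_json)
  payload = JSON.parse(raw_json)
  # "result" => "sessionId" matches the handler above; the
  # "session" => "id" fallback is an assumption about the initial payload.
  id = (payload.dig("result", "sessionId") || payload.dig("session", "id")).to_s
  raise ArgumentError, "no session ID in payload" if id.empty?
  id
end

puts shared_session_id('{"result": {"sessionId": "abc123"}}')
```

Each handler could then pass the returned value as the third argument to GrammarServer.new.session.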

Bridging NuGram and CouchDB

Mark Headd, a new member of the Voxeo family, published a blog post last week on how to build speech recognition applications with Tropo. Since his post covered things like SRGS grammars and dynamic grammars, I couldn’t resist. I had to enter the fray and show how the dynamic SRGS grammar could be built using NuGram Hosted Server. And while I’m at it, let’s use CouchDB instead of a SQL database.

The Database

Mark’s example is a simple address capture dialog. It consists of asking for the zip code, and then asking for the civic number and street name/type. The grammar for the second question is built dynamically based on the entered zip code. All street names/types and their associated zip codes are stored in a SQL database and retrieved by some PHP code.

In my case, I decided to store all the data in a CouchDB database called “zipcode”. (CouchDB is a nice RESTful, HTTP-based document-oriented database, where documents are stored as plain JSON strings.) Once CouchDB is up and running (I assume here it’s running on the local host, on port 5984, but that could be on any hosting service, like CouchOne), we simply create the database and populate it using the curl command-line tool:

% curl -X PUT http://localhost:5984/zipcode
{"ok":true}
% curl -X POST http://localhost:5984/zipcode/_bulk_docs \
       -H 'Content-Type: application/json' \
       -d "`cat zipcodes.json`"

where the file zipcodes.json contains the following data:

{"docs": [
{
    "_id": "18752",
    "type": "zipcode",
    "streets" : [
       {"name":"First", "type":"Avenue"},
       {"name":"Grant", "type":"Avenue"},
       {"name":"Josiah", "type":"Parkway"},
       {"name":"Murphy", "type":"Lane"},
       {"name":"Cherry Blossom", "type":"Circle"}
    ]
},
{
    "_id": "19752",
    "type": "zipcode",
    "streets" : [
       {"name":"Milberry", "type":"Extension"},
       {"name":"Jones", "type":"Street"},
       {"name":"Martin Luther King", "type":"Boulevard"},
       {"name":"Halsey", "type":"Place"}
    ]
}
]}

Each document (whose ID is a zip code) contains an attribute streets that lists all street names/types for the given zip code. Here there are only a few streets for two zip codes.

(Of course there are other ways to model the data, but that’s the simplest I could think of.)
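
Because the grammar template will read the name and type attributes of every street, a malformed key (a stray space inside the quotes, for instance) is easy to miss in hand-written JSON. Here is a small sanity check one could run before uploading. It is only a sketch: the helper name malformed_streets is made up for this example and is not part of couchrest or CouchDB.

```ruby
require "json"

# Hypothetical pre-upload check: returns [zip code, street entry] pairs for
# every street whose keys are not exactly "name" and "type".
def malformed_streets(bulk_json)
  JSON.parse(bulk_json).fetch("docs").flat_map do |doc|
    doc.fetch("streets", [])
       .reject { |street| street.keys.sort == %w[name type] }
       .map { |street| [doc["_id"], street] }
  end
end
```

Calling malformed_streets(File.read("zipcodes.json")) should return an empty array when every entry is well formed.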

The grammar template

Instead of writing code to create the streets grammar dynamically, we create a grammar template and push it to www.grammarserver.com (NuGram Hosted Server). The template will later be populated with data from the database and rendered in GrXML (or ABNF).

To do that, we just need to register an account (but don’t worry, it’s absolutely free).

So here is the grammar template:

#ABNF 1.0 ISO-8859-1;

language en-US;
mode voice;
tag-format <semantics/1.0>;

root $streets;

public $streets =
    $civicNumber $name [$direction]
    {out = rules.civicNumber.number + "," + rules.name + "," + rules.direction}
;

$civicNumber =
    {out.number = ''} ($number {out.number += rules.number}) <1->
;

$name =
    @alt
        @for (street : zipcode.streets)
           (@word street.name @word street.type)
        @end
    @end
;

$number =
     (zero | oh) {out = "0"}
   | one   {out = "1"}
   | two   {out = "2"}
   | three {out = "3"}
   | four  {out = "4"}
   | five  {out = "5"}
   | six   {out = "6"}
   | seven {out = "7"}
   | eight {out = "8"}
   | nine  {out = "9"}
;

$direction =
     north (west {out = 'nw'} | east {out = 'ne'} )
   | south (west {out = 'sw'} | east {out = 'se'} )
;

As you can see, it’s plain ABNF, with the exception of some simple dynamic directives on lines 19-23. It’s also a bit more involved than Mark’s: it contains semantic tags to better format the recognized utterance.
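
To get a feel for what the dynamic directives do, here is a rough Ruby approximation of the alternation they would yield for one zip code document. This is purely illustrative: the function expand_name_rule is made up for this post, it ignores whatever tokenization @word performs, and NuGram’s actual rendering may well be formatted differently.

```ruby
# Illustrative only: approximates the $name rule produced by the @alt/@for
# directives for a single zip code document (a Hash with a "streets" list).
def expand_name_rule(zipcode_doc)
  alternatives = zipcode_doc.fetch("streets").map do |street|
    "(#{street.fetch('name')} #{street.fetch('type')})"
  end
  "$name =\n    " + alternatives.join("\n  | ") + "\n;"
end

doc = { "streets" => [{ "name" => "Jones", "type" => "Street" },
                      { "name" => "Halsey", "type" => "Place" }] }
puts expand_name_rule(doc)
```

Using fetch rather than [] makes the sketch fail loudly if a street entry lacks a name or a type.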

To publish the grammar, we use curl again:

% curl -X PUT http://www.grammarserver.com/api/grammar/streets.abnf \
       -u username:password \
       -d "`cat streets.abnf`"
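
For those who prefer to stay in Ruby, the same request can be assembled with the standard Net::HTTP library. This is a sketch under the assumption that the server only cares about the method, the URL, the basic authentication header and the body, as in the curl command. The function name build_publish_request is made up, and the request is built here but deliberately not sent.

```ruby
require "net/http"
require "uri"

# Builds (but does not send) a PUT request equivalent to the curl command:
# same URL, basic authentication, and the template text as the body.
def build_publish_request(name, template, user, password)
  uri = URI("http://www.grammarserver.com/api/grammar/#{name}")
  request = Net::HTTP::Put.new(uri)
  request.basic_auth(user, password)
  request.body = template
  [uri, request]
end

# To actually publish (needs network access and a real account):
#   uri, request = build_publish_request("streets.abnf",
#                                        File.read("streets.abnf"),
#                                        "username", "password")
#   response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
#   puts response.code
```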

We are now ready to write the application.

Connecting the dots

Now that the database is set up and the template published on NuGram Hosted Server, the only thing we need to do is create a simple app that bridges the two. For this, I decided to use Tropo’s web API, and more specifically the tropo-webapi-ruby gem (as well as the couchrest and nugramserver-ruby gems). The app mimics Mark’s. The lines related to CouchDB and NuGram Hosted Server are the two database setup lines near the top and the session and grammar lines in the /ask_street.json handler:

require 'rubygems'
require 'sinatra'
require 'tropo-webapi-ruby'
require 'nugramserver-ruby'
require 'couchrest'

couch_server = CouchRest.new "http://localhost:5984"
database = couch_server.database "zipcode"

post '/start.json' do
  tropo = Tropo::Generator.new do
    on :event => 'continue', :next => '/ask_street.json'
    on :event => 'hangup', :next => '/hangup.json'
    ask({ :name => 'zip_code',
          :bargein => 'true' }) do
      say     :value => "Say your 5 digit zip code"
      choices :value => "[5 DIGITS]"
    end
  end
  tropo.response
end

post '/ask_street.json' do
  session = GrammarServer.new.create_session "username", "password"

  tropo_event = Tropo::Generator.parse request.env["rack.input"].read
  zipcode = tropo_event.result.actions.zip_code.value

  grammar = session.instantiate "streets.abnf",
                                :zipcode => database.get(zipcode)

  tropo = Tropo::Generator.new do
    on :event => 'continue', :next => '/say_street.json'
    on :event => 'hangup', :next => '/hangup.json'
    ask({ :name => 'street',
          :bargein => 'true' }) do
      say :value => "What is your street address, beginning with your street number?"
      choices :value => grammar.get_url("grxml")
    end
  end
  tropo.response
end

#...

The app is not complete: some handlers, such as /say_street.json and /hangup.json, are missing. But you get the idea.
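
One thing a handler like /say_street.json would have to do is unpack the value built by the root rule’s semantic tag, which concatenates the civic number, the street name and the direction with commas. Here is a hedged sketch of that step. It assumes the handler receives that value as a plain string, and that an unmatched optional direction may show up as the text "undefined". Neither assumption is confirmed by the code above, and the names StreetAddress and parse_street_result are made up.

```ruby
# Hypothetical helper: splits the "number,street,direction" value produced by
# the root rule's tag into its three parts.
StreetAddress = Struct.new(:number, :street, :direction)

def parse_street_result(value)
  number, street, direction = value.to_s.split(",", 3).map(&:strip)
  # Assumption: a missing optional $direction may surface as "undefined"
  # or as an empty string; both are treated as "no direction".
  direction = nil if direction.nil? || direction.empty? || direction == "undefined"
  StreetAddress.new(number, street, direction)
end

address = parse_street_result("123,First Avenue,nw")
puts "#{address.number} #{address.street} #{address.direction}"
```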

A final note

Of course, this post just covers the basics of integrating a dynamic grammar in a speech app. A real address capture application is certainly a bit more complex than that. For instance, given the large number of streets covered by a single zip code, it may not be desirable to generate grammars dynamically. They may have to be compiled in advance, with a periodic update process. Or you may want to implement some clever grammar caching strategies. Either way, you may instead consider the Java version of NuGram Server (not the hosted one).

Ruby NuGram Server API now available as a gem

Jason Goecke, VP Innovation at Voxeo Labs and one of two founders of Adhearsion, is at the RHoK #2 (Random Hacks of Kindness) in San Francisco this weekend. In an effort to further simplify the use of our NuGram Hosted Server API, he turned the Ruby API into a full-fledged Ruby gem. The code is available from github.

To install the Ruby API, just enter the following at the command prompt:

> gem install nugramserver-ruby

You can now start using the NuGram Hosted Server API by adding the following two lines of code to your Ruby application:

require "rubygems"
require "nugramserver-ruby"

And here is a complete example:

require "rubygems"
require "nugramserver-ruby"

# Definition of the grammar template
template = "#ABNF 1.0 ISO-8859-1;

language en-US;
mode voice;
tag-format <semantics/1.0>;

root $voicedialer;

public $voicedialer =
    [$politeness] $contacts [please]
;

private $contacts =
  @alt
    @for (contact : contacts)
      ([@word contact.firstname] @word contact.lastname
       @tag \"out.number  = '\" contact.number \"';\" @end)
    @end
  @end
;

private $politeness =
      ((I (would like | want) | I'd like) to (talk to | speak with))
    | (give me | gimme | get me)
;"

# Create a connection to the server
server = GrammarServer.new()

# Initiate a new session
session = server.create_session("username", "password")

# Upload the grammar template (this only needs to be done the first time)
session.upload("voicedialer.abnf", template)

# Push some data to instantiate the template.
grammar = session.instantiate("voicedialer.abnf",
                              {'contacts' =>
                                [{ 'firstname' => "John", 'lastname' => 'Doe',
                                   'number' => '1234' },
                                 { 'firstname' => "Bill", 'lastname' => 'Smith',
                                   'number' => '4321' }]})

# Retrieve the URL of the resulting grammar in GrXML form
puts "grammar url = ", grammar.get_url('grxml')

# Retrieve the content of the resulting grammar in ABNF form
puts grammar.get_content("abnf")

# Terminate the session
session.disconnect
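
In a script like this one, an exception raised between create_session and disconnect would leave the session open until the server reclaims it. One way to guard against that is a small wrapper with an ensure clause. The sketch below relies only on the create_session and disconnect calls shown in the example; the wrapper name with_nugram_session is made up and is not part of the gem.

```ruby
# Hypothetical wrapper: opens a session, yields it to the block, and always
# disconnects afterwards, even if the block raises.
def with_nugram_session(server, username, password)
  session = server.create_session(username, password)
  yield session
ensure
  session.disconnect if session
end

# Usage with the gem (requires network access and a real account):
#   with_nugram_session(GrammarServer.new, "username", "password") do |session|
#     grammar = session.instantiate("voicedialer.abnf", { 'contacts' => [] })
#     puts grammar.get_content("abnf")
#   end
```

This pattern only fits cases where nothing needs to fetch the grammar by URL after the block ends. A speech recognition engine that retrieves the grammar later still needs the session to be alive.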

Happy hacking! (And many thanks to Jason for contributing this gem!)

A proven yet simple grammar conversion process

As old speech recognition engines are being replaced by newer ones, we see more and more organizations having to convert their old grammars to standard formats. Given the right process and set of tools, converting grammars from one engine to another should be a straightforward task with mostly no risk of breaking the associated IVR application.

The issues

There are several issues associated with the conversion of grammars:

  • Syntax. First, there is the syntax of the grammar itself. If we are converting a grammar between two engines that support GrXML or ABNF, then there’s not much else to say. But if we are converting from Nuance GSL to GrXML or ABNF, that’s a different story. GSL has very different operator precedences than ABNF, for instance. We have to be careful.
  • Semantics. The second issue is the language used inside the semantic tags. Again, if both engines support SISR, we have nothing to do. But if we convert from GSL to ABNF+SISR, we may have a harder time. For example, SISR does not support the concept of a top-level slot that can be assigned from anywhere in the grammar (using the <slot value> syntax).
  • Pronunciation lexicons. Almost all speech engines use a different format for lexicons. Not to mention that even different versions of the same engine sometimes support different phonetic alphabets.

A proven process

If you follow a rigorous process, the first two issues above can be easily mitigated. Here is one that has proven very effective:

  1. A coverage test set is produced from the original grammar. The test set should ideally ensure that all semantic tags are executed at least once (this is not always sufficient if the semantic tags contain conditional code, but that’s a good starting point.)
  2. The grammar is converted to the new format.
  3. The converted grammar is tested against the coverage test set of the original grammar (and problems are fixed, if any, until all tests pass).
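
Step 3 boils down to a regression check: every sentence of the coverage set must yield the same interpretation with both grammars. Here is a minimal sketch of such a check. The two callables original and converted are hypothetical stand-ins for whatever tool produces an interpretation from each grammar. They are not real NuGram or ASR engine APIs.

```ruby
# Hypothetical harness: `original` and `converted` are any callables mapping
# a sentence to its semantic interpretation. Returns one entry per sentence
# whose interpretation changed after conversion.
def conversion_regressions(sentences, original, converted)
  sentences.each_with_object([]) do |sentence, failures|
    expected = original.call(sentence)
    actual = converted.call(sentence)
    next if expected == actual
    failures << { sentence: sentence, expected: expected, actual: actual }
  end
end

# Toy stand-ins: the "converted" grammar drops a leading zero.
original  = ->(s) { { "one two" => "12", "oh five" => "05" }[s] }
converted = ->(s) { { "one two" => "12", "oh five" => "5" }[s] }
p conversion_regressions(["one two", "oh five"], original, converted)
```

An empty result means all tests pass. Anything else points at the sentences to investigate.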

Some tools

Some ASR engines already provide tools to convert grammars from old proprietary formats to the new standard ones. For instance, Nuance ships a tool to automatically convert GSL grammars to GrXML + SISR. It does not support all features of GSL, as some of them have no equivalent in GrXML and SISR. One of the problems with this converter is that the semantic tags it produces are not easily maintainable.

NuGram IDE also provides some tools to help with the above process. In particular, it offers:

  • Great support for creating and running coverage tests.
  • A sophisticated sentence generation tool. The tags coverage strategy, for instance, is very effective when converting grammars as it helps generate sentences that will cover all semantic tags.
  • Support for all major semantic tag formats (GSL, Nuance OSR extensions, IBM and Microsoft, etc.).

Of course, to use NuGram effectively, your grammars will need to be converted to ABNF first. No problem! NuGram provides GSL and GrXML to ABNF converters to help you, as well as converters from ABNF to GSL or GrXML. That means all you have to worry about is really the conversion of the semantic tags. In this case, the whole process now becomes:

  1. Grammars are first imported in ABNF.
  2. A coverage test set is produced from the original grammar.
  3. Semantic tags are converted.
  4. The converted grammars are checked for errors by running the coverage tests of the original grammars. In case of errors, they are fixed and all tests are re-run.
  5. The grammars are converted to the desired target format.

What about pronunciation lexicons?

Unfortunately, converting phonetic dictionaries is still a manual and error-prone process, for which there are no good solutions as of this writing. In any case, this task belongs more to the tuning process that follows grammar conversion. In most cases, a grammar’s pronunciation lexicon is used to fix incorrect or missing pronunciations in the ASR engine’s own dictionary for very specific words. The phonetic dictionary of the target ASR engine may not have the same limitations or deficiencies. At best, the original grammar’s pronunciation lexicon can act as an inspiration for the creation of the new pronunciation lexicon.

Get two NuGram IDE Pro licenses free when you purchase a grammar development course

Learn how to systematically deliver high-quality, high performance grammars by fully leveraging the features and tools available in NuGram IDE. Supported by hands-on exercises and numerous examples, Effective Grammar Development with NuGram IDE provides a breadth of knowledge, best practices, and tips and tricks that have shown their effectiveness at addressing the main challenges of grammar development and at delivering better grammars faster.

And if you order our on-site grammar development course before October 31st, you will get two licenses of NuGram IDE Professional Edition entirely free! There is only one catch: the course must be given before December 31st, 2010. Contact us for details.