Don’t let NuGram choose session IDs for you

NuGram Session Viewer

Just after hitting the “Publish” button last week for my previous post on NuGram and CouchDB, I realized the code I wrote had a serious flaw.

Remember it? Here’s an excerpt:

#...
post '/ask_street.json' do
  session = GrammarServer.new.create_session "username", "password"

  tropo_event = Tropo::Generator.parse request.env["rack.input"].read
  zipcode = tropo_event.result.actions.zip_code.value

  grammar = session.instantiate "streets.abnf",
                                :zipcode => database.get(zipcode)

  tropo = Tropo::Generator.new do
    on :event => 'continue', :next => '/say_street.json'
    on :event => 'hangup', :next => '/hangup.json'
    ask({ :name => 'street',
          :bargein => 'true' }) do
      say :value => "What is your street address, beginning with your street number?"
      choices :value => grammar.get_url("grxml")
    end
  end
  tropo.response
end
#...

There are basically two problems with this code:

  1. On line 3, the application creates a new session on NuGram Hosted Server and the unique session ID is returned to the application. But there is no code to free the session. We cannot free the session on line 20 because the speech recognition engine will have to fetch a resource (the speech grammar) associated with the session.
  2. The other handlers of the application cannot reuse the same session, making it impossible to reuse resources (like grammars) from one request to the other.

Problem #1 is not a very serious one. The session will be automatically reclaimed by the server after 15 minutes of inactivity on the session.

But problem #2 is more serious, as it can lead to wasted resources on the server side. If an application wants to ask the same question from two different handlers, using the same dynamic grammar, it cannot do so with the approach above. Two or more sessions have to be created, and the grammar has to be instantiated at least twice.

Providing a session ID explicitly

In fact, NuGram’s RESTful API was not the culprit. The Ruby API was simply missing a little something (I have since fixed it on GitHub and the Ruby gem will soon be updated to reflect the change): a way to specify the session ID explicitly instead of letting NuGram generate a new one. This way, each handler can use a shared session ID when accessing NuGram.

(The Sinatra/Rack web framework provides a session management API to let the application store data that survives a single request. This would solve our problem too. But, as will be explained below, the explicit session ID is a superior solution.)

The updated Ruby API can now specify a session ID explicitly:

  session = GrammarServer.new.session "username", "password", sessionid

The question is now: what do we use as the session ID?

Every communication platform I know (VoiceXML platforms, cloud telephony platforms, Asterisk, etc.) associates a unique identifier with each call. This is usually referred to as the call ID or the session ID. This is certainly one of the best IDs for your NuGram session. Why? Because this greatly simplifies debugging your application. If a problem occurs, it becomes very easy to correlate the communication platform’s logs and NuGram’s logs. But it can also be the application’s session ID (the jsessionid in the case of a Java web application). The idea is to use an ID that uniquely identifies your session across all the servers: communication platform, application server, grammar server, etc.

For instance, Tropo provides a session ID with each response. Here is how to use it in our previous example:

#...
post '/ask_street.json' do
  tropo_event = Tropo::Generator.parse request.env["rack.input"].read

  zipcode = tropo_event.result.actions.zip_code.value

  sessionId = tropo_event.result.sessionId
  session = GrammarServer.new.session "username", "password", sessionId
  grammar = session.instantiate "streets.abnf",
                                :zipcode => database.get(zipcode)
  #...

Bridging NuGram and CouchDB

Mark Headd, a new member of the Voxeo family, published a blog post last week on how to build speech recognition applications with Tropo. Since his post covered things like SRGS grammars and dynamic grammars, I couldn’t resist. I had to enter the fray and show how the dynamic SRGS grammar could be built using NuGram Hosted Server. And while I’m at it, let’s use CouchDB instead of an SQL database.

The Database

Mark’s example is a simple address capture dialog. It consists of asking for the zip code, and then asking for the civic number and street name/type. The grammar for the second question is built dynamically based on the entered zip code. All street names/types and their associated zip codes are stored in an SQL database and retrieved by some PHP code.

In my case, I decided to store all the data in a CouchDB database called “zipcode”. (CouchDB is a nice RESTful, HTTP-based document-oriented database, where documents are stored as plain JSON strings.) Once CouchDB is up and running (I assume here it’s running on the local host, on port 5984, but it could be on any hosting service, like CouchOne), we simply create the database and populate it using the curl command-line tool:

% curl -X PUT http://localhost:5984/zipcode
{"ok":true}
% curl -X POST http://localhost:5984/zipcode/_bulk_docs \
       -H 'Content-Type: application/json' \
       -d "`cat zipcodes.json`"

where the file zipcodes.json contains the following data:

{"docs": [
{
    "_id": "18752",
    "type": "zipcode",
    "streets" : [
       {"name":"First", "type":"Avenue"},
       {"name":"Grant", "type":"Avenue"},
       {"name":"Josiah", "type":"Parkway"},
       {"name":"Murphy", "type":"Lane"},
       {"name":"Cherry Blossom", "type":"Circle"}
    ]
},
{
    "_id": "19752",
    "type": "zipcode",
    "streets" : [
       {"name":"Milberry", "type":"Extension"},
       {"name":"Jones", "type":"Street"},
       {"name":"Martin Luther King", "type":"Boulevard"},
       {"name":"Halsey", "type":"Place"}
    ]
}
]}

Each document (whose ID is a zip code) contains an attribute streets that lists all street names/types for the given zip code. Here there are only a few streets for two zip codes.

(Of course there are other ways to model the data, but that’s the simplest I could think of.)

The grammar template

Instead of writing code to create the streets grammar dynamically, we create a grammar template and push it to www.grammarserver.com (NuGram Hosted Server). The template will later be populated with data from the database and rendered in GrXML (or ABNF).

To do that, we just need to register an account (but don’t worry, it’s absolutely free).

So here is the grammar template:

#ABNF 1.0 ISO-8859-1;

language en-US;
mode voice;
tag-format <semantics/1.0>;

root $streets;

public $streets =
    $civicNumber $name [$direction]
    {out = rules.civicNumber.number + "," + rules.name + "," + rules.direction}
;

$civicNumber =
    {out.number = ''} ($number {out.number += rules.number}) <1->
;

$name =
    @alt
        @for (street : zipcode.streets)
           (@word street.name @word street.type)
        @end
    @end
;

$number =
     (zero | oh) {out = "0"}
   | one   {out = "1"}
   | two   {out = "2"}
   | three {out = "3"}
   | four  {out = "4"}
   | five  {out = "5"}
   | six   {out = "6"}
   | seven {out = "7"}
   | eight {out = "8"}
   | nine  {out = "9"}
;

$direction =
     north (west {out = 'nw'} | east {out = 'ne'} )
   | south (west {out = 'sw'} | east {out = 'se'} )
;

As you can see, it’s plain ABNF, with the exception of some simple dynamic directives on lines 19-23. It’s also a bit more involved than Mark’s grammar: it contains semantic tags to better format the recognized utterance.
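To make the directives a little more concrete, here is a rough sketch in plain JavaScript of what the @alt/@for loop amounts to for one of the documents stored above. This is not NuGram’s implementation and I am not reproducing its exact output; the expandNameRule function is a hypothetical name of mine, and the sketch only shows that each street in the document becomes one alternative of the $name rule.

```javascript
// The CouchDB document for zip code 19752, as loaded earlier.
var zipcode = {
  _id: "19752",
  type: "zipcode",
  streets: [
    { name: "Milberry", type: "Extension" },
    { name: "Jones", type: "Street" },
    { name: "Martin Luther King", type: "Boulevard" },
    { name: "Halsey", type: "Place" }
  ]
};

// Hypothetical helper mimicking @alt / @for:
// one alternative per street, made of the street name followed by its type.
function expandNameRule(doc) {
  var alternatives = doc.streets.map(function (street) {
    return "(" + street.name + " " + street.type + ")";
  });
  return "$name =\n    " + alternatives.join("\n  | ") + "\n;";
}

var nameRule = expandNameRule(zipcode);
console.log(nameRule);
```

The point of the template is that this string-building never has to live in the application: the app only pushes the document, and the server does the expansion.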

To publish the grammar, we use curl again:

% curl -X PUT http://www.grammarserver.com/api/grammar/streets.abnf \
       -u username:password \
       -d "`cat streets.abnf`"

We are now ready to write the application.

Connecting the dots

Now that the database is set up and the template published on NuGram Hosted Server, the only thing we need to do is create a simple app that bridges the two. For this, I decided to use Tropo’s web API, and more specifically the Ruby webapi gem (as well as the couchrest and nugramserver-ruby gems). The app mimics Mark’s; the lines that touch CouchDB and NuGram Hosted Server are the ones that call CouchRest, database.get, create_session, instantiate and get_url:

require 'rubygems'
require 'sinatra'
require 'tropo-webapi-ruby'
require 'nugramserver-ruby'
require 'couchrest'

couch_server = CouchRest.new "http://localhost:5984"
database = couch_server.database "zipcode"

post '/start.json' do
  tropo = Tropo::Generator.new do
    on :event => 'continue', :next => '/ask_street.json'
    on :event => 'hangup', :next => '/hangup.json'
    ask({ :name => 'zip_code',
          :bargein => 'true' }) do
      say     :value => "Say your 5 digit zip code"
      choices :value => "[5 DIGITS]"
    end
  end
  tropo.response
end

post '/ask_street.json' do
  session = GrammarServer.new.create_session "username", "password"

  tropo_event = Tropo::Generator.parse request.env["rack.input"].read
  zipcode = tropo_event.result.actions.zip_code.value

  grammar = session.instantiate "streets.abnf",
                                :zipcode => database.get(zipcode)

  tropo = Tropo::Generator.new do
    on :event => 'continue', :next => '/say_street.json'
    on :event => 'hangup', :next => '/hangup.json'
    ask({ :name => 'street',
          :bargein => 'true' }) do
      say :value => "What is your street address, beginning with your street number?"
      choices :value => grammar.get_url("grxml")
    end
  end
  tropo.response
end

#...

The app is not complete; some handlers are missing. But you get the idea.

A final note

Of course, this post just covers the basics of integrating a dynamic grammar in a speech app. A real address capture application is certainly a bit more complex than that. For instance, given the large number of streets covered by a single zip code, it may not be desirable to generate grammars dynamically. They may have to be compiled in advance, with a periodic update process. Or you may want to implement some clever grammar caching strategies. Either way, you may instead consider the Java version of NuGram Server (not the hosted one).

What’s in your IVR application monitoring report?

In a recent discussion on Hacker News, someone came up with a request for an IVR application monitoring service, suggesting that this is something which should be rather easy to build. Indeed, the dialing is rather easy. A few hacks with Tropo, Twilio or some custom Asterisk scripts would do the trick, but keep in mind that such a monitoring service should interact with the IVR the same way a user would (but that’s another story and an upcoming blog post!).

However, as I have pointed out myself, it is one thing to periodically call a given number, it is another to send daily, weekly, monthly and yearly reports to reflect the actual state of the IVR application over time.

Moreover, those reports need to provide insightful and reliable information. That’s where Mirador comes in handy.

Stability Metrics

Mirador Report - Stability Metrics

First and foremost, your report should give a quick overview of the overall stability of your IVR application over a given time period (daily, weekly, monthly, or yearly). Such metrics essentially provide the overall success rate of your application, where setup failures could be caused by various telephony/network errors such as timeout, busy or congestion, while transaction failures are errors occurring once the connection is established.
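As a back-of-the-envelope illustration of those rates, here is a small sketch. It says nothing about how Mirador computes its figures internally; the record layout and the status names (success, setup_failure, transaction_failure) are invented for the example.

```javascript
// Hypothetical call records; field and status names are assumptions.
var calls = [
  { status: "success" },
  { status: "success" },
  { status: "setup_failure", cause: "busy" },
  { status: "success" },
  { status: "transaction_failure" }
];

// Share of calls with the given status, as a percentage of all calls.
function rate(records, status) {
  var matching = records.filter(function (call) {
    return call.status === status;
  });
  return (100 * matching.length) / records.length;
}

var stability = {
  successRate: rate(calls, "success"),
  setupFailureRate: rate(calls, "setup_failure"),
  transactionFailureRate: rate(calls, "transaction_failure")
};
console.log(stability);
```

With these five sample calls, three succeed, one fails during setup and one fails after the connection is established.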

Performance Metrics

Mirador Report - Performance Metrics 1


Next, we have some performance metrics, which include average call duration, setup time, transaction duration and greeting delay, computed both over all calls and over successful calls only. This raw data also feeds a performance-over-time chart, where specific time periods stand out visually.

While most data can be gathered quite simply, the greeting delay is a totally different beast. It corresponds to the actual delay to get the initial application prompt following a successful call setup, as a user would feel it. To compute such data, we used a few interesting speech recognition tricks of ours :)

Mirador Report - Timing Distributions

How do you know whether a user is waiting 1s or 10s for your application to answer? Or, when a user is supposed to take 2 minutes to complete a given transaction or task, how do you know if that is really the case? Performance metrics would not be complete without some distribution charts to highlight such information. To get a better understanding of how well your IVR application responds to some peak periods in production, we have crafted two distribution charts which not only depict setup times but also transaction durations.
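To show why a distribution says more than an average, here is a minimal sketch that sorts setup times into one-second buckets. The sample values and the distribution function are made up for illustration and are not taken from Mirador.

```javascript
// Hypothetical setup times, in seconds.
var setupTimes = [0.8, 1.2, 1.9, 2.4, 3.1, 9.7, 1.1, 0.9];

// Count how many values fall into each bucket of `width` seconds.
// Anything at or above `max` lands in a final overflow bucket.
function distribution(values, width, max) {
  var bucketCount = Math.ceil(max / width) + 1;
  var counts = [];
  for (var i = 0; i < bucketCount; i++) counts.push(0);
  values.forEach(function (value) {
    var index = Math.min(Math.floor(value / width), bucketCount - 1);
    counts[index] += 1;
  });
  return counts;
}

// Buckets: [0,1), [1,2), [2,3), [3,4), [4,5), and 5 seconds or more.
var setupDistribution = distribution(setupTimes, 1, 5);
console.log(setupDistribution);
```

The average of these samples is about 2.6 seconds, which hides the one caller who waited almost 10 seconds; the last bucket makes that caller visible.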

Alarm History

Any serious monitoring service should provide email or SMS notification whenever a defect occurs (otherwise, what’s the point of monitoring?). Mirador can be configured to act upon certain thresholds or specific criteria and send alarm notifications right away, in real time. While alarm occurrence is one thing, alarm restore is another. Indeed, you want to know not only when a problem occurred but also when the situation was acted upon and restored.

Mirador Report - Alarm History

That is why a good monitoring report should present a list of all the alarms for a given time period!

Call Detail Records

Mirador Report - Call Detail Records

Last but not least: the ability to review call detail records (CDRs), especially those that generated alarms. You might want to know when such calls occurred, what their actual status and duration were, and so on. You might even be interested in listening to the complete call recordings while you are at it.

Conclusion

Reports are an integral part of any monitoring service. Plus, you certainly would like to review them within your email client, online in a secure location or as a PDF document, to share with your peers. Ideally, you would have a web dashboard where you could access report history, set up new monitoring configurations, reschedule a configuration, define alarm thresholds and notification targets, and so on.

Mirador - PDF Email Web

Mirador IVR application monitoring service features all of the previously mentioned characteristics, except for the dashboard. But we are working on it, so stay tuned for more!

So, what’s in your IVR application monitoring report?

Ruby NuGram Server API now available as a gem

Jason Goecke, VP Innovation at Voxeo Labs and one of two founders of Adhearsion, is at the RHoK #2 (Random Hacks of Kindness) in San Francisco this weekend. In an effort to further simplify the use of our NuGram Hosted Server API, he turned the Ruby API into a full-fledged Ruby gem. The code is available from GitHub.

To install the Ruby API, just enter the following at the command prompt:

> gem install nugramserver-ruby

You can now start using the NuGram Hosted Server API by adding the following two lines of code to your Ruby application:

require "rubygems"
require "nugramserver-ruby"

And here is a complete example:

require "rubygems"
require "nugramserver-ruby"

# Definition of the grammar template
template = "#ABNF 1.0 ISO-8859-1;

language en-US;
mode voice;
tag-format <semantics/1.0>;

root $voicedialer;

public $voicedialer =
    [$politeness] $contacts [please]
;

private $contacts =
  @alt
    @for (contact : contacts)
      ([@word contact.firstname] @word contact.lastname
       @tag \"out.number  = '\" contact.number \"';\" @end)
    @end
  @end
;

private $politeness =
      ((I (would like | want) | I'd like) to (talk to | speak with))
    | (give me | gimme | get me)
;"

# Create a connection to the server
server = GrammarServer.new()

# Initiate a new session
session = server.create_session("username", "password")

# Upload the grammar template (this only needs to be done the first time)
session.upload("voicedialer.abnf", template)

# Push some data to instantiate the template.
grammar = session.instantiate("voicedialer.abnf",
                              {'contacts' =>
                                [{ 'firstname' => "John", 'lastname' => 'Doe',
                                   'number' => '1234' },
                                 { 'firstname' => "Bill", 'lastname' => 'Smith',
                                   'number' => '4321' }]})

# Retrieve the URL of the resulting grammar in GrXML form
puts "grammar url = ", grammar.get_url('grxml')

# Retrieve the content of the resulting grammar in ABNF form
puts grammar.get_content("abnf")

# Terminate the session
session.disconnect

Happy hacking! (And many thanks to Jason for contributing this gem!)

Grammar tips & tricks #3 – Use of global tags

Tip #3: In SRGS grammars, use global semantic interpretation (SI) tags to simplify SI tags in rule expansions.

It is fairly common in SRGS grammars to put some form of computation in semantics tags. For example, checksum algorithms (like the Luhn algorithm) are commonly used in credit card number grammars.

When grammars contain those kinds of computations, it is good practice to use global SI tags to hold function and constant definitions. These SI tags are declared before the definition of the first rule, alongside the other grammar headers. They must be followed by a semicolon. In ABNF form, this looks like:

#ABNF 1.0;

mode voice;
root $rootRule;

{
  // header tag
};

$rootRule =
 ...
;
...

The use of global tags has several advantages:

  • Functions are more easily testable. The functions declared in the global tags can be developed and tested outside of the grammar file (using a JavaScript interpreter like SpiderMonkey, Rhino, or V8), and later copied into the global tag.
  • It avoids code duplication. The use of functions usually reduces code duplication, which lowers the risk of fixing a problem in one place and missing it in another.
  • Semantic interpretation is less CPU-intensive. Another side effect of using functions is that SI tags usually get smaller, which reduces the time taken to parse and interpret them, leading to faster interpretation and better response time. (Semantic interpretation tags are usually executed after the last word has been uttered, so it’s sometimes important to optimize them.)

What about GrXML?

In the XML form, you simply put tag elements before the first rule element. But you don’t really need to know that, right? NuGram IDE can convert ABNF grammars to their XML counterpart so easily!

A concrete example

Let’s illustrate this by considering a simple grammar for a 12-digit account number using the Luhn algorithm to validate the number. Here is a first version of the grammar:

#ABNF 1.0 UTF-8;

language en-US;
mode voice;
tag-format <semantics/1.0>;

root $accountNumber;

public $accountNumber =
    { out.number = ""; var checksum = 0; }
    ( $digit {!{
                 out.number += rules.digit;
                 var digit = parseInt(rules.digit);
                 var doubledigit = digit * 2;
                 if (doubledigit > 9)
                    checksum += (doubledigit % 10) + 1;
                 else
                    checksum += doubledigit;
              }!}
      $digit {
                 out.number += rules.digit;
                 checksum += parseInt(rules.digit);
             }) <6>
    { out.valid = (checksum % 10) == 0;}
;

private $digit =
    one    {out = "1"} | two         {out = "2"}
  | three  {out = "3"} | four        {out = "4"}
  | five   {out = "5"} | six         {out = "6"}
  | seven  {out = "7"} | eight       {out = "8"}
  | nine   {out = "9"} | (zero | oh) {out = "0"}
;

The code to calculate the checksum is mixed with the rule references to collect the digits. This makes the grammar look much more complex than it really is. And its performance is much worse than it could be.

If we move the checksum computation into a header tag, we obtain the following grammar:

#ABNF 1.0 UTF-8;

language en-US;
mode voice;
tag-format <semantics/1.0>;

root $accountNumber;

{!{
function luhnCheck(digits) {
  var checksum = 0;
  for (var i = 0; i<12; i++) {
    var digit = parseInt(digits.charAt(i));
    if (i % 2 == 0) {
      var doubledigit = digit * 2;
      if (doubledigit > 9)
        checksum += (doubledigit % 10) + 1;
      else
        checksum += doubledigit;
    }
    else
      checksum += digit;
  }
  return (checksum % 10) == 0;
}
}!};

public $accountNumber =
    { out.number = "";}
    ( $digit { out.number += rules.digit; }) <12>
    { out.valid = luhnCheck(out.number); }
;

private $digit =
    one    {out = "1"} | two         {out = "2"}
  | three  {out = "3"} | four        {out = "4"}
  | five   {out = "5"} | six         {out = "6"}
  | seven  {out = "7"} | eight       {out = "8"}
  | nine   {out = "9"} | (zero | oh) {out = "0"}
;

Now the accountNumber rule is much simpler and it is clear that it only accepts 12 digits. Moreover, the validation function can be tested independently. If the code is copied to a file named checksum.js, I can launch the SpiderMonkey interpreter and test the function like this:

[tmp] js
js> load("checksum.js")
js> luhnCheck("123456789012")
false
js> luhnCheck("123456789015")
true
js> ^D
[tmp]

In fact, these test cases can be put in the source file along with the code. But you get the idea.
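As a minimal sketch of that idea, checksum.js could carry its own checks right below the function. The check helper and the failures list are hypothetical names I am adding for illustration; the two test values are the ones from the interpreter session above. I am assuming the file is run with Node.js here, although nothing in it is specific to Node apart from console.log.

```javascript
// checksum.js: Luhn validation for a 12-digit account number,
// followed by its own test cases.
function luhnCheck(digits) {
  var checksum = 0;
  for (var i = 0; i < 12; i++) {
    var digit = parseInt(digits.charAt(i), 10);
    if (i % 2 == 0) {
      // Every other digit, starting with the first, is doubled;
      // a two-digit result contributes the sum of its digits.
      var doubledigit = digit * 2;
      checksum += (doubledigit > 9) ? (doubledigit % 10) + 1 : doubledigit;
    } else {
      checksum += digit;
    }
  }
  return (checksum % 10) == 0;
}

// Hypothetical helper: returns a failure message, or null when the check passes.
function check(input, expected) {
  var actual = luhnCheck(input);
  return actual === expected
    ? null
    : "luhnCheck(" + input + ") returned " + actual + ", expected " + expected;
}

var failures = [
  check("123456789012", false),
  check("123456789015", true)
].filter(function (message) { return message !== null; });

console.log(failures.length === 0 ? "all checks passed" : failures.join("\n"));
```

Only the luhnCheck function needs to be copied into the header tag; the checks stay behind in the source file.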

Global scope is read-only

Beware, when writing your SI tags, that the global scope is read-only for SI tags, while it is mutable for all global tags. That means a variable cannot be declared in a global SI tag and modified in a normal SI tag. For example, the following grammar

#ABNF 1.0;

mode voice;
root $rootRule;

{
  var globalVar = 1;
};

$rootRule =
  { globalVar = 2; } some words { out = globalVar; }
;

would raise an exception when “some words” is uttered. That’s because the first SI tag on line 11 tries to modify a read-only variable (globalVar).

There is of course a way to bypass this limitation. Simply declare a global variable, say GLOBAL, that holds an object whose properties will represent the variables you would have liked to be global. To illustrate, here is how the previous grammar would be modified:

#ABNF 1.0;

mode voice;
root $rootRule;

{
  var GLOBAL = new Object();
  GLOBAL.globalVar = 1;
};

$rootRule =
  { GLOBAL.globalVar = 2; } some words { out = GLOBAL.globalVar; }
;

This time, the grammar will return 2 when “some words” is uttered.
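If the distinction feels subtle, here is an analogy in ordinary JavaScript. It is only an analogy and not how a recognition engine implements SISR scoping: I am using const to stand in for a variable that rule-level tags may read but not reassign.

```javascript
// const bindings stand in for variables declared in a global tag.
const globalVar = 1;
const GLOBAL = { globalVar: 1 };

// Reassigning the binding itself fails,
// like { globalVar = 2; } in the first grammar.
let reassignFailed = false;
try {
  globalVar = 2;
} catch (e) {
  reassignFailed = e instanceof TypeError;
}

// Changing a property of the object the binding points to is fine,
// like { GLOBAL.globalVar = 2; } in the second grammar.
GLOBAL.globalVar = 2;

console.log(reassignFailed, globalVar, GLOBAL.globalVar);
```

The binding is protected, but the object it refers to is not, which is exactly what the GLOBAL workaround relies on.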

It should be noted that the IBM engine, which supports an old version of the SISR specification, does allow global variables to be modified in SI tags. It is very important to be aware of that when converting grammars initially written for the IBM engine to another engine supporting the latest SISR specification (like, for instance, Loquendo or Nuance 9).