Bridging NuGram and CouchDB

Mark Headd, a new member of the Voxeo family, published a blog post last week on how to build speech recognition applications with Tropo. Since his post covered things like SRGS grammars and dynamic grammars, I couldn't resist entering the fray to show how the dynamic SRGS grammar could be built using NuGram Hosted Server. And while I'm at it, let's use CouchDB instead of an SQL database.

The database

Mark's example is a simple address capture dialog. It consists of asking for the zip code, then for the civic number and street name/type. The grammar for the second question is built dynamically from the entered zip code. All street names/types and their associated zip codes are stored in an SQL database and retrieved by some PHP code.

In my case, I decided to store all the data in a CouchDB database called "zipcode". (CouchDB is a nice RESTful, HTTP-based, document-oriented database where documents are stored as plain JSON.) Once CouchDB is up and running (I assume here it's running on localhost, port 5984, but it could be on any hosting service, like CouchOne), we simply create the database and populate it using the curl command-line tool:

% curl -X PUT http://localhost:5984/zipcode
{"ok":true}
% curl -X POST http://localhost:5984/zipcode/_bulk_docs \
       -H 'Content-Type: application/json' \
       -d "`cat zipcodes.json`"

where the file zipcodes.json contains the following data:

{"docs": [
{
    "_id": "18752",
    "type": "zipcode",
    "streets" : [
       {"name":"First", "type":"Avenue"},
       {"name":"Grant", "type":"Avenue"},
       {"name":"Josiah", "type":"Parkway"},
       {"name":"Murphy", "type":"Lane"},
       {"name":"Cherry Blossom", "type":"Circle"}
    ]
},
{
    "_id": "19752",
    "type": "zipcode",
    "streets" : [
       {"name":"Milberry", "type":"Extension"},
       {"name":"Jones", "type":"Street"},
       {"name":"Martin Luther King", "type":"Boulevard"},
       {"name":"Halsey", "type":"Place"}
    ]
}
]}

Each document (whose ID is a zip code) contains a streets attribute that lists all street names/types for that zip code. Here there are only a few streets for two zip codes.
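Typing the JSON by hand gets tedious with real data. As a minimal sketch (the helper name zipcode_docs and its compact input format are hypothetical), the {"docs": [...]} payload can be generated from a plain Ruby hash that maps each zip code to [name, type] pairs, producing documents with the same shape as above:

```ruby
require 'json'

# Hypothetical helper: turns a compact { zip => [[name, type], ...] } mapping
# into the {"docs": [...]} payload accepted by CouchDB's _bulk_docs endpoint,
# using the document shape shown above (zip code as _id, "type", "streets").
def zipcode_docs(mapping)
  docs = mapping.map do |zip, streets|
    {
      '_id'     => zip,
      'type'    => 'zipcode',
      'streets' => streets.map { |name, type| { 'name' => name, 'type' => type } }
    }
  end
  { 'docs' => docs }
end
```

For example, File.write('zipcodes.json', JSON.pretty_generate(zipcode_docs('18752' => [['First', 'Avenue'], ['Grant', 'Avenue']]))) writes a file that the curl command above can post as-is.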

(Of course there are other ways to model the data, but that’s the simplest I could think of.)
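If you would rather script the setup in Ruby than with curl, the two curl commands at the start of this section can be sketched with the standard Net::HTTP library. This assumes CouchDB on localhost:5984, as above; the method names are hypothetical, and the request builders are kept separate from the sending so they can be inspected without a running server:

```ruby
require 'net/http'
require 'uri'
require 'json'

# Assumption: CouchDB listens on localhost:5984, as in the curl commands.
COUCH_URL = URI('http://localhost:5984')

# Builds (without sending) the request that creates a database,
# equivalent to: curl -X PUT http://localhost:5984/zipcode
def create_db_request(name)
  Net::HTTP::Put.new("/#{name}")
end

# Builds (without sending) the _bulk_docs request that inserts every
# document of a {"docs": [...]} payload in a single round trip.
def bulk_docs_request(name, payload)
  req = Net::HTTP::Post.new("/#{name}/_bulk_docs")
  req['Content-Type'] = 'application/json'
  req.body = JSON.generate(payload)
  req
end

# Sends a request to the CouchDB server and returns the parsed JSON reply.
def send_to_couch(req)
  Net::HTTP.start(COUCH_URL.host, COUCH_URL.port) do |http|
    JSON.parse(http.request(req).body)
  end
end
```

With a server running, send_to_couch(create_db_request('zipcode')) should return the same {"ok":true} reply as curl, and send_to_couch(bulk_docs_request('zipcode', JSON.parse(File.read('zipcodes.json')))) loads the data.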

The grammar template

Instead of writing code that builds the streets grammar dynamically, we create a grammar template and publish it on www.grammarserver.com (NuGram Hosted Server). The server later populates the template with data from the database and renders it in GrXML (or ABNF).

To do that, we just need to register an account (don't worry, it's absolutely free).

So here is the grammar template:

#ABNF 1.0 ISO-8859-1;

language en-US;
mode voice;
tag-format <semantics/1.0>;

root $streets;

public $streets =
    $civicNumber $name [$direction]
    {out = rules.civicNumber.number + "," + rules.name + "," + (rules.direction ? rules.direction : "")}
;

$civicNumber =
    {out.number = ''} ($number {out.number += rules.number}) <1->
;

$name =
    @alt
        @for (street : zipcode.streets)
           (@word street.name @word street.type)
        @end
    @end
;

$number =
     (zero | oh) {out = "0"}
   | one   {out = "1"}
   | two   {out = "2"}
   | three {out = "3"}
   | four  {out = "4"}
   | five  {out = "5"}
   | six   {out = "6"}
   | seven {out = "7"}
   | eight {out = "8"}
   | nine  {out = "9"}
;

$direction =
     north (west {out = 'nw'} | east {out = 'ne'} )
   | south (west {out = 'sw'} | east {out = 'se'} )
;

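As you can see, it's plain ABNF, except for a few simple dynamic directives (@alt, @for, @word and @end) in the $name rule, which generate one alternative per street listed in the zipcode document. It is also a bit more involved than Mark's grammar: it contains semantic tags that format the recognized utterance as a comma-separated civic number, street name and direction.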
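To make the @alt/@for block concrete, here is a hypothetical local illustration in plain Ruby of roughly what the $name rule should expand to for one zip code document. It is not NuGram's implementation: the real @word directive may quote or escape multi-word tokens differently, and the server's exact formatting is an assumption.

```ruby
# Illustration only, NOT NuGram's template engine: mimics the
# @alt / @for (street : zipcode.streets) / @word street.name @word street.type
# block by emitting one "(name type)" alternative per street in the document.
def expand_name_rule(zipcode)
  alternatives = zipcode['streets'].map { |s| "(#{s['name']} #{s['type']})" }
  "$name =\n    " + alternatives.join("\n  | ") + "\n;"
end
```

Fed the 19752 document, this yields a rule with four alternatives, from (Milberry Extension) to (Halsey Place), which is the kind of plain ABNF the server hands to the recognizer.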

To publish the grammar, we use curl again:

% curl -X PUT http://www.grammarserver.com/api/grammar/streets.abnf \
       -u username:password \
       -d "`cat streets.abnf`"
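The same upload can be sketched in Ruby with Net::HTTP, assuming the API accepts exactly what the curl command sends (a PUT with HTTP basic authentication and the grammar source as body). The method name is hypothetical and the credentials are placeholders for your own account:

```ruby
require 'net/http'
require 'uri'

# Builds (without sending) the equivalent of:
#   curl -X PUT http://www.grammarserver.com/api/grammar/<name> -u user:password -d ...
def publish_grammar_request(name, source, user, password)
  uri = URI("http://www.grammarserver.com/api/grammar/#{name}")
  req = Net::HTTP::Put.new(uri)
  req.basic_auth(user, password)  # same as curl's -u user:password
  # curl -d sends a form content type by default; mirrored here as an assumption
  req['Content-Type'] = 'application/x-www-form-urlencoded'
  req.body = source
  [uri, req]
end
```

Sending it is then a matter of uri, req = publish_grammar_request('streets.abnf', File.read('streets.abnf'), 'username', 'password') followed by Net::HTTP.start(uri.host, uri.port) { |http| http.request(req) }.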

We are now ready to write the application.

Connecting the dots

Now that the database is set up and the template is published on NuGram Hosted Server, all that's left is a simple app that bridges the two. For this, I decided to use Tropo's web API, and more specifically the Ruby webapi gem (as well as the couchrest and nugramserver-api gems). The app mimics Mark's; the CouchDB and NuGram Hosted Server specific lines are the database setup at the top, and the create_session, instantiate and get_url calls in the /ask_street.json handler:

require 'rubygems'
require 'sinatra'
require 'tropo-webapi-ruby'
require 'nugramserver-ruby'
require 'couchrest'

# CouchDB: connect to the server and open the "zipcode" database
couch_server = CouchRest.new "http://localhost:5984"
database = couch_server.database "zipcode"

# First step: ask for the zip code with Tropo's built-in [5 DIGITS] grammar
post '/start.json' do
  tropo = Tropo::Generator.new do
    on :event => 'continue', :next => '/ask_street.json'
    on :event => 'hangup', :next => '/hangup.json'
    ask({ :name => 'zip_code',
          :bargein => 'true' }) do
      say     :value => "Say your 5 digit zip code"
      choices :value => "[5 DIGITS]"
    end
  end
  tropo.response
end

# Second step: ask for the street, using a grammar built for that zip code
post '/ask_street.json' do
  # NuGram Hosted Server: open a session with your account credentials
  session = GrammarServer.new.create_session "username", "password"

  # Extract the zip code recognized in the first step
  tropo_event = Tropo::Generator.parse request.env["rack.input"].read
  zipcode = tropo_event.result.actions.zip_code.value

  # Instantiate the template with the CouchDB document whose _id is the zip code
  grammar = session.instantiate "streets.abnf",
                                :zipcode => database.get(zipcode)

  tropo = Tropo::Generator.new do
    on :event => 'continue', :next => '/say_street.json'
    on :event => 'hangup', :next => '/hangup.json'
    ask({ :name => 'street',
          :bargein => 'true' }) do
      say :value => "What is your street address, beginning with your street number?"
      # Point Tropo at the GrXML rendering of the instantiated grammar
      choices :value => grammar.get_url("grxml")
    end
  end
  tropo.response
end

#...

The app is not complete (the /say_street.json and /hangup.json handlers are missing), but you get the idea.
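As a hint for the missing /say_street.json handler, here is a minimal sketch of splitting the "number,name,direction" string built by the $streets rule's semantic tag. It assumes Tropo hands that string back as the value of the street action; the helper name is hypothetical:

```ruby
# Hypothetical helper: splits the semantic result of the $streets rule,
# e.g. "123,Grant Avenue,ne", into its three parts.
def parse_street_result(value)
  number, name, direction = value.split(',', 3)
  # An unmatched optional $direction may come back empty, or as the string
  # "undefined" if the tag concatenates an undefined rules.direction.
  direction = nil if direction.nil? || direction.empty? || direction == 'undefined'
  { number: number, name: name, direction: direction }
end
```

The handler could then read the value the same way /ask_street.json reads the zip code and use the parts to confirm the address with the caller.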

A final note

Of course, this post only covers the basics of integrating a dynamic grammar into a speech app. A real address capture application is certainly more complex. For instance, given the large number of streets covered by a single zip code, generating grammars on the fly may not be desirable; they may have to be compiled in advance and refreshed by a periodic update process. Or you may want to implement some clever grammar caching strategies. Either way, you may want to consider the Java version of NuGram Server (not the hosted one) instead.
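As one very simple caching strategy, here is a hypothetical sketch that remembers the grammar URL built for each zip code for a fixed number of seconds, so callers from the same zip code do not trigger a new instantiation every time. Whether instantiated grammars stay valid on the server for that long is an assumption you would need to check; the TTL should not exceed their lifetime.

```ruby
# Hypothetical time-to-live cache for grammar URLs, keyed by zip code.
class GrammarCache
  # clock is injectable so expiry can be tested; defaults to a monotonic clock.
  def initialize(ttl, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @ttl = ttl
    @clock = clock
    @entries = {}
  end

  # Returns the cached URL for zipcode if it is younger than ttl seconds;
  # otherwise calls the block to build a fresh one and caches it.
  def fetch(zipcode)
    now = @clock.call
    entry = @entries[zipcode]
    return entry[:url] if entry && now - entry[:at] < @ttl

    url = yield
    @entries[zipcode] = { url: url, at: now }
    url
  end
end
```

In the /ask_street.json handler, the block would wrap the instantiate and get_url("grxml") calls, with a single GrammarCache created at the top of the app next to the CouchDB setup.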
