Tag Archives: couchdb

CouchDB for call analysis data – a case study

At Nu Echo, we’ve been developing and refining our own VoiceXML application framework for years now. As part of our nth rewrite (and I’ll talk more about that rewrite and why we did it in another post), we decided to experiment with CouchDB. (For those new to CouchDB, it’s a schema-less, document-oriented database, one of the so-called NoSQL databases.)

The first area where we saw a fit for CouchDB was the storage of call analysis data. This data consists of various attributes associated with a call, information about each interaction (like recognition results) and each transaction (groups of interactions). It can also be augmented with the recordings saved by the ASR engine. Call analysis data is used by our call viewer tool to listen to calls, search for calls exhibiting some specific caller behaviors, produce reports, etc.

In the previous incarnation of our framework, call analysis data was stored on disk in a plain text file, and optionally in a SQL database. Due to the richness of our model, the SQL schema consisted of about 15 tables, and the representation of the same data in the text file was quite complex (tab-separated values, with some fields encoded in JSON format). At the end of each call, data collected during the call was stored on disk and optionally stored in the SQL database. We also had a script that could read all the files on disk and push the data into the SQL database at a later time.

Adding support for CouchDB

The very first step toward CouchDB support was to rewrite the serialization code to produce JSON-encoded call analysis data instead of our complicated text format. Now, the data for each call is written as a single JSON object on its own line, prefixed by the call ID. This greatly simplified the code that reads data back into memory.
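To make the one-call-per-line format concrete, here is a minimal Ruby sketch of writing and reading such a line. The tab separator between the call ID and the JSON object, and the field names in the sample call, are assumptions made for illustration; the real format may differ.

```ruby
require 'json'

# Assumption: the call ID and the JSON object are separated by a tab.
SEPARATOR = "\t"

# Serialize one call as "<call id><TAB><JSON object>" on a single line.
# JSON.generate escapes tabs and newlines inside strings, so the result
# never spans more than one line.
def dump_call(call_id, data)
  "#{call_id}#{SEPARATOR}#{JSON.generate(data)}"
end

# Parse one line back into [call_id, data].
def load_call(line)
  call_id, json = line.chomp.split(SEPARATOR, 2)
  [call_id, JSON.parse(json)]
end

# Hypothetical call data, for illustration only.
sample = {
  "ani" => "5145550100",
  "interactions" => [{ "type" => "recognition", "result" => "yes" }]
}

line = dump_call("call-0001", sample)
call_id, data = load_call(line)
```

Reading a whole file back is then a matter of calling load_call on each line.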

The next step was to write a script to push the data to CouchDB. The script simply reads the call data, one call per line, and POSTs it to CouchDB in batches of 100 calls using the bulk API to improve performance.
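The batching step could look roughly like the sketch below. It is not our actual script: the tab-separated line format, the use of the call ID as the document _id, and the database URL in the comment are all assumptions. Only the _bulk_docs endpoint and its {"docs": [...]} body are CouchDB's own.

```ruby
require 'json'
require 'net/http'
require 'uri'

BATCH_SIZE = 100

# Turn "<call id>\t<json>" lines into request bodies for _bulk_docs,
# one body per batch of at most batch_size calls.
def bulk_payloads(lines, batch_size = BATCH_SIZE)
  lines.each_slice(batch_size).map do |batch|
    docs = batch.map do |line|
      call_id, json = line.chomp.split("\t", 2)
      # Assumption: the call ID doubles as the CouchDB document ID.
      JSON.parse(json).merge({ "_id" => call_id })
    end
    JSON.generate({ "docs" => docs })
  end
end

# POST every batch to the bulk endpoint, for example
# "http://localhost:5984/calls/_bulk_docs" (hypothetical database name).
def push_calls(lines, bulk_url)
  bulk_payloads(lines).each do |payload|
    Net::HTTP.post(URI(bulk_url), payload,
                   { "Content-Type" => "application/json" })
  end
end
```

Since File.foreach returns an enumerator when called without a block, push_calls(File.foreach("calls.txt"), url) would stream a file without loading it all in memory (calls.txt being a placeholder name).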

Finally, we had to rewrite the part of our call viewer tool that connects to a database to retrieve call data matching some patterns. It relies on a few simple CouchDB views, but deliberately not many, so that it stays as independent as possible of the database layer (the call viewer can also retrieve calls from text files).
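For readers who have never seen a CouchDB view, here is a hedged sketch of what a "simple view" can look like: a design document holding a JavaScript map function, plus the URL used to query a key range. The design document name, the view name and the start_time field are hypothetical and are not the ones used by our call viewer.

```ruby
require 'json'
require 'uri'

# A design document with one view that indexes calls by start time.
# The map function is JavaScript, stored as a string in the document.
DESIGN_DOC = {
  "_id" => "_design/calls",
  "views" => {
    "by_start_time" => {
      "map" => "function (doc) { if (doc.start_time) { emit(doc.start_time, null); } }"
    }
  }
}

# Build the URL that queries the view for keys between from and to.
# CouchDB expects startkey and endkey to be JSON-encoded values.
def view_query_url(base, db, from, to)
  query = URI.encode_www_form({
    "startkey" => from.to_json,
    "endkey" => to.to_json,
    "include_docs" => "true"
  })
  "#{base}/#{db}/_design/calls/_view/by_start_time?#{query}"
end

url = view_query_url("http://localhost:5984", "calls", "2010-01-01", "2010-01-31")
```

The design document itself would be stored with a PUT to /calls/_design/calls, and the index is built the first time the view is queried.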

Benefits

We obtained several benefits by moving to CouchDB:

  1. Performance – Loading call analysis data into CouchDB is much faster than putting the same data in a MySQL database. Our preliminary results show a speed-up factor of about 100 (this does not take the loading of audio recordings into account, though). OK, we are comparing apples and oranges: CouchDB does not update its view indexes until they are requested, while MySQL updates its indexes as rows are inserted, and only a single document is inserted in CouchDB, compared to lots of rows in more than 15 tables in SQL. On the other hand, if insertions are done at application runtime (after the completion of the call), you had better do it fast, especially if the IVR handles many hundreds (if not thousands) of ports.
  2. Evolution – Making modifications to a complex schema is painful, especially when you have applications deployed in the field. As documents do not have to follow a rigid schema, it is much easier to adapt our code to multiple versions.
  3. Attachments – Even if audio recordings can be stored in a traditional SQL database as blobs, a custom application is still required to access them. With CouchDB, recordings are stored as attachments to the JSON document for the corresponding call. Moreover, these recordings are easily accessible by other tools since CouchDB is itself a web server and all documents and attachments have a URL.
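To illustrate the third point: in CouchDB an attachment lives at /database/document-id/attachment-name, so any HTTP client can fetch a recording directly. The sketch below only builds such a URL; the database name, call ID and file name are made up for the example.

```ruby
require 'erb'

# Build the URL of an attachment: <base>/<db>/<doc id>/<attachment name>.
# Each path segment is percent-encoded in case it contains special characters.
def attachment_url(base, db, doc_id, name)
  segments = [db, doc_id, name].map { |segment| ERB::Util.url_encode(segment) }
  "#{base}/#{segments.join('/')}"
end

# Hypothetical names, for illustration only.
recording = attachment_url("http://localhost:5984", "calls", "call-0001", "utterance-1.wav")
```

Pasting the resulting URL in a browser or an audio player is enough to listen to the recording.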

Conclusion

Of course, there is no panacea and CouchDB is no exception. There are still some aspects of our system for which CouchDB does not provide a better solution than an SQL database. One of them is the support for custom queries. In the call viewer tool, it was possible to write custom SQL queries to find calls matching very specific criteria. Of course, CouchDB supports temporary views to do something equivalent. The main problem is the time taken to build the view. When hundreds of thousands or even millions of calls are processed, creating a temporary view can take a long time (several minutes).  Not so good for an interactive tool.

But overall, we have been very pleased by the performance of CouchDB and the flexibility it gives us.

Testing dynamic grammars

In my post on NuGram and CouchDB, I neglected to mention how the dynamic grammar was authored and, most importantly, tested. Having a repeatable process for testing grammars is very important when developing a speech application, as most grammars change and get more complex over time.

Of course, the grammar was authored with NuGram IDE. NuGram IDE has some great features to test grammars, and especially dynamic grammars. Dynamic grammars (like the streets grammar) have always been more difficult to debug than static grammars. They can be very easy to write for small applications or prototypes (or blog posts…), but in real applications their coverage tests should be run in batch as part of an automated build process, and this is often too cumbersome in practice. For instance, a dynamic grammar implemented as a JSP page requires a web application server to run, and if the JSP page makes queries to a database, the DB must be running somewhere too. This greatly complicates the setup for batch coverage tests. Moreover, writing and testing the dynamic grammar requires some programming skills that speech scientists don’t always have (at least not in large organizations).

With NuGram’s template language, a dynamic grammar can be tested in NuGram IDE Basic Edition in two different ways:

  • Using predefined data encoded as a JSON object (a JSON context), or
  • Using some custom Java code (a Java context).

Both ways require the creation of an instantiation context. It’s simply a mapping between variable names and values. An instantiation context must provide a value for each and every variable used in the grammar template. The values are used to populate the template and produce the resulting (ABNF or XML) grammar. The way the instantiation context is created depends on the type of context. For a JSON context, the instantiation context is the JSON document itself. For the Java context, some Java code populates a map from strings to objects.
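As an illustration, a JSON context for the streets grammar presumably looks like the document embedded in the sketch below, since the template reads zipcode.streets and, for each street, its name and type. The Ruby code around it is only a hypothetical sanity check that the context provides every top-level variable; it is not part of NuGram IDE.

```ruby
require 'json'

# A JSON context: one top-level key per template variable.
json_context = <<~JSON
  {
    "zipcode": {
      "streets": [
        {"name": "First", "type": "Avenue"},
        {"name": "Jones", "type": "Street"}
      ]
    }
  }
JSON

context = JSON.parse(json_context)

# Top-level variables used by the streets grammar template.
REQUIRED_VARIABLES = ["zipcode"]

# Every variable used in the template must have a value in the context.
missing = REQUIRED_VARIABLES - context.keys
puts "Missing variables: #{missing.join(', ')}" unless missing.empty?
```

A Java context plays the same role, except that the mapping is filled in by code rather than read from a document.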

The following video shows how to create a JSON context for the street grammar:

This one shows the steps required to create and use a Java context:

Note: there was a subtle (uncovered) bug in the previous version of NuGram IDE. If you want to create Java contexts like in the video above, please make sure to download the latest version.

The whole project used in the videos is available on github. The Java context initializers use the following open-source libraries:

In the next post, I will show how to use the Java context initializer to deploy the streets grammar on the Java-based version of NuGram Server.

And you, how do you test your dynamic grammars?

Bridging NuGram and CouchDB

Mark Headd, a new member of the Voxeo family, published a blog post last week on how to build speech recognition applications with Tropo. Since his post covered things like SRGS grammars and dynamic grammars, I couldn’t resist. I had to enter the fray and show how the dynamic SRGS grammar could be built using NuGram Hosted Server. And while I’m at it, let’s use CouchDB instead of an SQL database.

The Database

Mark’s example is a simple address capture dialog. It consists of asking for the zip code, and then asking for the civic number and street name/type. The grammar for the second question is built dynamically based on the entered zip code. All street names/types and their associated zip codes are stored in a SQL database and retrieved by some PHP code.

In my case, I decided to store all the data in a CouchDB database called “zipcode”. (CouchDB is a nice RESTful, HTTP-based document-oriented database, where documents are stored as plain JSON strings.) Once CouchDB is up and running (I assume here it’s running on the local host, on port 5984, but that could be on any hosting service, like CouchOne), we simply create the database and populate it using the curl command-line tool:

% curl -X PUT http://localhost:5984/zipcode
{"ok":true}
% curl -X POST http://localhost:5984/zipcode/_bulk_docs \
       -H 'Content-Type: application/json' \
       -d "`cat zipcodes.json`"

where the file zipcodes.json contains the following data:

{"docs": [
{
    "_id": "18752",
    "type": "zipcode",
    "streets" : [
       {"name":"First", "type":"Avenue"},
       {"name":"Grant", "type":"Avenue"},
       {"name":"Josiah", "type":"Parkway"},
       {"name":"Murphy", "type":"Lane"},
       {"name":"Chery Blossom", "type":"Circle"}
    ]
},
{
    "_id": "19752",
    "type": "zipcode",
    "streets" : [
       {"name":"Milberry", "type":"Extension"},
       {"name":"Jones", "type":"Street"},
       {"name":"Martin Luther King", "type":"Boulevard"},
       {"name":"Halsey", "type":"Place"}
    ]
}
]}

Each document (whose ID is a zip code) contains an attribute streets that lists all street names/types for the given zip code. Here there are only a few streets for two zip codes.

(Of course there are other ways to model the data, but that’s the simplest I could think of.)

The grammar template

Instead of using code to create the streets grammar dynamically, we create a grammar template and push it to www.grammarserver.com (NuGram Hosted Server), where it will later be populated with data from the database and rendered in GrXML (or ABNF).

To do that, we just need to register an account (but don’t worry, it’s absolutely free).

So here is the grammar template:

#ABNF 1.0 ISO-8859-1;

language en-US;
mode voice;
tag-format <semantics/1.0>;

root $streets;

public $streets =
    $civicNumber $name [$direction]
    {out = rules.civicNumber.number + "," + rules.name + "," + rules.direction}
;

$civicNumber =
    {out.number = ''} ($number {out.number += rules.number}) <1->
;

$name =
    @alt
        @for (street : zipcode.streets)
           (@word street.name @word street.type)
        @end
    @end
;

$number =
     (zero | oh) {out = "0"}
   | one   {out = "1"}
   | two   {out = "2"}
   | three {out = "3"}
   | four  {out = "4"}
   | five  {out = "5"}
   | six   {out = "6"}
   | seven {out = "7"}
   | eight {out = "8"}
   | nine  {out = "9"}
;

$direction =
     north (west {out = 'nw'} | east {out = 'ne'} )
   | south (west {out = 'sw'} | east {out = 'se'} )
;

As you can see, it’s plain ABNF, with the exception of some simple dynamic directives on lines 19-23 (the body of the $name rule). It’s a bit more involved than Mark’s: it contains semantic tags to better format the recognized utterance.
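To get a feel for what the @alt/@for directives do, the following Ruby sketch mimics the expansion of the $name rule for one zip code document: one parenthesized alternative per street, joined with the ABNF alternation operator. This is only an approximation written for this post; the exact text produced by NuGram Server (spacing, quoting done by @word) may well differ.

```ruby
# Mimic the expansion of:
#   @alt
#     @for (street : zipcode.streets)
#       (@word street.name @word street.type)
#     @end
#   @end
def expand_name_rule(zipcode_doc)
  alternatives = zipcode_doc["streets"].map do |street|
    "(#{street['name']} #{street['type']})"
  end
  "$name =\n    " + alternatives.join("\n  | ") + "\n;"
end

# The second document from zipcodes.json, trimmed to two streets.
doc = {
  "_id" => "19752",
  "streets" => [
    { "name" => "Jones", "type" => "Street" },
    { "name" => "Halsey", "type" => "Place" }
  ]
}

puts expand_name_rule(doc)
```

The point is that the template stays declarative: the loop lives in the grammar itself rather than in application code.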

To publish the grammar, we use curl again:

% curl -X PUT http://www.grammarserver.com/api/grammar/streets.abnf \
       -u username:password \
       -d "`cat streets.abnf`"

We are now ready to write the application.

Connecting the dots

Now that the database is set up and the template published on NuGram Hosted Server, the only thing we need to do is create a simple app that bridges the two. For this, I decided to use Tropo’s web API, and more specifically the Ruby webapi gem (as well as the couchrest and nugramserver-api gems). The app mimics Mark’s; the parts specific to CouchDB and NuGram Hosted Server are the couchrest and nugramserver-ruby requires, the database handle, and the session, grammar instantiation and grammar URL lines in the /ask_street.json handler:

require 'rubygems'
require 'sinatra'
require 'tropo-webapi-ruby'
require 'nugramserver-ruby'
require 'couchrest'

couch_server = CouchRest.new "http://localhost:5984"
database = couch_server.database "zipcode"

post '/start.json' do
  tropo = Tropo::Generator.new do
    on :event => 'continue', :next => '/ask_street.json'
    on :event => 'hangup', :next => '/hangup.json'
    ask({ :name => 'zip_code',
          :bargein => 'true' }) do
      say     :value => "Say your 5 digit zip code"
      choices :value => "[5 DIGITS]"
    end
  end
  tropo.response
end

post '/ask_street.json' do
  session = GrammarServer.new.create_session "username", "password"

  tropo_event = Tropo::Generator.parse request.env["rack.input"].read
  zipcode = tropo_event.result.actions.zip_code.value

  grammar = session.instantiate "streets.abnf",
                                :zipcode => database.get(zipcode)

  tropo = Tropo::Generator.new do
    on :event => 'continue', :next => '/say_street.json'
    on :event => 'hangup', :next => '/hangup.json'
    ask({ :name => 'street',
          :bargein => 'true' }) do
      say :value => "What is your street address, beginning with your street number?"
      choices :value => grammar.get_url("grxml")
    end
  end
  tropo.response
end

#...

The app is not complete; some handlers are missing. But you get the idea.
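One thing a /say_street.json handler would presumably have to do is split the value built by the root rule’s tag, which joins the civic number, the street and the direction with commas. Here is a small, hedged helper for that. It assumes the interpretation arrives as a plain string, and that a missing direction shows up either as an empty field or as the text "undefined"; neither assumption has been checked against Tropo.

```ruby
# Split "<number>,<street>,<direction>" into its three parts.
# The direction is optional in the grammar, so it may be absent.
def parse_street_result(interpretation)
  number, street, direction = interpretation.split(",", 3)
  if direction.nil? || direction.empty? || direction == "undefined"
    direction = nil
  end
  { number: number, street: street, direction: direction }
end

with_direction = parse_street_result("123,First Avenue,nw")
without_direction = parse_street_result("45,Jones Street,undefined")
```

The handler could then read the address back to the caller from the three fields.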

A final note

Of course, this post just covers the basics of integrating a dynamic grammar in a speech app. A real address capture application is certainly a bit more complex than that. For instance, given the large number of streets covered by a single zip code, it may not be desirable to generate grammars dynamically. They may have to be compiled in advance, with a periodic update process. Or you may want to implement some clever grammar caching strategies. Either way, you may instead consider the Java version of NuGram Server (not the hosted one).
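As a rough idea of what a caching strategy could look like, here is a minimal time-based cache keyed by zip code. It is a sketch, not something NuGram Server provides: the class name and the expiry policy are invented for this example, and it assumes that a grammar URL obtained once can be reused until it expires.

```ruby
# Cache one grammar URL per zip code for ttl_seconds.
class GrammarCache
  # clock is injectable so the expiry logic can be tested without waiting.
  def initialize(ttl_seconds, clock: -> { Time.now })
    @ttl = ttl_seconds
    @clock = clock
    @entries = {}
  end

  # Return the cached URL for the zip code, or call the block to build one.
  def fetch(zipcode)
    entry = @entries[zipcode]
    now = @clock.call
    return entry[:url] if entry && now - entry[:stored_at] < @ttl

    url = yield zipcode
    @entries[zipcode] = { url: url, stored_at: now }
    url
  end
end

# In the /ask_street.json handler above, usage could look like:
#   url = cache.fetch(zipcode) do |z|
#     session.instantiate("streets.abnf", :zipcode => database.get(z)).get_url("grxml")
#   end
```

With such a cache the database and the grammar server are only hit once per zip code and per expiry period.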