Session timeouts in NuGram Hosted Server

(This post has nothing to do with speech technologies or IVR applications. It’s merely a discussion on an implementation detail I described at the Erlang Montreal meetup and it’s rather technical.)

In my previous post about my talk at the Erlang Montreal meetup, slide 15 briefly outlines how session timeouts are implemented in NuGram Hosted Server. The code is reproduced here:

receive
…
after Timeout ->
    db:expire_session(self())
end

This code uses the Erlang receive..after construct to handle timeouts. The construct tries to extract a message from the process mailbox, and waits at most Timeout milliseconds if there are no matching messages (variables start with an uppercase letter in Erlang).
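For readers more at home outside Erlang, the same idle-timeout pattern can be approximated in Python with a blocking queue read that gives up after a delay. This is only an analogy under stated assumptions: `queue.Queue` stands in for the process mailbox, and `handle` and `expire_session` are hypothetical callables (the latter playing the role of `db:expire_session/1`). Unlike Erlang's `receive`, `Queue.get` cannot selectively match messages, and its timeout is in seconds rather than milliseconds.

```python
import queue


def session_loop(mailbox, timeout_s, handle, expire_session):
    """Process messages until none arrives within timeout_s seconds.

    Rough analogue of Erlang's  receive ... after Timeout -> ... end:
    each blocking get() waits at most timeout_s, and an empty wait
    means the session has been idle long enough to expire.
    """
    while True:
        try:
            msg = mailbox.get(timeout=timeout_s)
        except queue.Empty:
            # No message within the timeout: expire the session and stop.
            expire_session()
            return
        handle(msg)
```

Because the timer restarts on every loop iteration, any message that arrives keeps the session alive, just as re-entering `receive` does in the Erlang version.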

This is great when sessions are represented using plain Erlang processes (I described this technique here). But there is a much better way to achieve the same effect when implementing servers using OTP’s gen_server behaviour. (One of our hard-learned lessons is to take the time to properly learn OTP, Erlang’s Open Telecom Platform, before building a production-grade system. It’s definitely worth the investment. It’s what puts Erlang in a totally different category from most programming languages and systems.)

When implementing a server using gen_server, one has to implement a few callback functions (notably handle_call for synchronous calls, handle_cast for asynchronous ones, and handle_info for all other messages). To specify an inactivity timeout, the tuples returned by those three functions can include an optional Timeout as their last element:

handle_call(Request, From, State) ->
    Reply = ...,
    %% Reply, then wait at most Timeout ms for the next message
    {reply, Reply, NewState, Timeout};
...

If the server does not receive any message within the next Timeout milliseconds, the atom timeout is delivered to the process as a message, which must be handled by the handle_info function. To stop the process, something like the following can be done:

handle_info(timeout, State) ->
  %% Do some clean up
  {stop, normal, State};
...

This simply tells gen_server that the server is to be shut down normally and that its last state is State, which is the value passed on to terminate/2 (a great thing to know when things go wrong).
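The key property is that the timeout is re-armed every time a callback returns, so any incoming message postpones expiry, which is exactly what an idle-session timeout needs. The Python sketch below imitates that rule; it is not gen_server. The `serve` function, the tuple shapes and the `"timeout"` marker are hypothetical stand-ins chosen to mirror `{reply, Reply, NewState, Timeout}`, `{noreply, NewState, Timeout}` and `{stop, normal, State}`, and there is no `From` argument or real reply channel.

```python
import queue

TIMEOUT = "timeout"  # plays the role of the 'timeout' atom


def serve(mailbox, state, handle_call, handle_info):
    """Run until a callback asks to stop; return (reason, final_state, replies).

    handle_call(request, state) -> ("reply", reply, new_state, timeout_s)
    handle_info(msg, state)     -> ("noreply", new_state, timeout_s)
                                   or ("stop", reason, state)
    timeout_s=None waits forever, like gen_server's 'infinity'.
    """
    timeout_s = None
    replies = []
    while True:
        try:
            request = mailbox.get(timeout=timeout_s)
        except queue.Empty:
            # Nothing arrived within the timeout returned by the last callback.
            result = handle_info(TIMEOUT, state)
        else:
            result = handle_call(request, state)

        if result[0] == "reply":
            _, reply, state, timeout_s = result
            replies.append(reply)
        elif result[0] == "noreply":
            _, state, timeout_s = result
        else:  # ("stop", reason, state)
            _, reason, state = result
            return reason, state, replies
```

Here each handled request returns a fresh timeout, so the server only stops once it has been idle for the full period after the last message.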

Slides from my talk at the Erlang Montreal meetup

Last week at the first Erlang Montreal meetup, I gave a talk on what we’ve learned at Nu Echo while developing the NuGram Hosted Server in Erlang. I just posted the slides from the presentation on SlideShare.

Using NuBot with the Tropo Scripting API

Ever wondered how to instrument an existing application for use with the NuBot IVR Testing Platform? My colleague Pascal wrote a helper function in Groovy for easy instrumentation of applications built using the Tropo Scripting API.

The trick is to define a closure encapsulating the playing of DTMF sequences (these sequences are required in order to synchronize the IVR application with the NuBot test scenario):

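// dtmfSequencerEnabled and baseAudioUrl are assumed to be defined elsewhere in the script
def sequencer = { sequence, closure ->
    if (dtmfSequencerEnabled) {
        // Play one DTMF audio file per character of the sequence
        for (dtmf in sequence) {
            switch (dtmf) {
                case "*":
                    say("${baseAudioUrl}/dtmf/star.wav")
                    break
                case "#":
                    say("${baseAudioUrl}/dtmf/pound.wav")
                    break
                default:
                    say("${baseAudioUrl}/dtmf/${dtmf.toLowerCase()}.wav")
            }
        }
    }

    // Then run the wrapped application code, if any
    if (closure) return closure()
}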
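To check the DTMF-to-audio mapping outside of Tropo, it can be isolated in a pure function. Here is a Python sketch of that mapping, assuming the same `<baseAudioUrl>/dtmf/<name>.wav` layout as the Groovy closure; `dtmf_prompt_urls` is a hypothetical name, not part of the published helper. It yields exactly one prompt per character, with `*` and `#` mapped to `star` and `pound`.

```python
def dtmf_prompt_urls(sequence, base_audio_url):
    """Return the audio URL played for each DTMF character, in order."""
    named = {"*": "star", "#": "pound"}
    return [
        f"{base_audio_url}/dtmf/{named.get(d, d.lower())}.wav"
        for d in sequence
    ]
```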

Using this definition, one can instrument an application very easily:

sequencer("a") {
    say("Hello. Thank you for calling the Travel Agency Customer Satisfaction Department")
};

The code, as well as a complete NuBot project and a few instrumented Tropo examples, is on GitHub.

CouchDB for call analysis data – a case study

At Nu Echo, we’ve been developing and refining our own VoiceXML application framework for years now. As part of our nth rewrite (and I’ll talk more about that rewrite and why we did it in another post), we decided to experiment with CouchDB. (For those new to CouchDB, it’s a schema-less document-oriented database. A so-called NoSQL database.)

The first area where we saw a fit for CouchDB was the storage of call analysis data. This data consists of various attributes associated with a call, information about each interaction (like recognition results) and each transaction (groups of interactions). It can also be augmented with the recordings saved by the ASR engine. Call analysis data is used by our call viewer tool to listen to calls, search for calls exhibiting some specific caller behaviors, produce reports, etc.

In the previous incarnation of our framework, call analysis data was stored on disk in a plain text file, and optionally in a SQL database. Due to the richness of our model, the SQL schema consisted of about 15 tables. And the representation of the same data in the text file was quite complex (tab-separated values, with some fields encoded in JSON format). At the end of each call, the data collected during the call was written to disk and, optionally, inserted into the SQL database. We also had a script that could read all the files on disk and push the data into the SQL database at a later time.

Adding support for CouchDB

The very first step toward supporting CouchDB was rewriting the serialization code to produce JSON-encoded call analysis data instead of our complicated text format. Now, the data for a call is written as a single JSON object, one per line, prefixed by the call ID. This greatly simplified the code that reads the data back into memory.
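A minimal sketch of that line format in Python. The post does not say how the call ID is separated from the JSON object, so the tab separator below is an assumption, as are the function names. Since `json.dumps` without indentation never emits a raw newline or tab (both are escaped inside strings), each call fits on one line and the first tab cleanly splits the ID from the payload, provided the ID itself contains no tab.

```python
import json

SEPARATOR = "\t"  # assumption: how the call ID is separated from the JSON


def write_call(out, call_id, call_data):
    """Append one call as '<call id><TAB><json object>' on a single line."""
    out.write(call_id + SEPARATOR + json.dumps(call_data) + "\n")


def read_calls(lines):
    """Yield (call_id, call_data) pairs from an iterable of lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # tolerate blank lines
        call_id, payload = line.split(SEPARATOR, 1)
        yield call_id, json.loads(payload)
```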

The next step was to write a script to push the data to CouchDB. The script simply reads the call data, one call per line, and sends it to CouchDB in batches of 100 calls using the bulk document API (a POST to _bulk_docs) in order to increase performance.
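The batching step could look like this in Python. CouchDB's bulk API is `POST /{db}/_bulk_docs` with a body of the form `{"docs": [...]}`; the database URL `http://localhost:5984/calls` and the choice of the call ID as the document `_id` are assumptions for illustration, not details from our script.

```python
import json
import urllib.request

BATCH_SIZE = 100


def bulk_payloads(calls, batch_size=BATCH_SIZE):
    """Group (call_id, data) pairs into _bulk_docs request bodies."""
    batch = []
    for call_id, data in calls:
        batch.append(dict(data, _id=call_id))  # call ID doubles as document ID
        if len(batch) == batch_size:
            yield {"docs": batch}
            batch = []
    if batch:
        yield {"docs": batch}  # last, possibly smaller, batch


def push(payload, db_url="http://localhost:5984/calls"):
    """POST one batch to CouchDB and return the decoded JSON response."""
    req = urllib.request.Request(
        db_url + "/_bulk_docs",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

The response to a bulk request is a list with one entry per document, carrying either a `rev` or an `error`, so a real loader should check for per-document failures rather than only the HTTP status.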

Finally, we had to rewrite the part of our call viewer tool that connects to a database to retrieve call data matching some patterns. It relies on a few simple CouchDB views, but deliberately not many, so that the tool stays as independent as possible of the database layer (the call viewer can also retrieve calls from text files).
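Queries against such views are plain HTTP GETs of the form `/{db}/_design/{ddoc}/_view/{view}`, with query parameters such as `key`, `startkey`, `endkey` and `include_docs` given as JSON values. A small Python helper for building those URLs, as a sketch; the design document `calls` and view `by_outcome` used in the test are hypothetical names.

```python
import json
from urllib.parse import quote, urlencode


def view_url(db_url, ddoc, view, **params):
    """Build a CouchDB view query URL.

    Assumption: every parameter value is JSON-encoded, which is what
    CouchDB expects for key, startkey, endkey, limit, include_docs, etc.
    """
    path = f"{db_url}/_design/{quote(ddoc, safe='')}/_view/{quote(view, safe='')}"
    if not params:
        return path
    query = urlencode({name: json.dumps(value) for name, value in params.items()})
    return path + "?" + query
```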

Benefits

We obtained several benefits by moving to CouchDB:

  1. Performance – Loading call analysis data into the CouchDB database is way faster than putting the same data in a MySQL database. Our preliminary results show a speed-up factor of about 100 (not counting the loading of audio recordings, though). OK, we are comparing apples and oranges: CouchDB does not update its view indexes until they are requested, while MySQL updates its indexes as rows are inserted. And only a single document per call is inserted in CouchDB, compared to many rows spread across some 15 tables in SQL. On the other hand, if insertions are done at application runtime (after the completion of each call), they had better be fast, especially if the IVR handles many hundreds (if not thousands) of ports.
  2. Evolution – Making modifications to a complex schema is painful, especially when you have applications deployed in the field. As documents do not have to follow a rigid schema, it is much easier to adapt our code to multiple versions.
  3. Attachments – Even though audio recordings can be stored in a traditional SQL database as blobs, a custom application is still required to access them. With CouchDB, recordings are stored as attachments to the JSON document for the corresponding call. Moreover, these recordings are easily accessible by other tools, since CouchDB is itself a web server and every document and attachment has a URL.
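Because every attachment has its own URL, any HTTP client (a browser, `curl`, an audio player) can reach a recording. Below is a Python sketch of the URL layout, `/{db}/{docid}/{attachment}`, plus the `PUT ...?rev=` request CouchDB uses to add an attachment to an existing document. The database URL, document ID, attachment name and `audio/wav` content type are assumptions for illustration.

```python
import urllib.request
from urllib.parse import quote


def attachment_url(db_url, doc_id, name):
    # Slashes in document IDs must be escaped; attachment names may keep theirs.
    return f"{db_url}/{quote(doc_id, safe='')}/{quote(name)}"


def attachment_put_request(db_url, doc_id, name, rev, audio_bytes):
    """Build (but do not send) the request attaching a recording to a call."""
    url = attachment_url(db_url, doc_id, name) + "?rev=" + quote(rev, safe="")
    return urllib.request.Request(
        url,
        data=audio_bytes,
        headers={"Content-Type": "audio/wav"},
        method="PUT",
    )
```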

Conclusion

Of course, there is no panacea, and CouchDB is no exception. There are still some aspects of our system for which CouchDB does not provide a better solution than an SQL database. One of them is support for custom queries. In the call viewer tool, it was possible to write custom SQL queries to find calls matching very specific criteria. CouchDB does support temporary views to achieve something equivalent, but the main problem is the time taken to build the view. When hundreds of thousands or even millions of calls are stored in the database, creating a temporary view can take a long time (several minutes). Not so good for an interactive tool.

But overall, we have been very pleased by the performance of CouchDB and the flexibility it gives us.

More robust automated test scripts: wraparound mode

Lately, I have been involved in the development of a new reusable VoiceXML dialog module. The module is invoked via a <subdialog> call with a number of parameters, one of which affects the order in which the module asks its questions.

Writing automated test scripts for such parameterized applications or modules is too often a very time-consuming task. One has to take the order of questions into account, leading to an explosion in the number of scenarios and lots of duplication. In such cases, you often end up testing a single configuration, assuming that all others will be only small variations that need not be tested. But is it really safe to do that?

One of the nice features of NuBot is the ability to write test scenarios that are robust to the order in which questions are asked. To do that, test scenarios need only be created in wraparound mode. Each scenario is composed of action groups, each of which consists of an association between a state in the application and an answer to give to the tested application.

In wraparound mode, when NuBot receives feedback from the application, it looks at the next action group. If the feedback does not match that group, instead of generating an error, it simply skips it and considers the next one, and so on. If it reaches the end of the scenario’s groups, it “wraps around” (hence the mode name) and considers the groups from the start of the scenario in turn. Only if no group in the scenario matches will it generate an error.
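The matching rule can be modeled in a few lines of Python. This is a hypothetical sketch of the behaviour described above, not NuBot's implementation: `WraparoundScenario` is an invented name, and states are compared by simple equality, whereas NuBot's actual matching of feedback against action groups may well be richer.

```python
class WraparoundScenario:
    """Toy model of wraparound matching over (state, answer) action groups."""

    def __init__(self, groups):
        self.groups = list(groups)  # [(application_state, answer), ...]
        self.next_index = 0

    def answer(self, feedback):
        """Return the answer for the first group matching feedback, in cyclic order."""
        n = len(self.groups)
        for offset in range(n):
            i = (self.next_index + offset) % n  # wrap past the last group
            state, answer = self.groups[i]
            if state == feedback:
                self.next_index = (i + 1) % n  # resume after the matched group
                return answer
        # Every group was tried: only now is it an error.
        raise LookupError(f"no action group matches state {feedback!r}")
```

With one scenario listing the name, date and confirmation groups, the module under test can ask for the date before the name (or the other way around) and still get the right answers, so a single scenario covers both orderings.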