Category Archives: Development

Grammar conversion : lessons learned

Lately, I have been involved in a number of grammar conversion projects. This has been a great opportunity to put our process and tools to the test once again. And since every project has its peculiarities, we learn constantly.

The process we outlined about a year ago omitted a number of small details. That was OK for small-scale conversion projects. But when you have to deal with much larger projects (with thousands of grammars to convert), these details add up significantly. Let me share some of the issues we face daily.

It’s not just semantic tags

When you have tools to automatically convert semantic tags from one format to another, grammar conversion can seem to be a no-brainer. But reality is not that simple. Grammars are not written for an abstract specification; they are written for a very specific recognition engine. They often contain:

  • Words (tokens) that map to very specific pronunciations or that try to model some disfluencies (like hesitations, for instance), but for which the SRGS $GARBAGE rule is more appropriate.
  • Multiword duplicates, with one sequence of space-separated words, and a similar sequence of underscore-separated words to allow cross-word phonetization (like “thirty one” and “thirty_one”).
  • Words that map to very specific, tuned pronunciations. Such words often have an unusual orthography to make sure they are not confused with real words.

All this means that a number of transformations must be applied, either to the original grammars or to the converted ones. This can be done with regular expression search and replace, or by manually inspecting and editing the grammars.
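
As a rough illustration of the regular expression route, here is a minimal Python sketch. It is not the code used in these projects: the filler token list and the function names are made up for the example, and whether a given token really should become $GARBAGE remains a judgment call for each grammar.

```python
import re

# Hypothetical engine-specific filler tokens; a real project would list its own.
FILLER_TOKENS = {"uh", "um", "hmm"}

# An underscore-joined multiword token such as "thirty_one". The lookbehind
# skips rule references such as "$my_rule", which must stay untouched.
_UNDERSCORE_WORD = re.compile(r"(?<![$\w])[A-Za-z]+(?:_[A-Za-z]+)+\b")

_FILLER = re.compile(
    r"\b(?:" + "|".join(sorted(re.escape(t) for t in FILLER_TOKENS)) + r")\b"
)


def split_underscore_words(text):
    """Rewrite underscore-joined multiword tokens as space-separated words."""
    return _UNDERSCORE_WORD.sub(lambda m: m.group(0).replace("_", " "), text)


def replace_fillers(text):
    """Replace standalone filler tokens with the SRGS $GARBAGE special rule."""
    return _FILLER.sub("$GARBAGE", text)


def dedupe_alternatives(alternatives):
    """Normalize each alternative and drop the duplicates this creates."""
    seen = set()
    result = []
    for alternative in alternatives:
        normalized = split_underscore_words(alternative)
        if normalized not in seen:
            seen.add(normalized)
            result.append(normalized)
    return result
```

With these definitions, dedupe_alternatives(["thirty one", "thirty_one"]) keeps a single "thirty one" entry. Anything the patterns do not cover still has to be reviewed by hand.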

Generation of coverage sets

When dealing with hundreds (if not thousands) of grammars, it is not feasible to create initial coverage test sets manually. This is way too time consuming. That means you have to find a way to generate those initial coverage test sets automatically in batch. But how do you do that?

Fortunately, NuGram IDE already provides sophisticated tools to analyze grammars and generate sentences from them. We simply built on this foundation a tool to automatically generate coverage test sets for a set of ABNF grammars. The tool also reports problems found in the grammars, like the use of digits in voice grammars, or words in DTMF grammars.
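
To give an idea of the kind of check involved, here is a small Python sketch of a token-level lint. It is only an illustration, not NuGram's implementation: the function name, the "voice"/"dtmf" mode strings and the simplified DTMF token set are all assumptions.

```python
import re

# Simplified, assumed set of DTMF tokens: the twelve keypad keys plus A-D.
DTMF_TOKENS = set("0123456789*#ABCD")


def find_token_problems(tokens, mode):
    """Return a list of human-readable problems for the given grammar mode."""
    problems = []
    for token in tokens:
        if mode == "voice" and re.search(r"\d", token):
            problems.append(f"digit in voice grammar: {token!r}")
        elif mode == "dtmf" and token not in DTMF_TOKENS:
            problems.append(f"non-DTMF token in DTMF grammar: {token!r}")
    return problems
```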

The coverage set generation tool uses a combination of configuration and sophisticated analyses to determine how to generate sentences and how many sentences to generate. For example, it’s not possible to generate all sentences from a grammar that covers an infinite number of sentences. When that’s the case (or when the number of sentences covered by the grammar is above a certain threshold), the tool reverts to other generation strategies.
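
The general idea can be sketched in Python on a toy grammar. This is not how the actual tool works: the grammar representation (a dict mapping rule names to lists of alternatives), the threshold, and random sampling as the fallback strategy are all assumptions made for the example.

```python
import itertools
import math
import random


def count_sentences(grammar, rule, active=frozenset()):
    """Count the sentences a rule covers; recursion is treated as infinite.

    Assumes every rule has at least one alternative.
    """
    if rule in active:
        return math.inf
    active = active | {rule}
    total = 0
    for alternative in grammar[rule]:
        product = 1
        for symbol in alternative:
            if symbol in grammar:
                product *= count_sentences(grammar, symbol, active)
        total += product
    return total


def all_sentences(grammar, rule):
    """Enumerate every sentence of a finite (non-recursive) rule."""
    for alternative in grammar[rule]:
        choices = [
            list(all_sentences(grammar, s)) if s in grammar else [s]
            for s in alternative
        ]
        for combination in itertools.product(*choices):
            yield " ".join(combination)


def random_sentence(grammar, rule, rng, depth=0, max_depth=20):
    """Draw one random sentence.

    Past max_depth, the alternative with the fewest rule references is
    preferred; this assumes such an alternative eventually terminates.
    """
    alternatives = grammar[rule]
    if depth >= max_depth:
        alternative = min(alternatives, key=lambda a: sum(s in grammar for s in a))
    else:
        alternative = rng.choice(alternatives)
    words = [
        random_sentence(grammar, s, rng, depth + 1, max_depth) if s in grammar else s
        for s in alternative
    ]
    return " ".join(words)


def generate_coverage_set(grammar, root, threshold=1000, sample_size=50, seed=0):
    """Enumerate small grammars exhaustively, sample large or infinite ones."""
    if count_sentences(grammar, root) <= threshold:
        return sorted(all_sentences(grammar, root))
    rng = random.Random(seed)
    return sorted({random_sentence(grammar, root, rng) for _ in range(sample_size)})
```

A fixed seed keeps the sampled set reproducible from one run to the next, which matters when the set is later used for regression testing.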

Recognition tests as part of the QA process

Finally, even a syntactically valid grammar may fail to load in the ASR for a variety of reasons, the most common one being a limitation or constraint of the ASR itself. For this reason, we came to the conclusion that doing recognition tests (ideally benchmarking the converted grammars) is a very useful addition to the QA process. Of course, simply compiling grammars may catch a number of problems. But doing a “before and after” comparison can detect conversion problems that the coverage tests missed because they are not exhaustive.

Another benefit of doing recognition tests is the ability to check the performance of the converted grammars to identify those needing additional work. Some converted grammars may have words that prove difficult to recognize with the new engine because they are not properly phonetized, thus calling for application-specific (or even grammar-specific) phonetic dictionaries.
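
A “before and after” comparison can be as simple as the following Python sketch. It assumes, purely for illustration, that each recognition run has been reduced to a dict mapping an utterance identifier to the interpretation returned by the engine (or None for a rejection); the function names are hypothetical.

```python
def find_differences(before, after):
    """Return (utterance, before, after) triples where the two runs disagree."""
    differences = []
    for utterance in sorted(before):
        old = before[utterance]
        new = after.get(utterance)  # missing in the new run counts as None
        if old != new:
            differences.append((utterance, old, new))
    return differences


def agreement_rate(before, after):
    """Fraction of utterances for which both runs gave the same result."""
    if not before:
        return 1.0
    return 1.0 - len(find_differences(before, after)) / len(before)
```

Grammars with a low agreement rate are the first candidates for a closer look, for instance at their phonetic dictionaries.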

What about DTMF?

In the specific case of converting GSL grammars to GrXML or ABNF, a complication arises with the presence, in the same grammar, of both DTMF sequences and words. I will discuss this issue in a separate post.

Session timeouts in NuGram Hosted Server

(This post has nothing to do with speech technologies or IVR applications. It’s merely a discussion on an implementation detail I described at the Erlang Montreal meetup and it’s rather technical.)

In my previous post about my talk at the Erlang Montreal meetup, slide 15 briefly outlines how session timeouts are implemented in NuGram Hosted Server. The code is duplicated here:

receive
…
after Timeout ->
    db:expire_session(self())
end

This code uses the Erlang receive..after construct to handle timeouts. The construct tries to extract a message from the process mailbox, and waits at most Timeout milliseconds if there are no matching messages (variables start with an uppercase letter in Erlang).
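
For readers who do not know Erlang, a very rough Python analogue may help. It is only a sketch: a queue.Queue stands in for the process mailbox, the function and callback names are made up, and Erlang's pattern matching on messages is not modeled at all.

```python
import queue


def wait_for_message(mailbox, timeout_ms, on_timeout):
    """Return the next message, or call on_timeout() if none arrives in time."""
    try:
        # Queue.get takes its timeout in seconds, hence the conversion.
        return mailbox.get(timeout=timeout_ms / 1000.0)
    except queue.Empty:
        on_timeout()
        return None
```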

This is great when sessions are represented using plain Erlang processes (I described this technique here). But there is a much better way to achieve the same effect when implementing servers using OTP’s gen_server behaviour. (One of our hard-learned lessons is to take the time to properly learn OTP, Erlang’s Open Telecom Platform, before building a production-grade system. It’s definitely worth the investment. It’s what puts Erlang in a totally different category than most programming languages and systems.)

When implementing a server using gen_server, one has to implement a few callback functions (namely handle_call for synchronous calls, handle_cast for asynchronous ones, and handle_info for other messages). In order to specify request timeouts, values returned by those three functions must provide the optional timeout:

handle_call(Request, From, State) ->
    Reply = ...,
    {reply, Reply, NewState, Timeout};
...

If the server does not receive any message during the next Timeout milliseconds, the timeout message is sent to the process and must be handled by the handle_info function. To stop the process, something like the following can be done:

handle_info(timeout, State) ->
  %% Do some clean up
  {stop, normal, State};
...

This simply tells gen_server that the server is to be shut down normally and that its last state is State (a great thing to know when things go wrong).

Using NuBot with the Tropo Scripting API

Ever wondered how to instrument an existing application for use with the NuBot IVR Testing Platform? My colleague Pascal wrote a helper function in Groovy for easy instrumentation of applications built using the Tropo Scripting API.

The trick is to define a closure encapsulating the playing of DTMF sequences (these sequences are required in order to synchronize the IVR application with the NuBot test scenario):

def sequencer = { sequence, closure ->
    if (dtmfSequencerEnabled) {
        for (dtmf in sequence) {
            switch (dtmf) {
                // break is required: without it, each case falls through to the next
                case "*": say("${baseAudioUrl}/dtmf/star.wav"); break
                case "#": say("${baseAudioUrl}/dtmf/pound.wav"); break
                default: say("${baseAudioUrl}/dtmf/${dtmf.toLowerCase()}.wav")
            }
        }
    }

    if (closure) return closure()
}

Using this definition, one can instrument an application very easily:

sequencer("a") {
    say("Hello. Thank you for calling the Travel Agency Customer Satisfaction Department")
};

The code, as well as a complete NuBot project and a few instrumented Tropo examples, is on GitHub.

CouchDB for call analysis data – a case study

At Nu Echo, we’ve been developing and refining our own VoiceXML application framework for years now. As part of our nth rewrite (and I’ll talk more about that rewrite and why we did it in another post), we decided to experiment with CouchDB. (For those new to CouchDB, it’s a schema-less, document-oriented database, one of the so-called NoSQL databases.)

The first area where we saw a fit for CouchDB was the storage of call analysis data. This data consists of various attributes associated with a call, information about each interaction (like recognition results) and each transaction (groups of interactions). It can also be augmented with the recordings saved by the ASR engine. Call analysis data is used by our call viewer tool to listen to calls, search for calls exhibiting some specific caller behaviors, produce reports, etc.

In the previous incarnation of our framework, call analysis data was stored on disk in a plain text file, and optionally in a SQL database. Due to the richness of our model, the SQL schema consisted of about 15 tables. And the representation of the same data in the text file was quite complex (tab-separated values, with some fields encoded in JSON format). At the end of each call, data collected during the call was stored on disk and optionally stored in the SQL database. We also had a script that could read all the files on disk and push the data in the SQL database at a later time.

Adding support for CouchDB

The very first step toward our support of CouchDB consisted of rewriting the serialization code to produce JSON-encoded call analysis data instead of our complicated text format. Now, data for a call is written as a single JSON object, one per line, prefixed by the call ID. This greatly simplified the code to read data back into memory.

The next step was to write a script to push the data to CouchDB. The script simply reads the call data, one call per line, and POSTs it to CouchDB in batches of 100 calls using the bulk API in order to increase performance.
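
Such a script can be sketched in a few lines of Python. This is not our actual script: the tab separating the call ID from the JSON object, the use of the call ID as the document _id, and the function names are assumptions. The only CouchDB-specific part is the _bulk_docs endpoint, which takes a POST with a JSON body of the form {"docs": [...]}.

```python
import json
import urllib.request

BATCH_SIZE = 100


def read_calls(lines):
    """Parse lines of the form '<call id><TAB><JSON object>' (tab is assumed)."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        call_id, payload = line.split("\t", 1)
        document = json.loads(payload)
        document["_id"] = call_id  # assumption: the call ID is the document ID
        yield document


def batches(documents, size=BATCH_SIZE):
    """Group documents into lists of at most `size` items."""
    batch = []
    for document in documents:
        batch.append(document)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def bulk_request(db_url, batch):
    """Build the POST request for CouchDB's _bulk_docs endpoint."""
    body = json.dumps({"docs": batch}).encode("utf-8")
    return urllib.request.Request(
        db_url.rstrip("/") + "/_bulk_docs",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def push_calls(db_url, lines):
    """Send all calls to the database, one bulk request per batch."""
    for batch in batches(read_calls(lines)):
        with urllib.request.urlopen(bulk_request(db_url, batch)) as response:
            response.read()
```

Calling push_calls with a database URL (for example a local database named calls) and an open file would then load the whole file, one request per 100 calls.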

Finally, we had to rewrite the part of our call viewer tool that connects to a database to retrieve call data matching some patterns. It relies on a few simple CouchDB views, but only sparingly, in order to remain as independent as possible of the database layer (the call viewer can retrieve calls from text files as well).

Benefits

We obtained several benefits by moving to CouchDB:

  1. Performance – Loading call analysis data into the CouchDB database is far faster than putting the same data into a MySQL database. Our preliminary results show a speedup factor of about 100 (this does not take the loading of audio recordings into account, though). OK, we are comparing apples and oranges. CouchDB does not update the view indexes until they are requested, while MySQL updates its indexes as rows are inserted. And only a single document is inserted in CouchDB, compared to lots of rows in more than 15 tables in SQL. On the other hand, if insertions are done at application runtime (after the completion of the call), you had better do it fast, especially if the IVR handles hundreds (if not thousands) of ports.
  2. Evolution – Making modifications to a complex schema is painful, especially when you have applications deployed in the field. As documents do not have to follow a rigid schema, it is much easier to adapt our code to multiple versions.
  3. Attachments – Even if audio recordings can be stored in a traditional SQL database as blobs, a custom application is still required to access them. With CouchDB, recordings are stored as attachments to the JSON document for the corresponding call. Moreover, these recordings are easily accessible by other tools since CouchDB is itself a web server and all documents and attachments have a URL.

Conclusion

Of course, there is no panacea and CouchDB is no exception. There are still some aspects of our system for which CouchDB does not provide a better solution than an SQL database. One of them is the support for custom queries. In the call viewer tool, it was possible to write custom SQL queries to find calls matching very specific criteria. Of course, CouchDB supports temporary views to do something equivalent. The main problem is the time taken to build the view. When hundreds of thousands or even millions of calls are processed, creating a temporary view can take a long time (several minutes). Not so good for an interactive tool.

But overall, we have been very pleased by the performance of CouchDB and the flexibility it gives us.

More robust automated test scripts: wraparound mode

Lately, I have been involved in the development of a new reusable VoiceXML dialog module. The module is invoked via a <subdialog> call with a number of parameters, one of which has an impact on the order of the questions asked by the module.

Writing automated test scripts for such parameterized applications or modules is too often a very time-consuming task. One has to take the order of questions into account, leading to an explosion in the number of scenarios and lots of duplication. In such cases, you often end up testing a single configuration, assuming that all others will be only small variations that need not be tested. But is it really safe to do that?

One of the nice features of NuBot is the ability to write test scenarios that are robust to the order in which questions are asked. To do that, test scenarios need only be created in wraparound mode. Each scenario is composed of action groups, each of which associates a state in the application with an answer to give to the tested application.

In wraparound mode, when NuBot receives feedback from the application, it looks at its next group. If the feedback does not match that group, instead of generating an error, NuBot simply skips it and considers the next one, and so on. If it reaches the end of the scenario’s groups, it “wraps around” (hence the name of the mode) and considers the groups from the start of the scenario in turn. Only if no group in the scenario matches will it generate an error.
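
The matching logic described above can be sketched in Python as follows. This is not NuBot's code: the class and method names are hypothetical, a group is reduced to a (state, answer) pair, and the sketch assumes that a matched group stays in the scenario, with the search for the next feedback resuming right after it.

```python
class ScenarioError(Exception):
    """Raised when no action group matches the application's feedback."""


class WraparoundScenario:
    def __init__(self, groups):
        self.groups = list(groups)  # (expected state, answer) pairs
        self.position = 0           # index of the next group to consider

    def answer_for(self, feedback):
        """Return the answer of the first matching group, wrapping around."""
        count = len(self.groups)
        for offset in range(count):
            index = (self.position + offset) % count
            state, answer = self.groups[index]
            if state == feedback:
                # Resume after the matched group on the next feedback.
                self.position = (index + 1) % count
                return answer
        raise ScenarioError(f"no action group matches {feedback!r}")
```

A scenario written for the order name, date, city would thus still pass if the module asked for the date first, since the name group is simply skipped and picked up again after the wrap.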