May 25th, 2010

by Dominique Boucher

The NuGram approach to dynamic grammars

I have just uploaded to SlideShare a short presentation about the Nu Echo approach to dynamic grammars.

For text-based applications too!

Remember that NuGram Server is not only for speech-enabled applications: you can also use it to parse text-based sentences. This makes it an ideal complement to your preferred cloud-based SMS or IM application platform, such as Tropo, Twilio, or Teleku, to name just a few.

Try it now!

It’s free for development use, so don’t be shy. Give it a try! You simply need to register, upload your grammars, and use one of the many APIs we provide.

May 6th, 2010

by Dominique Boucher

IVR unit testing in CVP Studio

In one of my previous posts, I presented the concept of IVR unit testing. Although it is a very nice concept in theory, I am sure many of you said to yourselves: “Great, but I can’t do that since I use a graphical service creation environment (SCE).” This may be true for some SCEs, but certainly not for all.

There are a few SCEs, like Cisco Unified Call Studio (formerly CVP Studio), that let you extend the environment with Java code. That’s what we did for one of our professional services projects. Let me briefly explain what we did in this project and present some of the benefits as well as a few lessons learned.

The application

The application was a very typical DTMF-only hierarchical IVR menu: lots of options, many optional messages at various places in the menu tree triggered by dynamic configuration options, information messages, etc. Each menu had to support a number of common navigation commands, like * to repeat, # to go back to the previous menu, and so on. The difficulty with such an application is that duplicating the dialog for each menu is quite time-consuming and highly error-prone when customer requirements keep evolving.

To be fair, CVP Studio provides a way to define reusable dialog patterns. Unfortunately, once a pattern has been copied to various places in your application, there is no way to modify it so that all its uses are updated automatically: you have to modify each copy manually.

In addition to those reusable dialog patterns, CVP Studio provides a programmatic API to implement custom elements. These elements can then be added to the SCE’s palette and used to implement nodes at various places in the application, each with its own configuration. Typically, such custom elements simply add some elements to a VoiceXML page (when they extend the VoiceElementBase class).

For our application, we implemented all the menu nodes using a custom element. The element encapsulates the common behaviors shared by all menus, like how to handle no input, errors, the repeat key, etc. A key advantage of doing this is that when we need to change one of those behaviors, all the nodes in the application are updated at once (saving us a lot of maintenance headaches). Another advantage is that this custom element can be easily reused in other applications as well.

Of course, some will note that we could have used VoiceXML subdialogs rather than custom elements to implement our reusable dialogs. However, given the design of our Java-based management console, which configures all the dynamic elements of the application, it was more natural for us to build the custom elements in Java as well.

But the coolest thing about this approach is that the whole dialog for a single menu is driven by a small state machine that generates objects representing interactions with the caller (instead of plain XML elements) and accepts objects representing interaction results (a no input, a no match, a recognition result, a DTMF input, etc.). And we made sure that it is possible to interact with the state machine without executing the dialog at run time: the state machines are completely decoupled from CVP Studio’s programmatic API!
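To make the idea concrete, here is a minimal Java sketch of such a menu state machine. It is not the project's code: the result types, the interaction classes (PlayAndCollect, Transfer, Exit), the prompt names, and the error limit are all assumptions for illustration. The point is that the class never touches CVP Studio's API; it only consumes result objects and returns interaction objects, which a thin custom element could then turn into VoiceXML.

```java
import java.util.ArrayList;
import java.util.List;

// Kinds of results the platform reports after an interaction (hypothetical).
enum ResultType { NO_INPUT, NO_MATCH, DTMF }

// A caller response, expressed without any CVP Studio class.
record InteractionResult(ResultType type, String digits) {
    static InteractionResult noInput() { return new InteractionResult(ResultType.NO_INPUT, null); }
    static InteractionResult noMatch() { return new InteractionResult(ResultType.NO_MATCH, null); }
    static InteractionResult dtmf(String digits) { return new InteractionResult(ResultType.DTMF, digits); }
}

// What the state machine asks the platform to do next (hypothetical types).
interface Interaction {}
record PlayAndCollect(List<String> prompts) implements Interaction {}
record Transfer(String reason) implements Interaction {}
record Exit(String label) implements Interaction {}

// One DTMF menu with the shared behaviors: * repeats, # goes back, valid keys
// leave through the matching transition, and repeated errors end in a transfer.
final class MenuStateMachine {
    private final List<String> menuPrompts;
    private final List<String> options;
    private final int maxErrors;
    private int errors = 0;

    MenuStateMachine(List<String> menuPrompts, List<String> options, int maxErrors) {
        this.menuPrompts = List.copyOf(menuPrompts);
        this.options = List.copyOf(options);
        this.maxErrors = maxErrors;
    }

    // First interaction: play the menu and wait for a key press.
    Interaction start() {
        return new PlayAndCollect(menuPrompts);
    }

    // Consumes one interaction result and returns the next interaction.
    Interaction next(InteractionResult result) {
        if (result.type() == ResultType.DTMF) {
            String key = result.digits();
            if ("*".equals(key)) return new PlayAndCollect(menuPrompts); // repeat
            if ("#".equals(key)) return new Exit("back");                 // previous menu
            if (options.contains(key)) return new Exit(key);              // menu option
            return error(ResultType.NO_MATCH);                            // invalid key
        }
        return error(result.type());
    }

    // Counts errors; plays an error prompt plus the menu, or transfers at the limit.
    private Interaction error(ResultType type) {
        errors++;
        if (errors >= maxErrors) return new Transfer("max_errors");
        List<String> prompts = new ArrayList<>();
        prompts.add((type == ResultType.NO_INPUT ? "NO_INPUT_" : "NO_MATCH_") + errors);
        prompts.addAll(menuPrompts);
        return new PlayAndCollect(prompts);
    }
}
```

Because the state machine is plain Java, driving it needs no running platform: for example, new MenuStateMachine(List.of("MAIN_MENU"), List.of("1", "8"), 3).next(InteractionResult.dtmf("8")) returns new Exit("8").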

Guess what? We could unit test all the menus very easily. We just wrote a test controller that injects interaction results programmatically into the state machine, retrieves the next interaction, and asserts some properties (which prompts are played, what options are available, etc.). It is thus very easy to test all the different situations that can be encountered at run time (call center open or closed, optional message activated or not, and so on).

Here is an example of a simple unit test:
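The following is a minimal, self-contained sketch of a test in that style, written without JUnit or CVP Studio so that it runs on its own. Only testCase.addInputAssertion and MenuConfiguration.NO_MATCH_1 come from the description of the test; their signatures here are guesses, and Menu, Step, Event, and MenuTestCase are hypothetical stand-ins for the project's state machine and test controller.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical prompt identifiers; only NO_MATCH_1 is named in the post.
final class MenuConfiguration {
    static final String MENU = "MENU";
    static final String NO_MATCH_1 = "NO_MATCH_1";
}

enum Event { NO_INPUT, NO_MATCH }

// Next step produced by the menu: prompts to play, whether the generic
// options (* and #) are active, and whether the call is transferred.
record Step(List<String> prompts, boolean genericOptionsEnabled, boolean transfer) {}

// Compact stand-in for the menu state machine: the third error transfers.
final class Menu {
    private int errors = 0;

    Step next(Event event) {
        errors++;
        if (errors >= 3) return new Step(List.of(), false, true);
        String errorPrompt = (event == Event.NO_MATCH ? "NO_MATCH_" : "NO_INPUT_") + errors;
        return new Step(List.of(errorPrompt, MenuConfiguration.MENU), true, false);
    }
}

// Test controller: each input assertion injects one event into the menu
// and checks a property of the step the menu produces in response.
final class MenuTestCase {
    private record Check(Event input, String description, Predicate<Step> assertion) {}

    private final List<Check> checks = new ArrayList<>();

    void addInputAssertion(Event input, String description, Predicate<Step> assertion) {
        checks.add(new Check(input, description, assertion));
    }

    // Replays the inputs in order against a fresh menu; returns failed checks.
    List<String> run() {
        Menu menu = new Menu();
        List<String> failures = new ArrayList<>();
        for (Check check : checks) {
            Step step = menu.next(check.input());
            if (!check.assertion().test(step)) failures.add(check.description());
        }
        return failures;
    }
}

final class GenericMenuBehaviorTest {
    static List<String> testErrorEscalation() {
        MenuTestCase testCase = new MenuTestCase();
        testCase.addInputAssertion(Event.NO_MATCH, "plays NO_MATCH_1",
                step -> step.prompts().contains(MenuConfiguration.NO_MATCH_1));
        testCase.addInputAssertion(Event.NO_INPUT, "generic options and menu prompts",
                step -> step.genericOptionsEnabled() && step.prompts().contains(MenuConfiguration.MENU));
        testCase.addInputAssertion(Event.NO_INPUT, "transfers the call", Step::transfer);
        return testCase.run();
    }
}
```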

The most interesting lines in this method are the ones near the end, beginning with testCase.addInputAssertion. They tell the test case which user responses (interactions) to simulate, as well as some assertions that must hold after the application has processed each interaction. For example, the first call simulates a no-match event, and the test case checks that the next step of the application is to play the prompt (message) identified by the constant MenuConfiguration.NO_MATCH_1. The next one simulates a no-input event and asserts that the generic options are enabled and that the menu prompts will be played. Finally, the third one simulates another no-input event and checks that the call is transferred.

This example only illustrates the testing of a simple generic behavior. The more interesting test cases involve specific nodes in the call flow that depend heavily on the dynamic configuration of the application. By stubbing the clock, for instance, we can make sure that outside business hours the messages telling callers that the contact center is closed are played and the option to transfer the call to an agent is disabled. Other tests ensure that, during business hours, the proper transfer reasons are set before the call is transferred to the call center queue manager.
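Here is a minimal sketch of the clock-stubbing idea in Java, assuming weekday business hours of 8:00 to 18:00 and hypothetical prompt names (CENTER_CLOSED, MAIN_MENU, PRESS_0_FOR_AGENT). The business-hours logic reads the time from an injected java.time.Clock, so a test can freeze time with Clock.fixed. This is one common way to stub a clock in Java, not necessarily how the project did it.

```java
import java.time.Clock;
import java.time.DayOfWeek;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.util.ArrayList;
import java.util.List;

// Business-hours logic that reads the time from an injected Clock instead of
// the system clock, so tests can freeze time at any moment they need.
final class ContactCenter {
    // Assumed opening hours: weekdays, 08:00 (inclusive) to 18:00 (exclusive).
    private static final LocalTime OPENS = LocalTime.of(8, 0);
    private static final LocalTime CLOSES = LocalTime.of(18, 0);

    private final Clock clock;

    ContactCenter(Clock clock) {
        this.clock = clock;
    }

    boolean isOpen() {
        ZonedDateTime now = ZonedDateTime.now(clock);
        DayOfWeek day = now.getDayOfWeek();
        if (day == DayOfWeek.SATURDAY || day == DayOfWeek.SUNDAY) {
            return false;
        }
        LocalTime time = now.toLocalTime();
        return !time.isBefore(OPENS) && time.isBefore(CLOSES);
    }

    // Hypothetical main-menu prompts: a "closed" message after hours, and the
    // agent option only while the contact center is open.
    List<String> mainMenuPrompts() {
        boolean open = isOpen();
        List<String> prompts = new ArrayList<>();
        if (!open) prompts.add("CENTER_CLOSED");
        prompts.add("MAIN_MENU");
        if (open) prompts.add("PRESS_0_FOR_AGENT");
        return prompts;
    }

    // Test helper: a clock frozen at the given local date-time (UTC for simplicity).
    static Clock fixedAt(String isoLocalDateTime) {
        return Clock.fixed(LocalDateTime.parse(isoLocalDateTime).toInstant(ZoneOffset.UTC), ZoneOffset.UTC);
    }
}
```

In production the object would be built with Clock.systemDefaultZone(); in a test, ContactCenter.fixedAt("2010-05-06T20:30") (a Thursday evening) gives a closed contact center whose prompts start with CENTER_CLOSED.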

Some lessons learned

Note that this approach also has some drawbacks. First, this technique does not make it possible to test the sequencing of these custom nodes in the application. For that, we had to rely on manual testing. But that was not such a big deal after all. Because of the way those custom nodes are connected in CVP Studio, validation is quite trivial: it amounts to comparing the call-flow design document with the call flow in CVP Studio. For instance, if the design document says that the application goes from menu A to menu B when DTMF 8 is pressed in A, then CVP Studio must show a transition labelled “8” from custom node A to custom node B.

Reporting was another issue we faced due to this approach. CVP already provides extensive reporting capabilities when the application uses only predefined elements. When using custom elements, you have to carefully log events in a special table of the Informix DB, and this greatly complicates the consolidation of information to get a precise understanding of what’s happening with the calls.

Was it worth it?

All in all, the advantages of this approach far outweighed the issues just mentioned, at least for this project, in which a large part of the configuration is dynamic and requires a fair amount of Java code anyway. We still maintain the application, and it is still evolving quite rapidly several years after its initial deployment.

Hey, I don’t use CVP!

You don’t use CVP? There are other approaches giving you some of these benefits. I’ll outline some of them in upcoming posts. Stay tuned!

And of course, if you have experience implementing unit testing using graphical service creation environments, please share it with us!

May 6th, 2010

by Dominique Boucher

Grammar tips & tricks #1 - rule naming

[This post is the first in a series of short posts giving tips and tricks on speech grammar writing.]

Tip #1: make sure that your rule names are always ECMAScript identifiers.

In SRGS grammars, rule names must be valid XML names and may not contain the characters '.', ':', or '-'. For people new to speech grammar writing, it is not always obvious why there is such a restriction.

The reason becomes clear when you start writing your first semantic tags. With semantics/1.0 tags, values returned by referenced rules are exposed as properties of the rules and meta objects, while with swi-semantics/1.0 (the Nuance OSR tag format), those values are exposed as variables. In both cases, rule names end up in ECMAScript code, so they must be valid ECMAScript identifiers. In ECMAScript, civic-number is not an identifier; it is a subtraction!
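A quick way to apply this tip is to check rule names against the ASCII subset that is both a valid XML name and a valid ECMAScript identifier, and to reject ECMAScript reserved words as well, since those are not identifiers either. The Java sketch below is only an illustration: RuleNames and isSafeRuleName are hypothetical names, the reserved-word list is partial, and both XML and ECMAScript allow many more (non-ASCII) characters than this conservative check does.

```java
import java.util.Set;
import java.util.regex.Pattern;

// Conservative rule-name check: ASCII letters, digits and underscores, not
// starting with a digit. Such names are valid XML names without '.', ':' or
// '-', and valid ECMAScript identifiers unless they are reserved words.
final class RuleNames {
    private static final Pattern SAFE_NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    // Partial list of ECMAScript reserved words (for illustration only).
    private static final Set<String> RESERVED = Set.of(
            "break", "case", "catch", "class", "const", "continue", "default",
            "delete", "do", "else", "false", "finally", "for", "function", "if",
            "in", "instanceof", "new", "null", "return", "switch", "this",
            "throw", "true", "try", "typeof", "var", "void", "while", "with");

    static boolean isSafeRuleName(String name) {
        return SAFE_NAME.matcher(name).matches() && !RESERVED.contains(name);
    }
}
```

With this check, isSafeRuleName("civicNumber") returns true, while isSafeRuleName("civic-number") returns false.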

Of course, NuGram IDE always enforces this restriction: any mistake is reported as you type.

A related OSR-specific pitfall

With swi-semantics/1.0, you need to be even more cautious. It is always a bad idea to have a variable whose name can conflict with the name of a referenced rule. If the variable is already defined, the value of the referenced rule will become inaccessible.

$someRule =
    [$prefix { type = 'default' }]
    $<types.abnf#type> { type = type.value; }
    $<values.abnf#value> { value = value.value; }
;

This grammar won’t work if the caller utters something matched by $prefix. In that case, the first tag sets the slot (variable) type to "default", which prevents the value returned by the reference $<types.abnf#type> from being bound to the type variable. When the second semantic tag is executed, type still holds the string "default", which is not an object with a value property, so type.value cannot return the recognized type.
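The failure can be reproduced with a toy model of the binding behavior just described, written in Java. It assumes, as stated above, that a referenced rule's value is bound to the variable with the same name only if that variable is not already defined; it is not how the Nuance OSR engine is implemented, and the recognized value "checking" is made up. ECMAScript objects are modeled as maps, and reading a property of a non-object such as a string yields undefined, modeled as null.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the swi-semantics binding behavior described above (not the
// actual OSR implementation): a referenced rule's value is bound to the
// variable with the same name only if that variable is not yet defined.
final class SlotScope {
    private final Map<String, Object> vars = new HashMap<>();

    // A plain assignment in a semantic tag, e.g. { type = 'default' }.
    void assign(String name, Object value) {
        vars.put(name, value);
    }

    // Binding of a referenced rule's result: skipped if the name is taken.
    void bindRuleResult(String ruleName, Object value) {
        if (!vars.containsKey(ruleName)) {
            vars.put(ruleName, value);
        }
    }

    // ECMAScript `name.value`: objects are modeled as maps; reading a property
    // of anything else (e.g. a string) yields undefined, modeled as null.
    Object propertyValue(String name) {
        Object v = vars.get(name);
        return (v instanceof Map) ? ((Map<?, ?>) v).get("value") : null;
    }

    Object get(String name) {
        return vars.get(name);
    }

    // Replays the tags of $someRule for the "type" slot.
    static Object interpret(boolean prefixUttered) {
        SlotScope scope = new SlotScope();
        if (prefixUttered) {
            scope.assign("type", "default");                       // [$prefix { type = 'default' }]
        }
        scope.bindRuleResult("type", Map.of("value", "checking")); // $<types.abnf#type> matched
        scope.assign("type", scope.propertyValue("type"));         // { type = type.value; }
        return scope.get("type");
    }
}
```

In this model, interpret(false) returns "checking", whereas interpret(true) returns null: once $prefix has defined type, the rule's result is never bound. Using a variable name in $prefix that does not clash with any referenced rule avoids the problem.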

March 30th, 2010

by Dominique Boucher

NuGram 2.1 available now!

The Nu Echo team is proud to announce the availability of NuGram 2.1.

The noteworthy new features in this release are:

  • Enhanced sentence generation tool (NuGram IDE)
    The sentence generation algorithm has been further improved and a new strategy (Rule examples) has been added. Sentences can now be generated from specific sentence patterns. Also, the generation process can be stopped.
  • New sentence explorer (NuGram IDE)
    The user interface of the sentence explorer has been completely changed. It is now much more intuitive and easier to use. It also allows sentence patterns to be added to the sentence generation tool.
  • Semantic interpretation optimizations
    All supported semantic tag languages based on ECMAScript have been optimized (compiled scripts are now properly cached). This dramatically increases the performance of the coverage test tool.
  • Complete rewrite of the underlying parsing algorithm
    The algorithm that matches sentences with the grammar rules has been completely rewritten. It is now much more efficient in terms of both speed and memory consumption.
  • Post-processing API (NuGram Server SDK / Professional Edition only)
    NuGram Server now provides an API to implement and deploy application-specific post-processing routines.
  • Initial support for different target ASR engines (NuGram IDE)
    It is now possible to specify the target ASR engine in the preferences. This affects the way words in grammars are normalized, and also how grammars are converted to GrXML.
  • Small enhancements to the ABNF dynamic grammar templating language
    The templating language now supports new forms, like optional grammar headers.
  • Many small improvements and bug fixes
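The release notes do not say how the caching of compiled scripts works. As a rough illustration of why it matters for the coverage test tool, here is a hypothetical Java memoizer that compiles each distinct semantic tag only once, however many test sentences exercise it. The compiler function is a stand-in for whatever ECMAScript engine does the real work; this is not NuGram's implementation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Memoizes an expensive compile step, keyed by the tag's source text, so each
// distinct semantic tag is compiled once and then reused.
final class CompiledScriptCache<C> {
    private final Map<String, C> cache = new ConcurrentHashMap<>();
    private final Function<String, C> compiler;

    CompiledScriptCache(Function<String, C> compiler) {
        this.compiler = compiler;
    }

    C get(String source) {
        return cache.computeIfAbsent(source, compiler);
    }

    // Demo: 1,000 evaluations of two distinct tags trigger only two compilations.
    static int demoCompilationCount() {
        AtomicInteger compilations = new AtomicInteger();
        CompiledScriptCache<String> cache = new CompiledScriptCache<>(source -> {
            compilations.incrementAndGet();   // stands in for a costly compile
            return "compiled:" + source;
        });
        for (int i = 0; i < 1000; i++) {
            cache.get(i % 2 == 0 ? "type = type.value;" : "value = value.value;");
        }
        return compilations.get();
    }
}
```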

The free Basic Edition can be downloaded directly from within your Eclipse environment. Simply follow the download instructions, or contact us for the Professional Edition.

Please let us know what you think of these new features!

January 25th, 2010

by Dominique Boucher

Voice APIs: back to basics

We definitely live in interesting times. After years of pushing hard on VoiceXML (2.0 and 2.1), the industry regularly comes up with new approaches departing significantly from the newly proposed VoiceXML 3.0. And these approaches sometimes come from companies working hard on the VoiceXML standardization effort.

For instance, last week Voxeo announced a new interface to its Tropo platform, called Tropo WebAPI. To build a communications application, one simply has to write a web service/application producing JSON documents. These documents contain simple instructions for the communications platform, like: play this prompt, ask a question, transfer the call, etc. Very simple instructions, indeed. The results are then sent back to the application on the server side, which processes them and decides what to do next.

This approach reminds me of TwiML, Twilio’s own markup language for implementing voice applications, and (to a certain extent) FastAGI, the Asterisk way of developing server-side voice applications (the preferred way of deploying applications on the Cloudvox platform).

What do these approaches have in common? Well, they all offer a much simpler programming model than VoiceXML. In VoiceXML, there is the form-filling algorithm which tries to fill slots in a form automatically. VoiceXML applications can also contain a fair amount of scripting (in ECMAScript) with many scoping rules for variables. It also provides some exception mechanisms (with catch and throw elements), a root document for storing data, etc. No wonder most development environments targeting VoiceXML platforms only make use of a limited subset of VoiceXML.
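To give a feel for what the form-filling algorithm automates, here is a drastically simplified Java sketch that keeps only the select, collect, and process phases of the VoiceXML form interpretation algorithm. Everything else (guard conditions, prompt counters, events, filled handlers, and so on) is omitted; FormFiller, fill, and maxTurns are hypothetical names, and the collect function stands in for prompting the caller and running recognition.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class FormFiller {
    // Runs at most maxTurns select/collect/process iterations and returns the
    // filled slots (possibly incomplete if the caller never answered).
    static Map<String, String> fill(List<String> slots,
                                    Function<String, Map<String, String>> collect,
                                    int maxTurns) {
        Map<String, String> form = new LinkedHashMap<>();
        for (int turn = 0; turn < maxTurns; turn++) {
            // Select: the first slot that is still unfilled, in document order.
            String next = null;
            for (String slot : slots) {
                if (!form.containsKey(slot)) {
                    next = slot;
                    break;
                }
            }
            if (next == null) break;                 // every slot is filled
            // Collect: prompt for the selected slot and wait for an answer.
            Map<String, String> answer = collect.apply(next);
            // Process: one utterance may fill several slots (mixed initiative).
            form.putAll(answer);
        }
        return form;
    }

    // Demo: the answer to the first question also fills the second slot,
    // so only two questions are asked for three slots.
    static List<String> demoQuestionsAsked() {
        List<String> asked = new ArrayList<>();
        fill(List.of("from", "to", "date"), slot -> {
            asked.add(slot);
            return slot.equals("from")
                    ? Map.of("from", "Montreal", "to", "Boston")
                    : Map.of(slot, "tomorrow");
        }, 10);
        return asked;
    }
}
```

Even this toy version shows the implicit control flow a VoiceXML developer must keep in mind: which question is asked next depends on which slots earlier answers happened to fill.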

In fact, the new approaches are not programming models; they essentially provide low-level instructions for the various voice platforms, much like a virtual machine. It’s up to the users of the platform to implement their own programming model on top of these instruction sets. And this is a very attractive offer, as it will most certainly ignite the development of new application programming environments and frameworks, some of which will be platform agnostic.

We went through a somewhat similar period at the end of the last century. There were many non-interoperable proprietary IVR platforms, and the industry came up with a solution: VoiceXML. Will we see something similar happen with these new approaches? I doubt it. I think that all these approaches are sufficiently similar that a good abstraction layer on the application side can support them all easily. In the 90s, by contrast, porting an application to a new platform was plainly impossible without a complete rewrite.
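As an illustration of what such an application-side abstraction layer could look like, here is a small Java sketch: the application emits platform-neutral instructions, and one adapter per platform renders them. Both output formats are invented for this example and are not the actual TwiML or Tropo WebAPI syntax; the instruction set (Say, Ask, TransferTo) is likewise a hypothetical common denominator.

```java
import java.util.List;

// Platform-neutral instructions (hypothetical): play a prompt, ask a
// question, transfer the call.
interface Instruction {}
record Say(String text) implements Instruction {}
record Ask(String question, String grammarUri) implements Instruction {}
record TransferTo(String destination) implements Instruction {}

// One adapter per platform. Both formats below are made up for illustration.
interface PlatformAdapter {
    String render(List<Instruction> script);
}

final class XmlStyleAdapter implements PlatformAdapter {
    private static String esc(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;").replace("\"", "&quot;");
    }

    @Override
    public String render(List<Instruction> script) {
        StringBuilder out = new StringBuilder("<response>");
        for (Instruction i : script) {
            if (i instanceof Say s) {
                out.append("<say>").append(esc(s.text())).append("</say>");
            } else if (i instanceof Ask a) {
                out.append("<ask grammar=\"").append(esc(a.grammarUri())).append("\">")
                   .append(esc(a.question())).append("</ask>");
            } else if (i instanceof TransferTo t) {
                out.append("<transfer>").append(esc(t.destination())).append("</transfer>");
            } else {
                throw new IllegalArgumentException("unsupported instruction: " + i);
            }
        }
        return out.append("</response>").toString();
    }
}

final class JsonStyleAdapter implements PlatformAdapter {
    // Minimal escaping (quotes and backslashes only), enough for this sketch.
    private static String str(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    @Override
    public String render(List<Instruction> script) {
        StringBuilder out = new StringBuilder("[");
        for (int k = 0; k < script.size(); k++) {
            if (k > 0) out.append(",");
            Instruction i = script.get(k);
            if (i instanceof Say s) {
                out.append("{\"say\":").append(str(s.text())).append("}");
            } else if (i instanceof Ask a) {
                out.append("{\"ask\":").append(str(a.question()))
                   .append(",\"grammar\":").append(str(a.grammarUri())).append("}");
            } else if (i instanceof TransferTo t) {
                out.append("{\"transfer\":").append(str(t.destination())).append("}");
            } else {
                throw new IllegalArgumentException("unsupported instruction: " + i);
            }
        }
        return out.append("]").toString();
    }
}
```

Supporting another platform then means writing one more adapter, while the application logic that produces the instruction list stays unchanged.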

Strangely, the programming languages community went through something similar a few years ago. From around 1997 to the start of the century, the craze for Java almost killed research on object-oriented programming language design not targeting Java or the JVM. Then, around 2003, some leading researchers consciously decided that it was time to start a post-Java era. It’s at about that time that many programming languages started flourishing and that we saw a greater acceptance of dynamic/scripting languages (on the JVM or not). This period also coincided with the rise of Web 2.0 and a new culture of entrepreneurship, thanks to Paul Graham’s Y Combinator.

I think the communications industry is going through something similar today, only a few years later. We see young entrepreneurs and new startups with innovative ideas entering the market. By the way, a few of them presented their ideas at StartupCamp Telephony last week, an event sponsored by Twilio and PhoneTag as part of the ITExpo conference.

The years to come promise to be very exciting.