Tag Archives: apis

Ruby NuGram Server API now available as a gem

Jason Goecke, VP Innovation at Voxeo Labs and one of the two founders of Adhearsion, is at RHoK #2 (Random Hacks of Kindness) in San Francisco this weekend. In an effort to further simplify the use of our NuGram Hosted Server API, he turned the Ruby API into a full-fledged Ruby gem. The code is available from GitHub.

To install the Ruby API, just enter the following at the command prompt:

> gem install nugramserver-ruby

You can now start using the NuGram Hosted Server API by adding the following two lines of code to your Ruby application:

require "rubygems"
require "nugramserver-ruby"

And here is a complete example:

require "rubygems"
require "nugramserver-ruby"

# Definition of the grammar template
template = "#ABNF 1.0 ISO-8859-1;

language en-US;
mode voice;
tag-format <semantics/1.0>;

root $voicedialer;

public $voicedialer =
    [$politeness] $contacts [please]
;

private $contacts =
  @alt
    @for (contact : contacts)
      ([@word contact.firstname] @word contact.lastname
       @tag \"out.number  = '\" contact.number \"';\" @end)
    @end
  @end
;

private $politeness =
      ((I (would like | want) | I'd like) to (talk to | speak with))
    | (give me | gimme | get me)
;"

# Create a connection to the server
server = GrammarServer.new()

# Initiate a new session
session = server.create_session("username", "password")

# Upload the grammar template (this only needs to be done the first time)
session.upload("voicedialer.abnf", template)

# Push some data to instantiate the template.
grammar = session.instantiate("voicedialer.abnf",
                              {'contacts' =>
                                [{ 'firstname' => "John", 'lastname' => 'Doe',
                                   'number' => '1234' },
                                 { 'firstname' => "Bill", 'lastname' => 'Smith',
                                   'number' => '4321' }]})

# Retrieve the URL of the resulting grammar in GrXML form
puts "grammar url = ", grammar.get_url('grxml')

# Retrieve the content of the resulting grammar in ABNF form
puts grammar.get_content("abnf")

# Terminate the session
session.disconnect

Happy hacking! (And many thanks to Jason for contributing this gem!)

Unit testing in the IVR world

Last summer, for a demo I gave at SpeechTEK, I wrote a prototype dialog-based application framework in Erlang. The framework features a synchronous API to write dialog applications that could be accessed via instant messaging (using either IMified or XMPP) or the phone (through a VoiceXML gateway). What do I mean by synchronous API? Well, an API giving you the illusion that your program simply has to ask a question using a procedure call, and the result of the call is a representation of the answer from the user of the application.

Too abstract a definition? Look at the following Java code (this is a rough and simplified translation of some Erlang code):

void askPin() {
    // Looks like a blocking call: returns once the caller answers,
    // stays silent, or hangs up.
    Answer answer = dialogController.ask("What is your pin?");
    if (answer instanceof DTMFAnswer) {
        dialogController.play("Thanks, your pin is "
                                  + ((DTMFAnswer) answer).getDigits());
        dialogController.hangup();
        finish();
    }
    else if (answer instanceof NoInputAnswer) {
        retryPin();
    }
    else if (answer instanceof Hangup) {
        finish();
    }
}

The askPin method calls dialogController.ask to play a TTS prompt to the caller and wait for an answer. It then processes the answer: it calls dialogController.play to confirm DTMF input, retryPin when there is no input, and finish on hangup.

This is essentially what platforms like Tropo, voicephp, and a few others provide to help develop telephony applications. This approach is very interesting for a number of reasons. For instance, it lets us use the abstraction mechanisms we are most familiar with: functions, classes, etc. And we can still use our favorite authoring tool. But more importantly, we don’t have to learn a new programming model, like VoiceXML (although the framework could itself produce VoiceXML in order to be executed on a standard VoiceXML platform, which is the approach taken in my prototype).
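
One way to get this illusion of a blocking call is a coroutine: the dialog code is suspended at each question and resumed when an answer arrives. The following is a minimal sketch in Ruby using fibers, not the prototype's Erlang implementation. The names DialogController, Interaction, and deliver are assumptions made for illustration.

```ruby
# Hypothetical sketch: DialogController, Interaction and deliver are
# illustrative names, not the prototype's actual (Erlang) API.
Interaction = Struct.new(:prompts, :grammars)

class DialogController
  def initialize(&dialog)
    @prompts = []
    @fiber = Fiber.new do
      dialog.call(self)
      next_interaction([]) # final interaction: leftover prompts, no grammar
    end
  end

  # Queue a prompt; it is emitted with the next interaction.
  def play(text)
    @prompts << text
  end

  # Looks like a blocking call to the dialog code, but actually suspends
  # the fiber until the platform delivers the caller's answer.
  def ask(text, grammar)
    play(text)
    Fiber.yield(next_interaction([grammar]))
  end

  # Called by the platform (or a test): starts the dialog when called
  # without an answer, otherwise resumes it with the caller's answer.
  # Returns the next Interaction.
  def deliver(answer = nil)
    @fiber.resume(answer)
  end

  private

  # Package the queued prompts with the active grammars and reset the queue.
  def next_interaction(grammars)
    interaction = Interaction.new(@prompts, grammars)
    @prompts = []
    interaction
  end
end

controller = DialogController.new do |dc|
  pin = dc.ask("What is your pin?", "pin.abnf")
  dc.play("Thanks, your pin is #{pin}")
end

first = controller.deliver          # runs the dialog up to the first question
last  = controller.deliver("1234")  # resumes it with a simulated answer
```

The dialog block reads top to bottom like ordinary code, while the controller decides when it actually runs. That separation is also what makes it possible to drive the same dialog from a test instead of a phone call.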

Dialog unit testing

An interesting feature of the prototype is its immediate support for dialog unit testing, due to its model-view-controller (MVC) architecture. Unit testing is a great technique for building robust software. (Unfortunately, the idea is not that widespread in the IVR world.)

To illustrate, here is an excerpt from a unit test for the code above:

    dialogController.send(Answer.Next);
    nextInteraction = dialogController.getInteraction(dialog);
    assertPrompts(nextInteraction, new String[]{
        "What is your pin?"
    });
    assertGrammars(nextInteraction, new String[]{ "pin.abnf" });

    dialogController.send(Answer.NoInput);
    nextInteraction = dialogController.getInteraction(dialog);
    assertPrompts(nextInteraction, new String[]{
        "Please answer the question.",
        "What is your pin?"
    });
    assertGrammars(nextInteraction, new String[]{ "pin.abnf" });

    dialogController.send(new DTMFAnswer("123456"));
    nextInteraction = dialogController.getInteraction(dialog);
    assertPrompts(nextInteraction, new String[]{
          "Thanks, your pin is 1 2 3 4 5 6"
    });
    assertGrammars(nextInteraction, new String[]{});

In this example, the dialogController.send method call simulates an answer from the caller, while the call to dialogController.getInteraction retrieves the next actions taken by the application. The result of the latter is then checked against the expected action.
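
The same test can be sketched in Ruby. Here PinDialog is a hypothetical stand-in for the application under test, and assert_prompts and assert_grammars are assumed helpers modelled on the ones used in the excerpt; none of these names come from the actual framework.

```ruby
# Hypothetical sketch: PinDialog stands in for the application under test;
# assert_prompts and assert_grammars mirror the helpers in the excerpt.
Interaction = Struct.new(:prompts, :grammars)

class PinDialog
  QUESTION = "What is your pin?"

  # First interaction of the dialog.
  def start
    Interaction.new([QUESTION], ["pin.abnf"])
  end

  # Next interaction, given a simulated answer (:no_input or a digit string).
  def handle(answer)
    if answer == :no_input
      Interaction.new(["Please answer the question.", QUESTION], ["pin.abnf"])
    else
      spoken = answer.chars.join(" ")
      Interaction.new(["Thanks, your pin is #{spoken}"], [])
    end
  end
end

def assert_prompts(interaction, expected)
  return true if interaction.prompts == expected
  raise "expected prompts #{expected.inspect}, got #{interaction.prompts.inspect}"
end

def assert_grammars(interaction, expected)
  return true if interaction.grammars == expected
  raise "expected grammars #{expected.inspect}, got #{interaction.grammars.inspect}"
end

dialog = PinDialog.new

interaction = dialog.start
assert_prompts(interaction, ["What is your pin?"])
assert_grammars(interaction, ["pin.abnf"])

interaction = dialog.handle(:no_input)
assert_prompts(interaction, ["Please answer the question.", "What is your pin?"])
assert_grammars(interaction, ["pin.abnf"])

interaction = dialog.handle("123456")
assert_prompts(interaction, ["Thanks, your pin is 1 2 3 4 5 6"])
assert_grammars(interaction, [])
```

Note that the assertions only look at prompts and grammars, so the dialog's internals can change freely without breaking the test.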

At Nu Echo, we are compulsive about tests. So we have developed a practice around dialog unit testing that we try to apply whenever we can. Let me share some thoughts on the subject.

The “what”

There are a number of questions that arise when we start writing unit tests. The first is obviously: what do we want to test?

In the case of a dialog unit test, we’ll want to test the observable behaviour of the application, regardless of the way the code is organized. For example, we won’t want to test that the code is organized into particular classes and methods, that the application goes through a state X, etc. Doing so would make the tests fragile in the face of code reorganization (and we, as developers, reorganize code all the time, right?). In fact, dialog unit tests that check only observable behaviour make us more confident in the application after refactoring parts of it.

So we usually test:

  • which prompts are played,
  • which grammars are active,
  • the interaction properties (timeouts, maximum number of n-bests, etc.),
  • the attached data (when possible).

Stubs

Another interesting question is: what do we do with back-end calls (databases, web services, etc.)? In principle, unit tests should be repeatable, and ideally independent of the runtime environment.

Here the answer is simple: we stub everything. We completely simulate the back-end. However, in some cases this can be relatively difficult to do when there are complex relationships between the various pieces of information manipulated by the application.
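
As an illustration, here is a minimal sketch of a stubbed back end in Ruby. AccountStub, BalanceDialog, and balance_for are hypothetical names; the only assumption that matters is that the dialog receives its back end as a parameter, so a test can pass in an in-memory fake instead of a real database or web service client.

```ruby
# Hypothetical sketch: AccountStub, BalanceDialog and balance_for are
# illustrative names, not part of any real framework.
class AccountStub
  def initialize(balances)
    @balances = balances
  end

  # Same method the real back-end client is assumed to expose.
  # Raises KeyError for unknown accounts.
  def balance_for(account_id)
    @balances.fetch(account_id)
  end
end

class BalanceDialog
  # The back end is injected, so tests can substitute a stub.
  def initialize(backend)
    @backend = backend
  end

  def balance_prompt(account_id)
    "Your balance is #{@backend.balance_for(account_id)} dollars"
  rescue KeyError
    "Sorry, I could not find that account."
  end
end

dialog = BalanceDialog.new(AccountStub.new({ "1001" => 250 }))
ok_prompt      = dialog.balance_prompt("1001")
missing_prompt = dialog.balance_prompt("9999")
```

Because the stub returns the same data on every run, the test gives the same result regardless of the state of the real back end.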

The value of unit testing

With a good framework, dialog unit tests should be easy enough to write that developers are encouraged to write them. It can take a few more minutes to code a unit test than to call the application directly, but this cost is soon amortized as we rerun the test. Each run takes a fraction of a second, much faster than picking up the phone. This means we can run hundreds of tests in a matter of seconds.

Moreover, some tests are very hard to replicate by hand, especially when we introduce speech recognition into the equation. If the application has several thresholds for a given question, how do we test each case systematically to make sure the application behaves as specified? Unit tests are invaluable in this case.
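
To make the threshold case concrete, here is a minimal table-driven sketch in Ruby. It assumes the thresholds apply to a recognition confidence score; the values 0.3 and 0.7 and the action names are assumptions chosen for illustration, not values from a real application.

```ruby
# Hypothetical sketch: thresholds and action names are assumed values.
def action_for(confidence, reject_below: 0.3, confirm_below: 0.7)
  if confidence < reject_below
    :reprompt   # too uncertain: ask the question again
  elsif confidence < confirm_below
    :confirm    # plausible: ask the caller to confirm
  else
    :accept     # confident enough: move on
  end
end

# One entry per case, including the values on each boundary.
cases = {
  0.0  => :reprompt,
  0.29 => :reprompt,
  0.3  => :confirm,
  0.69 => :confirm,
  0.7  => :accept,
  1.0  => :accept
}

# Collect every case whose actual action differs from the expected one.
failures = cases.reject { |confidence, expected| action_for(confidence) == expected }
```

A simulated answer can carry any confidence score we like, so the boundary cases, which are nearly impossible to hit on purpose over the phone, become ordinary table entries.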

Again, the use of a programming language is very helpful in creating lots of unit tests in very few lines of code. Repeated parts of some tests can be abstracted away in methods/functions used many times. For example, testing that a sequence of DTMF inputs leads to the call being transferred to a given extension with some data attached to it can be encoded in a function. This function can then be called for each path in the menu tree, like this:

  test_path(["1","3","2"],           # DTMF sequence
            "4231",                  # Extension
            {"reason" => "support",  # Data
             "product" => "nugram",
             "language" => "french"})

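A helper like test_path could be sketched as follows. This is only an illustration under stated assumptions: the menu is modelled as nested hashes keyed by DTMF digit, with each leaf describing a transfer, and the menu is passed explicitly as a first argument (the call above presumably gets it from the test's context). The names follow_path and menu are hypothetical.

```ruby
# Hypothetical sketch: the menu tree is modelled as nested hashes keyed by
# DTMF digit; a leaf holds the transfer extension and the attached data.
def follow_path(menu, digits)
  digits.reduce(menu) { |node, digit| node.fetch(digit) }
end

# Raises with a descriptive message when the path does not end in the
# expected transfer; returns true otherwise.
def test_path(menu, digits, extension, data)
  outcome = follow_path(menu, digits)
  unless outcome[:extension] == extension
    raise "path #{digits.join}: expected extension #{extension}, " \
          "got #{outcome[:extension].inspect}"
  end
  unless outcome[:data] == data
    raise "path #{digits.join}: unexpected data #{outcome[:data].inspect}"
  end
  true
end

menu = {
  "1" => {
    "3" => {
      "2" => { extension: "4231",
               data: { "reason" => "support",
                       "product" => "nugram",
                       "language" => "french" } }
    }
  }
}

result = test_path(menu, ["1", "3", "2"],
                   "4231",
                   { "reason" => "support",
                     "product" => "nugram",
                     "language" => "french" })
```

With the walking and checking logic in one place, covering a new branch of the menu costs a single call.
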
But the real value of a good unit test suite is that once you have it, you are no longer afraid of inadvertently introducing a bug when you implement a change request. Of course, it is not a panacea. At some point, you’ll have to pick up the phone to test some functional aspects of the application (usability, recorded prompt content, etc.). But hopefully, you won’t stumble upon trivial bugs that should have been caught much earlier in the development process by your unit tests.

So let me ask you: how do you test your application? What techniques do you employ?

NuGram Hosted Server client APIs now available

To ease the integration of dynamic grammars hosted on NuGram Hosted Server, the NuGram team has developed client APIs in a variety of programming languages. The code is available on GitHub, and a zip file can also be downloaded directly from the NuGram web site.

Supported languages/systems are currently:

And there are more to come.

Using these APIs, it becomes really easy to create dynamically generated grammars for use by your favorite hosted communication platform (be it a VoiceXML platform or one of the many new API-based platforms), or to compute the semantic interpretation of a textual sentence.

An example

Suppose we’d like to implement a simple voice dialing application. A dynamic grammar template (called voicedialer.abnf) for this application would look like the following:

To get a valid recognition grammar from this template, we need to provide data to the template engine (in this case the contact list) to fill in the blanks using call-specific information. This is called instantiating the dynamic grammar.

So to instantiate the template above and retrieve the URL for the SRGS XML representation of the generated grammar in Ruby, only four lines of code are needed:

As can be seen in this example, data used by the templating engine to create the resulting grammar is specified using native data structures of the host language (Ruby dictionaries/hashmaps in this case).

Adding dynamic grammars to voice applications has never been easier! All you need to do is register for a free account on NuGram Hosted Server and download the APIs.