Tag Archives: semantic tags

Grammar tips & tricks #2 – return key/value pairs whenever possible

Tip #2: In SISR semantic tags, return key/value pairs whenever possible.

Strings all over the place

It is fairly common for new SRGS grammar writers to write SISR semantic tags that only return string values to calling rules or to the voice application, even when the data has some structure. For example, a dollar amount rule could return a string like this (in ABNF):

public $amount =
  $dollars {out = rules.dollars + ".00";}
  [and $cents {out = out.substring(0, out.length - 3)
                        + "." + rules.cents; }]
;

...

One obvious disadvantage of this approach is that the application has to extract the dollars and the cents from the returned string. Of course, a simple string-to-number conversion can be done. But because of possible rounding errors, it is best to extract both values separately and convert the two substrings to integers. This may not be that bad; machines are so fast these days.
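To make that extraction step concrete, here is a minimal sketch of the parsing the application ends up doing. The helper name parseAmount and the sample values are hypothetical; the sketch only assumes that the grammar returns a string of the form "dollars.cents", as in the rule above.

```javascript
// Hypothetical application-side helper (not part of any ASR API).
// It splits an amount string such as "12.34" into two integers instead of
// converting the whole string to a floating-point number.
function parseAmount(amount) {
  const parts = amount.split(".");
  return {
    dollars: parseInt(parts[0], 10),
    cents: parseInt(parts[1], 10)
  };
}

const parsed = parseAmount("12.34");
console.log(parsed.dollars, parsed.cents);
```

Every application that consumes the grammar has to carry a helper like this one, and has to agree with the grammar on the exact string format.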

A less obvious reason why this is not recommended relates to the fact that the computations made by the semantic tags can only begin once the engine has finished recognizing the utterance. In other words, the corresponding computation time directly adds to the application’s response time. The ECMAScript interpreter typically compiles the script (the semantic tag) to an intermediate representation before executing it. Unless the ASR properly caches the result of this compilation process, the script is compiled again and again. The more complicated the script is, the more processing power it takes to parse it, compile it, and execute it.

On top of that, string concatenation and substring extraction create a lot of unnecessary temporary objects, thus putting a bigger burden on the garbage collector (or any other memory management algorithm employed by the ECMAScript interpreter).

Finally, since semantic tags are compiled and executed for every hypothesis in the N-best list, the computation time and the number of objects created grow in proportion to the number of hypotheses requested by the application. If we add all this up, we end up with a grammar that requires unnecessary processing power from the ASR engine, which can cause significant delays in the recognition process. This may even result in noticeable latency at the application level (i.e., some dead air).

Use semantic keys instead

A better way to write the above grammar would be:

public $amount =
  $number {out.dollars = rules.number;
           out.cents   = 0; }
  [and $cents {out.cents = rules.cents; }]
;

...

Using explicit semantic keys has many advantages:

  • Documentation. This self-documents the type/purpose of the returned values.
  • Maintenance/evolution. The scripts are much simpler, and thus easier to follow for someone trying to understand the grammar. It is also easier to add other keys later if need be.
  • Analytics. The presence of distinct semantic keys facilitates the analysis of field data. For example, we may want to run a recognition performance test on only a subset of our collected utterances, e.g. those utterances whose value for the cents semantic key is 0.
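
As a rough illustration of the last point, here is a minimal sketch of what the application and an analysis script could do with key/value results. The results array, its field names (utterance, interpretation) and the helper totalInCents are all hypothetical; only the dollars and cents keys come from the grammar above.

```javascript
// Hypothetical semantic results, shaped like the output of the $amount rule
// above. The utterance texts are made up for illustration.
const results = [
  { utterance: "twelve dollars", interpretation: { dollars: 12, cents: 0 } },
  { utterance: "five dollars and thirty cents", interpretation: { dollars: 5, cents: 30 } },
  { utterance: "forty dollars", interpretation: { dollars: 40, cents: 0 } }
];

// The application reads the values directly; no string parsing is needed.
function totalInCents(interpretation) {
  return interpretation.dollars * 100 + interpretation.cents;
}

// Analytics: keep only the utterances whose cents key is 0.
const wholeDollarUtterances = results
  .filter(r => r.interpretation.cents === 0)
  .map(r => r.utterance);

console.log(totalInCents(results[1].interpretation));
console.log(wholeDollarUtterances);
```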

Converting GSL tags to SISR – conflicting goals

NuGram IDE provides a tool to convert Nuance GSL grammars to SRGS ABNF, which can then be converted to XML form. But the tool does not convert the semantic tags. So lately we’ve started working on the conversion of GSL semantic tags to SISR, and what initially seemed like a simple project provoked a heated debate internally (well, I may be exaggerating a bit… ;-). I soon realized that this was because there are really two competing forces driving the design of such a tool:

  • Correctness. The set of SISR tags generated automatically faithfully implements the behaviour of the corresponding Nuance GSL tags. In other words, the resulting grammar needs no manual intervention and the semantic results obtained using the generated grammar are always the same as if the original GSL grammar had been used.
  • Maintenance. The set of SISR tags generated automatically is easy to understand, and thus to modify. The tags are close to what an SISR developer would have written from the start.

To see why these two goals conflict, simply consider how calls to predefined GSL functions can be translated. The GSL tag language provides predefined functions for things as simple as arithmetic operations: add, sub, etc. Converting a GSL tag of the form:

{return (add($n $m))}

could generate an SISR tag like

{out = $add(n,m);}

if we want to preserve correctness. Here $add would be a function defined in a generated grammar header tag that implements an ECMAScript equivalent of the GSL add function, with proper handling of undefined values:

{!{
  function $add(x, y) {
    if (x == undefined || typeof x != "number") x = 0;
    if (y == undefined || typeof y != "number") y = 0;
    return x + y;
  }
}!};

But if the converter inlines the call and adds some code to check for undefined values, it could produce something like:

{out = ((n == undefined || typeof n != "number") ? 0 : n)
        + ((m == undefined || typeof m != "number") ? 0 : m);}

when one would have simply written:

{out = n + m;}

Unfortunately, this last version may produce unexpected results at runtime. If n is undefined and m is 3, the sum will be NaN (not a number) instead of 3, which is what the original GSL tag would have produced. In that case, it is essential to complement the tool with a rigorous testing process, one that can ideally ensure that each and every semantic tag in the grammar will be executed at least once.
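
The difference is easy to reproduce in any ECMAScript interpreter. The sketch below reuses the $add function from the header tag above outside of a grammar; the variables n and m simply stand in for the values of the two referenced rules, and the names faithful and handWritten are only there for illustration.

```javascript
// ECMAScript equivalent of the GSL add function, as in the header tag above:
// anything that is not a number is treated as 0.
function $add(x, y) {
  if (x == undefined || typeof x != "number") x = 0;
  if (y == undefined || typeof y != "number") y = 0;
  return x + y;
}

let n;        // left undefined, as when the rule that sets n did not match
const m = 3;

const faithful = $add(n, m);   // 3, same as the original GSL tag
const handWritten = n + m;     // NaN, because undefined + 3 is NaN

console.log(faithful, handWritten);
```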

So which one is better?

The answer is: that depends. In some cases correctness is preferable, especially if the grammar requires little to no maintenance at all. That may be true of simple grammars. But most of the time, grammars change over time. New sentence patterns are added, rules are extracted, etc. So maybe it’s best to leave the choice to the developer by offering a flexible tool.

And you, what would you prefer: a conversion tool that ensures full correctness at all costs, or a tool that sometimes produces grammars that are potentially not equivalent to the originals but are more maintainable?

Grammar tips & tricks #1 – rules naming

[This post is the first in a series of short posts giving tips and tricks on speech grammar writing.]

Tip #1: make sure that your rule names are always ECMAScript identifiers.

In SRGS grammars, rule names must be valid XML names and may not contain the following characters: ., :, and -. For people new to speech grammar writing, it is not always obvious why there is such a restriction.

When you start writing your first semantic tags, you understand why. When using semantics/1.0 tags, values returned by referenced rules are exposed as properties of the rules and meta objects, while with swi-semantics/1.0 (the Nuance OSR tag format), those values are exposed as variables. In other words, in both cases rule names must be valid ECMAScript identifiers. In ECMAScript civic-number is not an identifier, it’s an arithmetic operation!
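
Here is a minimal sketch of that last point in plain ECMAScript, outside of any grammar. The rules object below is a hypothetical stand-in for the one provided by semantics/1.0, and the values are made up.

```javascript
// Hypothetical stand-in for the rules object, with a property whose name
// contains a hyphen, plus two unrelated values named civic and number.
const rules = { "civic-number": 1234, civic: 10 };
const number = 4;

// This does NOT read the "civic-number" property. It is parsed as
// (rules.civic) - (number), i.e. the subtraction 10 - 4.
const result = rules.civic - number;

console.log(result);
```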

Of course, NuGram IDE always enforces this restriction: any mistake will be reported as you type.

A related OSR-specific pitfall

With swi-semantics/1.0, you need to be even more cautious. It is always a bad idea to have a variable whose name can conflict with the name of a referenced rule. If the variable is already defined, the value of the referenced rule will become inaccessible.

$someRule =
    [$prefix { type = 'default' }]
    $<types.abnf#type> { type = type.value; }
    $<values.abnf#value> { value = value.value; }
;

This grammar won’t work if something from $prefix is uttered. This will cause the slot (variable) type to be set to "default" and prevent the value returned by the reference $<types.abnf#type> from being bound to the type variable. When the second semantic tag is executed, the value of the variable type will still be "default", which is not an object with a property value, thus causing an execution error.