Grammar tips & tricks #2 – return key/value pairs whenever possible

Tip #2: In SISR semantic tags, return key/value pairs whenever possible.

Strings all over the place

It is fairly common for new SRGS grammar writers to write SISR semantic tags that only return string values to calling rules or to the voice application, even when the data has some structure. For example, a dollar amount rule could return a string like this (in ABNF):

public $amount =
  $dollars {out = rules.dollars + ".00";}
  [and $cents {out = out.substring(0, out.length - 3)
                        + "." + rules.cents; }]
;

...

One obvious disadvantage of this approach is that the application has to extract the dollars and the cents from the returned string. Of course, a simple string-to-number conversion can be done, but because of possible rounding errors it is best to extract the two substrings separately and convert each of them to an integer. This may not be that bad; machines are so fast these days.
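
To make the extraction step concrete, here is a minimal sketch of what the application side might look like. The helper name splitAmount and the sample value "19.99" are made up for illustration; the only assumption is that the rule returns a string of the form "dollars.cents".

```javascript
// Hypothetical application-side helper: split an amount string such as
// "19.99" (the format produced by the string-returning rule) into integers.
function splitAmount(amount) {
  const [dollarsText, centsText = "0"] = amount.split(".");
  return {
    dollars: parseInt(dollarsText, 10),
    cents: parseInt(centsText, 10),
  };
}

// Both parts are converted separately, so no floating-point value is involved.
const parsed = splitAmount("19.99");

// By contrast, this goes through a floating-point number, and the product
// is not guaranteed to be exactly the integer one would expect.
const viaFloat = Math.floor(parseFloat("19.99") * 100);
```

Even in this small form, the application needs to know the exact layout of the string, which is the kind of hidden coupling the rest of this post argues against.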

A less obvious reason to avoid this approach is that the computations made by the semantic tags can only begin once the engine has finished recognizing the utterance. In other words, the corresponding computation time directly adds to the application’s response time. The ECMAScript interpreter typically compiles the script (the semantic tag) to an intermediate representation before executing it. Unless the ASR properly caches the result of this compilation process, the script is compiled again and again. The more complicated the script is, the more processing power it takes to parse it, compile it, and execute it.

On top of that, string concatenation and substring extraction create a lot of unnecessary temporary objects, putting a bigger burden on the garbage collector (or whatever other memory management scheme the ECMAScript interpreter uses).

Finally, since semantic tags are compiled and executed for every hypothesis in the N-best list, the computation time and the number of objects created grow in proportion to the number of hypotheses requested by the application. Add all this up, and we end up with a grammar that requires unnecessary processing power from the ASR engine, which can cause significant delays in the recognition process. This may even result in noticeable latency at the application level (i.e., some dead air).

Use semantic keys instead

A better way to write the above grammar would be:

public $amount =
  $number {out.dollars = rules.number;
           out.cents   = 0; }
  [and $cents {out.cents = rules.cents; }]
;

...
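
On the application side, a result with explicit keys needs no parsing at all. The following is only a sketch: the object literal stands in for whatever interpretation object your platform hands back, and it assumes that the referenced rules return numbers for both keys.

```javascript
// Hypothetical interpretation result, shaped like the keys set by $amount.
// Assumption: both values are numbers, not strings.
const interpretation = { dollars: 12, cents: 34 };

// Work in integer cents so that no floating-point arithmetic is needed.
function totalCents(result) {
  return result.dollars * 100 + result.cents;
}

const total = totalCents(interpretation); // 12 * 100 + 34 = 1234
```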

Using explicit semantic keys has many advantages:

  • Documentation. This self-documents the type/purpose of the returned values.
  • Maintenance/evolution. The scripts are much simpler, and thus easier to understand for anyone reading the grammar. It is also easier to add other keys later if need be.
  • Analytics. The presence of distinct semantic keys facilitates the analysis of field data. For example, we may want to run a recognition performance test on only a subset of our collected utterances, e.g. those utterances whose value for the cents semantic key is 0.
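
As an illustration of the analytics point, selecting such a subset becomes a one-line filter once the keys are explicit. The records below are invented sample data, and the field names transcript and interpretation are assumptions about how collected utterances might be stored.

```javascript
// Invented sample records; real field data would come from your own logs.
const utterances = [
  { transcript: "twelve dollars", interpretation: { dollars: 12, cents: 0 } },
  { transcript: "five dollars and ten cents", interpretation: { dollars: 5, cents: 10 } },
  { transcript: "forty dollars", interpretation: { dollars: 40, cents: 0 } },
];

// Keep only the utterances whose cents key is 0.
const wholeDollarUtterances = utterances.filter(
  (utterance) => utterance.interpretation.cents === 0
);
```

With string results, the same selection would require parsing every returned value first.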

Grammar tips & tricks #1 – rules naming

[This post is the first in a series of short posts giving tips and tricks on speech grammar writing.]

Tip #1: Make sure that your rule names are always ECMAScript identifiers.

In SRGS grammars, rule names must be valid XML names and may not contain the following characters: ., :, and -. For people new to speech grammar writing, it is not always obvious why there is such a restriction.

When you start writing your first semantic tags, you understand why. When using semantics/1.0 tags, values returned by referenced rules are exposed as properties of the rules and meta objects, while with swi-semantics/1.0 (the Nuance OSR tag format), those values are exposed as variables. In other words, in both cases rule names must be valid ECMAScript identifiers. In ECMAScript, civic-number is not an identifier; it’s an arithmetic operation!
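
The point can be checked in plain ECMAScript. The sketch below uses a deliberately simplified, ASCII-only test (the helper name isSimpleIdentifier is made up, and the real identifier grammar also allows Unicode letters and excludes reserved words), followed by a demonstration of how the hyphenated name is actually parsed.

```javascript
// Simplified, ASCII-only approximation of an ECMAScript identifier.
// (The real grammar also allows Unicode letters and excludes reserved words.)
function isSimpleIdentifier(name) {
  return /^[A-Za-z_$][A-Za-z0-9_$]*$/.test(name);
}

const okName = isSimpleIdentifier("civicNumber");   // true
const badName = isSimpleIdentifier("civic-number"); // false

// Why the hyphen is a problem: the expression below is a subtraction.
const rules = { civic: 10 };
const number = 3;
const difference = rules.civic-number; // same as rules.civic - number, i.e. 7
```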

Of course, NuGram IDE always enforces this restriction: any mistake is reported as you type.

A related OSR-specific pitfall

With swi-semantics/1.0, you need to be even more cautious. It is always a bad idea to have a variable whose name can conflict with the name of a referenced rule. If the variable is already defined, the value of the referenced rule will become inaccessible.

$someRule =
    [$prefix { type = 'default' }]
    $<types.abnf#type> { type = type.value; }
    $<values.abnf#value> { value = value.value; }
;

This grammar won’t work if something from $prefix is uttered. In that case, the slot (variable) type is set to "default", which prevents the value returned by the reference $<types.abnf#type> from being bound to the type variable. When the second semantic tag is executed, the value of the variable type is still "default", which is not an object with a property value, thus causing an execution error.
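
The mechanism can be mimicked in plain ECMAScript. This is only a simplified model, not the OSR implementation: the function bindRuleResult is hypothetical and simply encodes the behavior described above (a rule's value is bound only if the name is still free), and "visa" is an invented return value. Note that plain ECMAScript yields undefined where the paragraph above describes an execution error in the OSR.

```javascript
// Simplified model (an assumption, not the OSR implementation): a rule's
// return value is bound to a variable only if that name is still free.
function bindRuleResult(scope, name, value) {
  if (!(name in scope)) {
    scope[name] = value;
  }
}

// Problem case: the slot and the referenced rule share the name "type".
const scope = {};
scope.type = "default";                            // tag after $prefix
bindRuleResult(scope, "type", { value: "visa" });  // binding is skipped
const shadowed = scope.type.value;                 // "default" has no property value

// Renaming the slot (here to "kind") removes the conflict.
const renamedScope = {};
renamedScope.kind = "default";
bindRuleResult(renamedScope, "type", { value: "visa" });
renamedScope.kind = renamedScope.type.value;
```

In the model, shadowed ends up undefined, while the renamed slot receives the value returned by the rule.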