NuGram IDE provides a tool to convert Nuance GSL grammars to SRGS ABNF, which can then be converted to XML form. But the tool does not convert the semantic tags. So lately we’ve started working on the conversion of GSL semantic tags to SISR, and what initially seemed like a simple project provoked a heated debate internally (well, I may be exaggerating a bit…). I soon realized that this was because there are really two competing forces driving the design of such a tool:
- Correctness. The automatically generated SISR tags faithfully implement the behaviour of the corresponding Nuance GSL tags. In other words, the resulting grammar needs no manual intervention, and the semantic results obtained with the generated grammar are always the same as if the original GSL grammar were used.
- Maintenance. The automatically generated SISR tags are easy to understand, and thus to modify. They are close to what an SISR developer would have written from the start.
To see why these two goals conflict, simply consider how calls to predefined GSL functions can be translated. The GSL tag language provides predefined functions for things as simple as arithmetic operations: add, sub, etc. Converting a GSL tag of the form:
{return (add($n $m))}
could generate an SISR tag like
{out = $add(n,m);}
if we want to preserve correctness. Here $add would be a function defined in a generated grammar header tag that implements an ECMAScript equivalent of the GSL add function, with proper handling of undefined values:
{!{
  function $add(x, y) {
    if (x == undefined || typeof x != "number") x = 0;
    if (y == undefined || typeof y != "number") y = 0;
    return x + y;
  }
}!};
But if the converter inlines the call and adds some code to check for undefined values, it could produce something like:
{out = ((n == undefined || typeof n != "number") ? 0 : n)
+ ((m == undefined || typeof m != "number") ? 0 : m);}
when one would have simply written:
{out = n + m;}
Unfortunately, this last version may produce unexpected results at runtime. If n is undefined and m is 3, the sum will be NaN (not a number) instead of 3, which is what the original GSL tag would have produced. If the converter generates this simpler form, it is essential to complement the tool with a rigorous testing process, one that ideally ensures that each and every semantic tag in the grammar is executed at least once.
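To make the difference concrete, here is a minimal sketch in plain ECMAScript that can be run outside any grammar. The names guardedAdd and naiveAdd are hypothetical and used only for illustration; guardedAdd simply mirrors the $add helper shown above.

```javascript
// Mirrors the generated $add helper: undefined or non-numeric operands count as 0.
function guardedAdd(x, y) {
  if (x == undefined || typeof x != "number") x = 0;
  if (y == undefined || typeof y != "number") y = 0;
  return x + y;
}

// What a developer would write by hand: {out = n + m;}
function naiveAdd(x, y) {
  return x + y;
}

var n;      // undefined, e.g. a sub-rule that never assigned a value
var m = 3;

var guarded = guardedAdd(n, m); // 3, the behaviour described for the GSL tag
var naive = naiveAdd(n, m);     // NaN, because undefined + 3 is not a number
```

A test utterance that leaves n unset is exactly the kind of case the testing process needs to cover, since the two versions agree whenever both operands are numbers.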
So which one is better?
The answer is: that depends. In some cases correctness is preferable, especially if the grammar requires little to no maintenance at all. That may be true of simple grammars. But most of the time, grammars change over time. New sentence patterns are added, rules are extracted, etc. So maybe it’s best to leave the choice to the developer by offering a flexible tool.
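One way to picture such a flexible tool is a converter that takes a mode option and emits one of the three translations discussed earlier. This is only a sketch under assumptions: translateAdd, guard and the mode names "helper", "inline" and "plain" are hypothetical, not part of NuGram IDE.

```javascript
// Hypothetical: wraps an operand in the same undefined/non-number check used by $add.
function guard(v) {
  return "((" + v + " == undefined || typeof " + v + " != \"number\") ? 0 : " + v + ")";
}

// Hypothetical converter step: translate GSL add($a $b) into an SISR tag string.
function translateAdd(a, b, mode) {
  switch (mode) {
    case "helper": // correct; relies on $add defined in a generated header tag
      return "{out = $add(" + a + "," + b + ");}";
    case "inline": // correct and self-contained, but harder to read
      return "{out = " + guard(a) + " + " + guard(b) + ";}";
    case "plain":  // readable, but not equivalent when an operand is undefined
      return "{out = " + a + " + " + b + ";}";
    default:
      throw new Error("unknown mode: " + mode);
  }
}

var helperTag = translateAdd("n", "m", "helper");
var plainTag = translateAdd("n", "m", "plain");
```

With a switch like this, the developer decides per project (or even per rule) whether correctness or maintainability wins.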
And you, what would you prefer: a conversion tool that ensures full correctness at all costs, or a tool that sometimes produces grammars that are potentially not equivalent to the original one but are more maintainable?
