Tip #3: In SRGS grammars, use global semantic interpretation (SI) tags to simplify SI tags in rule expansions.
It is fairly common in SRGS grammars to put some form of computation in semantic tags. For example, checksum algorithms (like the Luhn algorithm) are commonly used in credit card number grammars.
When grammars contain these kinds of computations, it is good practice to put function and constant definitions in global SI tags (also called header tags). These tags are declared before the definition of the first rule, along with the other grammar headers, and must be followed by a semicolon. In ABNF form, this looks like:
#ABNF 1.0;
mode voice;
root $rootRule;
{
// header tag
};
$rootRule =
...
;
...
The use of global tags has several advantages:
- Functions are more easily testable. The functions declared in the global tags can be developed and tested outside of the grammar file (using a JavaScript interpreter like SpiderMonkey, Rhino, or V8), and later copied into the global tag.
- It avoids code duplication. Using functions usually reduces code duplication, which lowers the risk of fixing a bug in one copy of the code and missing the others.
- Semantic interpretation is less CPU-intensive. Another side effect of using functions is that SI tags usually get smaller, which reduces the time needed to parse and interpret them and leads to better response times. (SI tags are usually executed after the last word has been uttered, so it is sometimes important to optimize them.)
What about GrXML?
In the XML form, you simply put tag elements before the first rule element. But you don’t really need to know that, right? NuGram IDE can convert ABNF grammars to their XML counterparts so easily!
A concrete example
Let’s illustrate this with a simple grammar for a 12-digit account number that uses the Luhn algorithm to validate the number. Here is a first version of the grammar:
#ABNF 1.0 UTF-8;
language en-US;
mode voice;
tag-format <semantics/1.0>;
root $accountNumber;
public $accountNumber =
{ out.number = ""; var checksum = 0; }
( $digit {!{
out.number += rules.digit;
var digit = parseInt(rules.digit);
var doubledigit = digit * 2;
if (doubledigit > 9)
checksum += (doubledigit % 10) + 1;
else
checksum += doubledigit;
}!}
$digit {
out.number += rules.digit;
checksum += parseInt(rules.digit);
}) <6>
{ out.valid = (checksum % 10) == 0;}
;
private $digit =
one {out = "1"} | two {out = "2"}
| three {out = "3"} | four {out = "4"}
| five {out = "5"} | six {out = "6"}
| seven {out = "7"} | eight {out = "8"}
| nine {out = "9"} | (zero | oh) {out = "0"}
;
The code that computes the checksum is interleaved with the rule references that collect the digits. This makes the grammar look much more complex than it really is. It also performs worse than it could, because the larger SI tags take longer to parse and interpret.
If we move the checksum computation into a header tag, we obtain the following grammar:
#ABNF 1.0 UTF-8;
language en-US;
mode voice;
tag-format <semantics/1.0>;
root $accountNumber;
{!{
function luhnCheck(digits) {
var checksum = 0;
for (var i = 0; i<12; i++) {
var digit = parseInt(digits.charAt(i));
if (i % 2 == 0) {
var doubledigit = digit * 2;
if (doubledigit > 9)
checksum += (doubledigit % 10) + 1;
else
checksum += doubledigit;
}
else
checksum += digit;
}
return (checksum % 10) == 0;
}
}!};
public $accountNumber =
{ out.number = "";}
( $digit { out.number += rules.digit; }) <12>
{ out.valid = luhnCheck(out.number); }
;
private $digit =
one {out = "1"} | two {out = "2"}
| three {out = "3"} | four {out = "4"}
| five {out = "5"} | six {out = "6"}
| seven {out = "7"} | eight {out = "8"}
| nine {out = "9"} | (zero | oh) {out = "0"}
;
Now the accountNumber rule is much simpler, and it is clear that it only accepts 12 digits. Moreover, the validation function can be tested independently. If the code is copied to a file named checksum.js, I can launch the SpiderMonkey interpreter and test the function like this:
[tmp] js
js> load("checksum.js")
js> luhnCheck("123456789012")
false
js> luhnCheck("123456789015")
true
js> ^D
[tmp]
In fact, these test cases can be put in the source file along with the code. But you get the idea.
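Here is a minimal sketch of what checksum.js could look like with the test cases kept next to the function. The check helper and the extra all-zero test numbers are our own illustrative additions, not part of any grammar or SISR API. Only luhnCheck itself belongs in the grammar’s header tag; the checks should stay in the test file.

```javascript
// checksum.js: the function from the header tag, plus inline test cases.

function luhnCheck(digits) {
  var checksum = 0;
  for (var i = 0; i < 12; i++) {
    var digit = parseInt(digits.charAt(i));
    if (i % 2 == 0) {
      // Luhn doubles every second digit counting from the right, starting
      // left of the check digit. For a 12-digit number, those are exactly
      // the even positions counting from the left.
      var doubledigit = digit * 2;
      if (doubledigit > 9)
        checksum += (doubledigit % 10) + 1; // digit sum of 10..18
      else
        checksum += doubledigit;
    }
    else
      checksum += digit;
  }
  return (checksum % 10) == 0;
}

// Hypothetical test helper: throws on the first failing case,
// so a silent run means every case passed.
function check(input, expected) {
  var actual = luhnCheck(input);
  if (actual !== expected)
    throw new Error("luhnCheck(\"" + input + "\") returned " + actual +
                    ", expected " + expected);
}

check("123456789012", false);
check("123456789015", true);
check("000000000000", true);   // checksum is 0
check("000000000001", false);  // checksum is 1
check("000000000059", true);   // 5 doubled is 10, counted as 1; 1 + 9 = 10
```

Running load("checksum.js") in SpiderMonkey (or node checksum.js) finishes silently when all cases pass. For "123456789015", the doubled digits 1, 3, 5, 7, 9, 1 contribute 2 + 6 + 1 + 5 + 9 + 2 = 25, and the remaining digits contribute 2 + 4 + 6 + 8 + 0 + 5 = 25, for a total of 50, which is divisible by 10.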
Global scope is read-only
Beware when writing your SI tags: the global scope is read-only for SI tags in rule expansions, whereas it is mutable in global tags. This means a variable cannot be declared in a global tag and then modified in a normal SI tag. For example, the following grammar
#ABNF 1.0;
mode voice;
root $rootRule;
{
var globalVar = 1;
};
$rootRule =
{ globalVar = 2; } some words { out = globalVar; }
;
would raise an exception when “some words” is uttered. That’s because the first SI tag in $rootRule tries to modify a read-only variable (globalVar).
There is, of course, a way around this limitation: declare a single global variable, say GLOBAL, that holds an object whose properties represent the variables you would have liked to be global. Here is how the previous grammar would be modified:
#ABNF 1.0;
mode voice;
root $rootRule;
{
var GLOBAL = new Object();
GLOBAL.globalVar = 1;
};
$rootRule =
{ GLOBAL.globalVar = 2; } some words { out = GLOBAL.globalVar; }
;
This time, the grammar will return 2 when “some words” is uttered.
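Why the GLOBAL trick works can be seen in plain JavaScript: a read-only binding cannot be reassigned, but the object it refers to can still be modified. The sketch below is only an analogy, assuming the global scope behaves like a frozen object; globalScope and the two function names are hypothetical, and this is not how any particular engine implements SISR scoping.

```javascript
// Hypothetical stand-in for the global scope as seen from rule-level SI tags:
// the header tag's declarations end up as properties of a frozen object.
var globalScope = Object.freeze({
  globalVar: 1,
  GLOBAL: { globalVar: 1 }   // freeze is shallow: this inner object stays mutable
});

// Like { globalVar = 2; } in the first grammar: rebinding a read-only name.
function assignGlobalVar() {
  "use strict"; // strict mode turns the silently ignored write into a TypeError
  globalScope.globalVar = 2;
}

// Like { GLOBAL.globalVar = 2; } in the second grammar: the GLOBAL binding
// itself is untouched; only a property of the object it refers to changes.
function assignThroughGlobalObject() {
  "use strict";
  globalScope.GLOBAL.globalVar = 2;
  return globalScope.GLOBAL.globalVar;
}

var firstGrammarFailed = false;
try {
  assignGlobalVar();
} catch (e) {
  firstGrammarFailed = e instanceof TypeError;
}

var secondGrammarResult = assignThroughGlobalObject();
```

Here firstGrammarFailed ends up true and secondGrammarResult is 2, mirroring the behavior described for the two grammars. The shallowness of Object.freeze is exactly what the workaround relies on: the GLOBAL binding stays read-only while its properties remain writable.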
Note that the IBM engine, which supports an older version of the SISR specification, does allow global variables to be modified in SI tags. It is very important to be aware of this when porting grammars initially written for the IBM engine to another engine that supports the latest SISR specification (for instance, Loquendo or Nuance 9).