While working on a grammar conversion project from Nuance GSL to SRGS ABNF, I stumbled upon a few grammars all having the same design problem: using optional parts to make a few words repeat a varying number of times. This is a pattern we’ve observed regularly on various projects.
Here is an example of such a grammar for recognizing sequences of 4 to 8 digits (I omitted the semantic tags for clarity):
#ABNF 1.0 ISO-8859-1; mode voice; language en-US; root $digits4To8; public $digits4To8 = $digit $digit $digit $digit [$digit] [$digit] [$digit] [$digit] ; ...
The original GSL grammar looked like this:
Digit4To8 ( Digit Digit Digit Digit ?Digit ?Digit ?Digit ?Digit )
The GSL syntax does not support the <N-M> syntax like in ABNF to repeat an expansion from N to M times. That’s a reason why the grammar was written this way in the first place. In ABNF grammar , it would have instead been written as:
#ABNF 1.0 ISO-8859-1; mode voice; language en-US; root $digits4To8; public $digits4To8 = $digit <4-8> ; ...
In GSL, it would have been better to write the grammar as:
Digit4To8 ( Digit Digit Digit Digit ?Digit1To4 ) Digit1To4 ( Digit ?Digit1To3 ) Digit1To3 ( Digit ?Digit1To2 ) Digit1To2 ( Digit ?Digit )
Both grammars are equivalent, right? So what’s the problem?
Ambiguities
Well, both grammars recognize the same language (the same set of sentences), but the first grammar has a very different behavior. It is highly ambiguous. That means some sentences can be parsed in two or more different ways. See what you get when you interpret one such sentence in NuGram IDE:
The interpreter tells us (at the top-left of the window) that there are 6 different parses for the sentence. (I’ve seen grammars generating more than 100 parses for a given sentence!).
The problem with ambiguous grammars is they can impact both recognition accurary and recognition performance. Suppose a grammar covers a sentence that is highly ambiguous and another sentence which is not, but is phonetically close to the former. Since speech recognition engines limit their recognition search space, it is possible that the latter be pruned from the search space at the beginning of the recognition window even if it’s the one that would come up with the best score at the end of the recognition.
The other problem is recognition performance. All semantic tags are typically executed at the end of the recognition process, once the user has finished talking. If there are lots of identical hypotheses with the same score, the recognition engine will have to execute all tags (interpreted ECMAScript code), most of them being redundant and useless, thus causing longer delays in the speech application.
Determining that a grammar is ambiguous (or not) is a very hard problem (it’s an undecidable problem). That means, whatever tool you use that’s supposed to decide for ambiguities will inevitably make mistakes. But that doesn’t mean there are no tools available to help detecting ambiguities. For instance, NuGram IDE will tell you if there are two or more different parses for a given sentence. And the sentence generator tool can also be configured to detect sentences that are ambiguous at the semantic level (sentences producing two or more different semantic values).


4 Comments
I think you’ll find a considerably simpler way to restructure the grammar is as follows (and for the reasons you outline this is how it should have been written in the first place):
Digit Digit Digit Digit ?(Digit ?(Digit ?(Digit ?(Digit))))
AKA:
$digit $digit $digit $digit [$digit [$digit [$digit [$digit]]]]
This puts the search network ‘exactly how it should be’ I reckon.
i.e. Digit 5 is optional,
then you can only have an (optional) Digit 6 following Digit 5,
And you can only have an (optional) Digit 7 following Digit 6,
…
…
(This completely ignores the Australian problem of “double two triple three” being a common phraseology – That creates a whole new problem and mandates a structure similar to the one you indicated in fact)
Cheers, and Hello from Australia.
P.S. Are you sure it’s an undecidable problem in this context? Not convinced.
Only if it has infinite loops/generation (Which is another very BAAAAAD idea in grammars)
If it’s not infinite, then you can always just do an exhaustive generate, and check every phrase for multiple parses.
That may not be practical for many non-trivial grammars (Billions of paths?), but I believe that puts most (all?) well structured (non-SLM) speech recognition grammar ambiguities in the decidable realm.
A more intelligent network traversal could probably find duplicates quite easily…
A classic, real-human linguistic grammar is probably where a real undecidable problem exists.
Or then again, I may not sufficiently understand what an undecidable problem is…
Who knows.
Hi Peter,
You are correct. Ambiguity detection is undecidable for context-free grammars in general. And most speech recognition engines only support regular grammars (no general recursion, although some kinds of loops), for which the ambiguity problem is decidable.
But decidable does not always mean easy to do. It may be easy to answer the question of ambiguity with a yes or no, but explaining why is not. As you pointed out, exhaustive “generate and test” is sometimes a good idea, but that’s not practical for non-trivial grammars. NuGram IDE provides a sophisticated sentence generation tool that we hope will eventually be used as the basis for an ambiguity detection tool.