It is quite easy to write a speech recognition grammar. After all, it’s only a text file. And with the help of a good editor, we can expect the grammar to be free of syntax errors, i.e. to conform to the SRGS specification (ABNF or XML).
The real challenge is in making sure that the grammar does not contain any “error” from the point-of-view of the set of sentences it accepts, and the semantic values it associates to these sentences. We also have to make sure that it does not over-generate (accept sentences that are not part of the usual spoken language or cannot be uttered in the context of the question asked).
This post is the first in a series that will show the most common types of errors made when developing grammars and how they can be found and fixed using the advanced features of NuGram IDE. The examples I’ll use are all variants of grammars we’ve seen in the course of our grammar developments projects, either internally or for our customers.
Problem 1: Repeated Tokens
We’ll start this series with a very simple one. Suppose I’m editing a long list of ordered tokens. It is very tempting to copy one of the items and paste it as many times as needed and edit the copies. I do this all the time. It’s such a common pattern in text editing (and programming, unfortunately…) Of course, it is very easy to forget editing one of the copies.
For example, let’s say I want to write a number grammar. I’ll start writing something like:
public $r1To9 =
one {out.number=1} |
and then copy/paste the first item 8 times, replace “one” by the digits “two” to “nine” and do the same for their corresponding semantic value. I’ll get something like:
public $r1To9 =
one {out.number=1} |
two {out.number=2} |
three {out.number=3} |
four {out.number=4} |
five {out.number=5} |
five {out.number=6} |
seven {out.number=7} |
eight {out.number=8} |
nine {out.number=9}
;
Of course, you’ve already seen the error (probably much faster than I have). It’s easy here since you have the offending fragment right before our eyes. But when developing a grammar with many rules, it may not be that obvious. And even carefully reviewing the grammar may not suffice. (How many times do we miss typos in our own texts that another reviewer finds in a matter of seconds?)
So how do we find the problem? In this case, I simply need to build a coverage test set with the sentence generator using the Tags Coverage strategy. The following video shows how to do that:
Of course, in this example, I knew how to proceed to find the problem. In practice, and this will be a recurring idea in the series, a grammar needs to be tested in many different ways. There are many techniques and tools that need to be applied. The Tags Coverage (or the All Paths) generation strategy is often the first we use. It has the advantage of exercising all the semantic actions and finding lots of potential problems very early on in the debugging process.
In my next post in this series, I’ll write about ambiguities, how they affect speech recognition performance, and how to detect and deal with them.
