As discussed in a previous post, one of the unfortunate consequences of out-of-grammar utterances is that they can cause many false accepts that may seriously degrade application performance and user experience. In order to illustrate this, let’s use the simple example of a small menu where callers must choose among three options: “validate”, “repeat”, and “cancel”.
We use a test set of 5042 field utterances distributed as follows:
| Menu choice | Number of utterances | Proportion of test corpus |
|---|---|---|
As we can see, this is a fairly clean test set with only about 6% of out-of-grammar utterances. As usual, these include background speech, various noises, side conversations, some common OOG utterances (“yes”, “no”, “okay”, “options”, “oh”, etc.), as well as a wide variety of rambling responses.
Naturally, since the grammar can only recognize one of the three keywords (and legitimate variants), most of these OOG utterances are misrecognized as one of the keywords. That wouldn’t be a problem if the corresponding confidence scores were low and we could safely reject them, but that’s not always the case. In fact, many of these have a confidence score over 0.9, resulting in damaging false accepts.
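To make the accept/reject decision concrete, here is a minimal sketch in Python of how an application typically acts on the recognizer’s top hypothesis and its confidence score. The function and threshold names are hypothetical, and the actual API depends on your speech platform.

```python
# Minimal sketch of confidence-based rejection (all names are hypothetical).
KEYWORDS = {"validate", "repeat", "cancel"}
CONFIDENCE_THRESHOLD = 0.5   # a per-grammar tuning parameter, not a universal default

def handle_result(hypothesis: str, confidence: float) -> str:
    """Accept the top hypothesis only if its confidence clears the threshold."""
    if hypothesis in KEYWORDS and confidence >= CONFIDENCE_THRESHOLD:
        return hypothesis   # correct accept -- or a false accept if the caller was OOG
    return "reject"         # typically triggers a reprompt or fallback dialog

# An OOG utterance misrecognized as "cancel" with confidence 0.92 sails
# through the threshold: exactly the damaging false accept described above.
print(handle_result("cancel", 0.92))   # -> "cancel"
```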
An effective way to reduce false accepts is to add decoys to the grammar. For instance, you would normally want to start by adding common OOG responses, on the grounds that it’s easier to reject an OOG utterance if you can recognize it correctly. You could also add more “general” decoys, for instance a phoneme loop, to help reject hard-to-predict OOG utterances. There are more advanced techniques for coming up with “optimal” decoys for a given grammar, but I won’t go into them now.
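To sketch how decoys fit into the application logic (the grammar format itself, e.g. SRGS/GRXML or ABNF, depends on your platform), one can think of them as extra phrases the recognizer is allowed to return but that the application always maps to a rejection. Extending the hypothetical handler above:

```python
# Illustrative extension of the handler above: decoys are recognizable
# but never actionable, so an OOG utterance matched against a decoy is
# rejected no matter how high its confidence score.
DECOYS = {"yes", "no", "okay", "options", "oh"}   # common OOG responses

def handle_result_with_decoys(hypothesis: str, confidence: float) -> str:
    if hypothesis in DECOYS:
        return "reject"   # recognized as a decoy -> reject outright
    return handle_result(hypothesis, confidence)

# "okay" now lands on a decoy instead of being forced onto a keyword.
print(handle_result_with_decoys("okay", 0.91))    # -> "reject"
```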
In all cases, it is of course absolutely necessary to evaluate, on a large enough test corpus, the impact of these decoys since they could easily end up reducing recognition accuracy, sometimes significantly. In particular, one should be careful not to add decoys that could be confused with legitimate sentences or keywords.
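As a sketch of the kind of evaluation meant here, assume that for each test utterance you have a reference label (one of the three keywords, or OOG), the recognizer’s top hypothesis, and its confidence score. Sweeping the threshold then yields the Correct Accept and False Accept rates from which curves like the one below are drawn. The record fields and the normalization of the False Accept rate (here, the fraction of OOG utterances that get accepted) are assumptions, not a prescribed methodology.

```python
# Sketch: measure Correct Accept / False Accept rates over a threshold sweep.
from dataclasses import dataclass

KEYWORDS = {"validate", "repeat", "cancel"}

@dataclass
class Utterance:
    label: str        # reference: one of the keywords, or "OOG"
    hypothesis: str   # recognizer's top result (a keyword or a decoy)
    confidence: float # recognizer's confidence score

def rates(corpus: list[Utterance], threshold: float) -> tuple[float, float]:
    in_grammar = [u for u in corpus if u.label in KEYWORDS]
    oog = [u for u in corpus if u.label == "OOG"]
    correct_accepts = sum(
        1 for u in in_grammar
        if u.hypothesis == u.label and u.confidence >= threshold
    )
    false_accepts = sum(
        1 for u in oog
        if u.hypothesis in KEYWORDS and u.confidence >= threshold
    )
    return correct_accepts / len(in_grammar), false_accepts / len(oog)

# One (threshold, CA, FA) point per threshold value traces the whole curve.
def curve(corpus: list[Utterance]) -> list[tuple[float, float, float]]:
    return [(t / 100, *rates(corpus, t / 100)) for t in range(101)]
```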
Let’s illustrate the impact of decoys using the set of field utterances described above. The graph below compares the performance of a grammar without decoys (red curve) to that of the same grammar to which appropriate decoys were added (blue curve).
As can be seen, even for a fairly clean test corpus with a low OOG rate, the addition of decoys can significantly improve performance. For instance:
- For a False Accept rate of 0.5%, the Correct Accept rate increases from 95% to over 97.5%, which is equivalent to reducing the error rate by more than 50%.
- For a Correct Accept rate of 97.5%, the addition of decoys decreases the False Accept rate from 1.5% to around 0.3%. That’s one fifth the False Accept rate for the same Correct Accept rate.
Another interesting observation is the impact of decoys on confidence thresholds. Let’s say we target a False Accept rate of 0.5%. Then, we would need to use a confidence threshold of 0.73 for the grammar without decoys, but only 0.25 for the grammar with decoys. That’s quite a difference! This clearly shows that using “default” threshold values may sometimes produce results that are quite inadequate.
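Reusing the `rates` helper and `Utterance` records from the sketch above, the operating threshold for a target False Accept rate can simply be read off the same sweep; this is an illustration of the principle rather than a prescribed procedure.

```python
# Sketch: lowest threshold whose measured False Accept rate meets the target.
# FA is non-increasing in the threshold, so the first hit maximizes Correct Accepts.
def threshold_for_target_fa(corpus: list[Utterance], target_fa: float = 0.005) -> float:
    for t in range(101):
        _, fa = rates(corpus, t / 100)
        if fa <= target_fa:
            return t / 100
    return 1.0   # no threshold meets the target: reject everything

# On the data discussed above, this comes out around 0.73 without decoys and
# 0.25 with decoys -- nowhere near any one-size-fits-all "default" threshold.
```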
All of this once again demonstrates how important it is to pay close attention to out-of-grammar utterances in a tuning process and how decoys can provide an effective tool for containing the negative impact of such utterances on application performance.