Lately, I have been involved in a number of grammar conversion projects. This has been a great opportunity to put our process and tools to the test once again. And since every project has its peculiarities, we learn constantly.
The process we outlined about a year ago omitted a number of small details. That was fine for small-scale conversion projects. But when you have to deal with much larger projects (with thousands of grammars to convert), these details add up significantly. Let me share some of the issues we face daily.
It’s not just semantic tags
When you have tools to automatically convert semantic tags from one format to another, grammar conversion can seem like a no-brainer. But reality is not that simple. Grammars are not written for an abstract specification; they are written for a very specific recognition engine. They often contain:
- Words (tokens) that map to very specific pronunciations or that try to model disfluencies (hesitations, for instance), but for which the SRGS $GARBAGE rule is more appropriate.
- Multiword duplicates, with one sequence of space-separated words, and a similar sequence of underscore-separated words to allow cross-word phonetization (like “thirty one” and “thirty_one”).
- Words that map to very specific, tuned pronunciations. Such words often have an unusual orthography to make sure they are not confused with real words.
All this means that a number of transformations must be applied, either to the original grammars or to the converted ones. This can be done with regular-expression search and replace, or by inspecting the grammars manually.
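To give a concrete idea, here is a minimal sketch of the kind of batch clean-up involved, assuming plain-text ABNF grammars. The filler-token list, file layout, and regular expressions are hypothetical; real projects need grammar-specific rules on top of something like this.

```python
import re
from pathlib import Path

# Hypothetical filler tokens used to model hesitations in the source grammars;
# in the converted grammars they are better expressed with the SRGS $GARBAGE rule.
FILLER_TOKENS = {"euh", "uh", "um", "hmm"}

def normalize_grammar_text(text: str) -> str:
    # Turn underscore-joined multiword tokens ("thirty_one") back into
    # space-separated words ("thirty one"); rule references ($some_rule)
    # are excluded by the lookbehind.
    text = re.sub(r"(?<![$\w])[a-z]+(?:_[a-z]+)+\b",
                  lambda m: m.group(0).replace("_", " "),
                  text)
    # Replace filler/hesitation tokens with a $GARBAGE reference.
    filler = r"\b(?:" + "|".join(sorted(FILLER_TOKENS)) + r")\b"
    return re.sub(filler, "$GARBAGE", text)

def normalize_directory(src_dir: str, dst_dir: str) -> None:
    # Apply the same transformations to every ABNF grammar in a directory.
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for path in Path(src_dir).glob("*.gram"):
        (dst / path.name).write_text(normalize_grammar_text(path.read_text()))
```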
Generation of coverage sets
When dealing with hundreds (if not thousands) of grammars, it is not feasible to create the initial coverage test sets manually. This is far too time-consuming. That means you have to find a way to generate those initial coverage test sets automatically, in batch. But how do you do that?
Fortunately, NuGram IDE already provides sophisticated tools to analyze grammars and generate sentences from them. We simply built on this foundation a tool that automatically generates coverage test sets for a set of ABNF grammars. The tool also reports problems found in the grammars, such as the use of digits in voice grammars, or words in DTMF grammars.
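As an illustration of the kind of check involved, here is a simple heuristic that flags literal digit tokens in a voice grammar. The heuristics and file handling are my own assumptions, not NuGram IDE's actual analysis, and the inverse check (alphabetic words in DTMF grammars) follows the same pattern.

```python
import re
from pathlib import Path

DIGIT_TOKEN = re.compile(r"\b\d+\b")

def report_digit_tokens(grammar_path: str):
    """Flag literal digit tokens in a voice grammar (they should normally be
    spelled out, e.g. "one" rather than 1)."""
    problems = []
    for line_no, line in enumerate(Path(grammar_path).read_text().splitlines(), 1):
        stripped = line.strip()
        # Skip the ABNF header, declarations, and comments.
        if stripped.startswith(("#", "//", "mode", "root", "language", "meta", "tag-format")):
            continue
        # Ignore digits inside semantic tags, e.g. {out.n = 31},
        # and inside repeat operators, e.g. <1-3>.
        candidate = re.sub(r"\{[^}]*\}|<[^>]*>", "", stripped)
        if DIGIT_TOKEN.search(candidate):
            problems.append((line_no, stripped))
    return problems
```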
The coverage set generation tool uses a combination of configuration and sophisticated analyses to determine how to generate sentences and how many sentences to generate. For example, it is not possible to generate all sentences from a grammar that covers an infinite number of them. When that's the case (or when the number of sentences covered by the grammar exceeds a certain threshold), the tool falls back on other generation strategies.
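To make the strategy concrete, here is a small, self-contained sketch of the idea: exhaustive enumeration below a threshold, random derivations above it. The toy grammar representation, the thresholds, and the depth bound are hypothetical and say nothing about NuGram IDE's internals.

```python
import itertools
import random

# Toy grammar representation (hypothetical): each rule maps to a list of
# alternatives; an alternative is a sequence of items, where an item is
# either a literal word or a reference ("ref:name") to another rule.
GRAMMAR = {
    "root":   [["i", "want", "ref:amount"], ["ref:amount", "please"]],
    "amount": [["one"], ["two"], ["three"], ["ref:amount", "more"]],  # recursive
}

MAX_EXHAUSTIVE = 1000   # above this many sentences, fall back to sampling
SAMPLE_SIZE = 200       # size of the sampled coverage set
MAX_DEPTH = 6           # bounds recursion on recursive rules

def expand(rule, grammar, depth=0):
    """Yield every sentence derivable from `rule`, up to a depth bound."""
    if depth > MAX_DEPTH:
        return
    for alternative in grammar[rule]:
        parts = []
        for item in alternative:
            if item.startswith("ref:"):
                parts.append(list(expand(item[4:], grammar, depth + 1)))
            else:
                parts.append([item])
        for combination in itertools.product(*parts):
            yield " ".join(combination)

def sample_one(rule, grammar, depth=0):
    """Derive one random sentence, avoiding recursive alternatives when deep."""
    alternatives = grammar[rule]
    if depth > MAX_DEPTH:
        non_recursive = [a for a in alternatives
                         if not any(i.startswith("ref:") for i in a)]
        alternatives = non_recursive or alternatives
    words = []
    for item in random.choice(alternatives):
        if item.startswith("ref:"):
            words.append(sample_one(item[4:], grammar, depth + 1))
        else:
            words.append(item)
    return " ".join(words)

def coverage_set(grammar, root="root"):
    sentences = list(itertools.islice(expand(root, grammar), MAX_EXHAUSTIVE + 1))
    if len(sentences) <= MAX_EXHAUSTIVE:
        return sentences                      # exhaustive coverage is feasible
    return sorted({sample_one(root, grammar) for _ in range(SAMPLE_SIZE)})
```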
Recognition tests as part of the QA process
Finally, even a syntactically valid grammar may fail to load in the ASR for a variety of reasons, the most common one being a limitation or constraint of the ASR itself. For this reason, we came to the conclusion that running recognition tests (ideally a benchmark of the converted grammars) is a very useful addition to the QA process. Of course, simply compiling the grammars may catch a number of problems. But doing a “before and after” comparison can detect conversion problems that the coverage tests missed when those tests are not exhaustive.
Another benefit of recognition tests is the ability to check the performance of the converted grammars and identify those that need additional work. Some converted grammars may contain words that prove difficult to recognize with the new engine because they are not properly phonetized, calling for application-specific (or even grammar-specific) phonetic dictionaries.
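Here is the flavour of such a before/after comparison, assuming each recognition run has been dumped to a CSV file with audio, interpretation, and confidence columns. The file format, column names, and confidence threshold are assumptions about the test harness, not a description of any specific ASR's tooling.

```python
import csv

def load_results(path):
    """Load one recognition run: audio file -> (interpretation, confidence)."""
    results = {}
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            results[row["audio"]] = (row["interpretation"], float(row["confidence"]))
    return results

def compare_runs(before_csv, after_csv, confidence_drop=0.10):
    """Flag utterances whose interpretation changed or whose confidence dropped."""
    before = load_results(before_csv)
    after = load_results(after_csv)
    for audio, (old_interp, old_conf) in before.items():
        if audio not in after:
            print(f"{audio}: missing from converted-grammar run")
            continue
        new_interp, new_conf = after[audio]
        if new_interp != old_interp:
            print(f"{audio}: interpretation changed ({old_interp!r} -> {new_interp!r})")
        elif old_conf - new_conf > confidence_drop:
            print(f"{audio}: confidence dropped {old_conf:.2f} -> {new_conf:.2f}")

if __name__ == "__main__":
    compare_runs("results_original.csv", "results_converted.csv")
```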
What about DTMF?
In the specific case of converting GSL grammars to GrXML or ABNF, a complication arises when the same grammar contains both DTMF sequences and words. I will discuss this issue in a separate post.