At SpeechTEK University this summer, Judi Halperin from Avaya and Jenni McKienzie from Travelocity gave a very good introduction to grammar writing. The slides are definitely worth reading. They did a good job of addressing the most common sources of problems with speech recognition grammars.
However, two things struck me in their presentation: (1) they use the SRGS XML Form as the authoring language for speech recognition grammars, and (2) they mention JSP or ASP pages as the most common way of dynamically generating grammars. I’ll save the second point for another post and address the first one here.
Having long ago abandoned the XML Form in favor of ABNF in our own practice, we’re always intrigued that so many grammar developers, including expert developers like Judi and Jenni, continue to use the XML Form. (In Judi and Jenni’s case, I can see why, in a teaching situation with time constraints, they would choose GRXML for the examples: more people are familiar with that format, and those who aren’t can read it easily. Their choice was certainly a conscious one.) Still, there is just no question in our minds that ABNF, being so much more compact, readable, and easy to manipulate than the XML Form, is by far the better choice.
I therefore tried to put myself in the shoes of developers who use the XML Form and understand their motivations. Here’s what I came up with:
- XML is the native format for the ASR engine. It’s true that some ASR engines, Nuance’s OSR and Nuance 9 in particular, only support the XML Form. It’s also true that the SRGS specification requires support for the XML Form, while support for ABNF is only optional. But format converters exist, so even on these platforms grammars can be authored in ABNF and converted to the XML Form for deployment.
- It’s painful having to convert from ABNF to XML all the time. That’s a fair point. Many testing tools provided with ASR engines (e.g., parseTool) require the grammar in the XML Form, so every edit means another conversion, which can indeed be painful. This is especially true when conversion tools are not well integrated with the environment in which grammars are edited.
- XML is the format for all documents in the project. I’ve heard this a few times. Some hard-core developers simply like XML. But that argument assumes that the VUI designer, the speech scientist, or whoever authors the grammars is actually a software developer. Quite often, that’s not the case.
- There is no good ABNF editor. I think this is the crux of the problem, and it’s a chicken-and-egg situation: no one uses ABNF because there is no good editor, and no one provides a good ABNF editor because there is no demand for it. A decent XML editor at least gives you syntax coloring, code assist based on the document schema, and so on. Unfortunately, an XML editor doesn’t know anything about grammars and therefore cannot provide advanced features like syntax checking of semantic tags or refactoring (expansion extraction, rule renaming, semantic slot renaming, etc.).
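For readers who have never looked at what a format converter actually does, here is a toy illustration. It is a minimal Python sketch, not NuGram’s converter or any vendor’s tool, and it assumes a tiny, hypothetical subset of ABNF: rule definitions of the form `$name = words | words $ref;`, with no ABNF header, semantic tags, weights, repeats, grouping, or scope modifiers. The function names `parse_rules` and `to_srgs_xml` are made up for this example; only the Python standard library is used.

```python
import re
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"


def parse_rules(abnf: str) -> dict[str, list[list[str]]]:
    """Parse '$name = a b | c $ref;' statements into {name: [[token, ...], ...]}.

    Toy subset only (hypothetical): no ABNF header, tags, weights, repeats,
    grouping, or public/private scope. Anything else raises ValueError.
    """
    rules: dict[str, list[list[str]]] = {}
    for stmt in abnf.split(";"):
        stmt = stmt.strip()
        if not stmt:
            continue
        m = re.fullmatch(r"\$(\w+)\s*=\s*(.+)", stmt, re.DOTALL)
        if m is None:
            raise ValueError(f"unsupported statement: {stmt!r}")
        name, body = m.groups()
        rules[name] = [alt.split() for alt in body.split("|")]
    return rules


def _append_text(parent: ET.Element, text: str) -> None:
    """Add text inside parent: as .text before any child, else as the last child's tail."""
    if len(parent) == 0:
        parent.text = (parent.text or "") + text
    else:
        last = parent[-1]
        last.tail = (last.tail or "") + " " + text


def _fill_item(item: ET.Element, tokens: list[str]) -> None:
    """Turn one ABNF alternative into the mixed text/<ruleref> content of an <item>."""
    words: list[str] = []
    for tok in tokens:
        if tok.startswith("$"):  # rule reference such as $name
            if words:
                _append_text(item, " ".join(words) + " ")
                words.clear()
            ET.SubElement(item, "ruleref", {"uri": "#" + tok[1:]})
        else:  # plain word token
            words.append(tok)
    if words:
        _append_text(item, " ".join(words))


def to_srgs_xml(abnf: str, root: str, lang: str = "en-US") -> str:
    """Convert the toy ABNF subset to an SRGS XML Form document string."""
    # Namespace and xml:lang are written as literal attributes to keep the sketch simple.
    grammar = ET.Element("grammar", {
        "xmlns": SRGS_NS, "version": "1.0", "xml:lang": lang, "root": root,
    })
    for name, alternatives in parse_rules(abnf).items():
        rule = ET.SubElement(grammar, "rule", {"id": name})
        one_of = ET.SubElement(rule, "one-of")
        for tokens in alternatives:
            _fill_item(ET.SubElement(one_of, "item"), tokens)
    return ET.tostring(grammar, encoding="unicode")


if __name__ == "__main__":
    abnf = """
    $cmd = call $name | dial $name now;
    $name = john | mary smith;
    """
    print(to_srgs_xml(abnf, root="cmd"))
```

Running it prints a single `<grammar>` document with two `<rule>` elements, each wrapping a `<one-of>` list of `<item>`s. Even for this two-rule toy, the XML output is noticeably longer and harder to scan than the ABNF input, which is the compactness argument in miniature. A real converter also has to handle the ABNF header, semantic tags, weights, repeats, and grouping, none of which this sketch attempts.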
However valid these points may once have been, now that there is a complete environment for developing, testing, and debugging recognition grammars in ABNF (and exporting them to any target ASR engine), I don’t think any reason remains for not switching to ABNF. Like, immediately.
Am I missing something? Are there other more fundamental reasons I did not see? Let me know!
I am deeply convinced that once you try authoring your grammars in ABNF using NuGram IDE, you won’t want to go back to your old habits of coding grammars in the XML Form. Give it a try! It’s free. And remember that more and more speech recognition engines support ABNF natively.