As I mentioned in a previous post, on August 27 at SpeechTEK in New York City, I will be giving a SpeechTEK University course entitled Advanced Speech Application Tuning Topics. I thought it might be worthwhile for me to give a bit more detail about some of the specific topics I’ll be talking about.
So here are a few highlights:
- The “out-of-grammar” challenge - No matter what we do, users say things we didn’t anticipate, and unfortunately that happens quite often. It’s a harsh reality that most speech applications have to deal with, and how we manage it has a huge impact on success rate and user experience. I’ll present some of the most effective techniques we have been using to make sure that the application performs as well as possible in real conditions (i.e., with real users).
- Are confidence scores good enough? - Confidence scores are essential for deciding when to accept, reject, or confirm a speech recognition result. Unfortunately, the confidence scores produced by recognition engines are often far from optimal, leading to unnecessary confirmations and dialog failures. We’ll show that it’s possible to get much better confidence scores.
- Identify problems with discriminative grammar weights - It’s well known that grammar weights can be automatically trained to learn the relative frequency of grammar alternatives. It’s not as well known that training discriminative weights can be an effective way to identify problems in a grammar. We’ll talk about this.
- Know where to focus - With limited amounts of time allocated to tuning, it’s important to be able to focus where tuning will have the biggest payback. We’ll talk about different techniques that help us find where the biggest problems are - and therefore, where improvements will have the largest impact.
- Confidence thresholds - Not long ago, someone on the Yahoo Voice User Interface Designers group complained about some application being too ‘confirmation happy’. But what’s the best way to determine confidence thresholds in a given dialog? As a matter of fact, what are good dialog-level performance metrics? We’ll show how dialog simulations can help us find thresholds that optimize your favorite performance metrics. We’ll also show how we can improve performance by using thresholds that depend on the recognition result.
- Rule-based expansion of phonetic pronunciations - Optimizing phonetic pronunciations is one of the most effective ways of improving speech recognition accuracy. Finding words that have recognition problems and fixing their phonetic pronunciations can bring large improvements. But how do you tune pronunciations for a 20,000-word vocabulary, especially when most of that vocabulary won’t even find its way into the tuning corpus? We’ll show how rule-based pronunciation expansion can bring surprising improvements.
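To make the accept / confirm / reject decision from the confidence bullets above a bit more concrete, here is a minimal sketch of picking two thresholds from tuning data. It is not the method covered in the course: the `Result` record, the `COSTS` table, and the function names are all hypothetical, and a simple per-utterance cost sum stands in for a real dialog simulation and for whatever dialog-level metric you actually care about.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class Result:
    confidence: float  # engine confidence, assumed to lie in [0, 1]
    correct: bool      # did the top hypothesis match the transcription?


# Hypothetical costs per outcome; in practice these would come from
# whatever dialog-level metric is being optimized.
COSTS = {
    "false_accept": 10.0,    # wrong result accepted silently
    "confirm_correct": 1.0,  # correct result, but an extra confirmation turn
    "confirm_wrong": 3.0,    # wrong result caught by confirmation
    "reject_correct": 4.0,   # correct result thrown away, user must repeat
    "reject_wrong": 2.0,     # wrong result rejected, user must repeat
}


def outcome_cost(r: Result, reject_below: float, accept_at: float) -> float:
    """Cost of one recognition result under a two-threshold policy."""
    if r.confidence >= accept_at:        # accept without confirming
        return 0.0 if r.correct else COSTS["false_accept"]
    if r.confidence >= reject_below:     # confirm with the user
        return COSTS["confirm_correct"] if r.correct else COSTS["confirm_wrong"]
    # otherwise reject and reprompt
    return COSTS["reject_correct"] if r.correct else COSTS["reject_wrong"]


def best_thresholds(results, steps: int = 20):
    """Grid-search (reject_below, accept_at) minimizing total cost.

    Returns a tuple (total_cost, reject_below, accept_at).
    """
    grid = [i / steps for i in range(steps + 1)]
    best = None
    for lo, hi in product(grid, grid):
        if lo > hi:  # the reject threshold cannot exceed the accept threshold
            continue
        total = sum(outcome_cost(r, lo, hi) for r in results)
        if best is None or total < best[0]:
            best = (total, lo, hi)
    return best


if __name__ == "__main__":
    # Toy tuning data, made up for illustration only.
    data = [
        Result(0.15, False), Result(0.30, False), Result(0.45, True),
        Result(0.50, False), Result(0.70, True), Result(0.90, True),
    ]
    cost, reject_below, accept_at = best_thresholds(data)
    print(f"cost={cost} reject_below={reject_below} accept_at={accept_at}")
```

Running the same search separately for each group of recognition results (for example, per grammar slot value) is one simple way to get thresholds that depend on the recognition result, at the price of needing enough tuning data in each group.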
These are just some of the topics I’ll be talking about. In the meantime, I’d be interested to hear about your ideas or experiences on these, or any other topic related to speech application tuning.