Archive for July, 2009

July 21st, 2009

by Yves Normandin

Advanced Speech Application Tuning Topics

As I mentioned in a previous post, on August 27 at SpeechTEK in New York City, I will be giving a SpeechTEK University course entitled Advanced Speech Application Tuning Topics. I thought it might be worthwhile for me to give a bit more detail about some of the specific topics I’ll be talking about.

So here are a few highlights:

  • The “out-of-grammar” challenge - No matter what we do, users say things we didn’t anticipate. And, unfortunately, that happens quite a lot. It’s the harsh reality that most speech applications have to deal with, and how we manage this challenge has a huge impact on success rate and user experience. I’ll present some of the most effective techniques we have been using to make sure that the application performs as well as possible in real conditions (i.e., with real users).
  • Are confidence scores good enough? - Confidence scores are essential in order to decide when to accept, reject, or confirm a speech recognition result. Unfortunately, confidence scores produced by recognition engines are often quite suboptimal, leading to unnecessary confirmations and dialog failures. We’ll show that it’s possible to get much better confidence scores.
  • Identify problems with discriminative grammar weights - It’s well known that grammar weights can be automatically trained to learn the relative frequency of grammar alternatives. It’s not as well known that training discriminative weights can be an effective way to identify problems in a grammar. We’ll talk about this.
  • Know where to focus - With limited amounts of time allocated to tuning, it’s important to be able to focus where tuning will have the biggest payback. We’ll talk about different techniques that help us find where the biggest problems are - and therefore, where improvements will have the largest impact.
  • Confidence thresholds - Not long ago, someone on the Yahoo Voice User Interface Designers group complained about some application being too ‘confirmation happy’. But what’s the best way to determine confidence thresholds in a given dialog? For that matter, what are good dialog-level performance metrics? We’ll show how dialog simulations can help us find thresholds that optimize our favorite performance metrics. We’ll also show how we can improve performance by using thresholds that depend on the recognition result.
  • Rule-based expansion of phonetic pronunciations - Optimizing phonetic pronunciations is one of the most effective ways of improving speech recognition accuracy. Finding words that have recognition problems and fixing their phonetic pronunciations can bring large improvements. But how do you tune pronunciations for a 20,000-word vocabulary, especially when most of that vocabulary won’t even find its way into the tuning corpus? We’ll show how rule-based pronunciation expansion can bring surprising improvements.
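To make the accept / confirm / reject decision mentioned in the confidence bullets more concrete, here is a minimal Python sketch. It is only an illustration, not the method taught in the course: the threshold values, the `Thresholds` class, and the `decide` and `summarize` helpers are all hypothetical, and real thresholds would have to be tuned on field data.

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    reject_below: float  # scores under this value are rejected
    accept_from: float   # scores at or above this value are accepted as is


# Hypothetical values, chosen only for illustration.
DEFAULT = Thresholds(reject_below=0.30, accept_from=0.70)

# Thresholds that depend on the recognition result: here we assume that
# "cancel" is costly to get wrong, so it is confirmed more often.
PER_RESULT = {"cancel": Thresholds(reject_below=0.40, accept_from=0.85)}


def decide(result: str, confidence: float) -> str:
    """Return 'accept', 'confirm' or 'reject' for one recognition result."""
    t = PER_RESULT.get(result, DEFAULT)
    if confidence < t.reject_below:
        return "reject"
    if confidence >= t.accept_from:
        return "accept"
    return "confirm"


def summarize(samples):
    """Count decisions over labeled samples (result, confidence, is_correct)."""
    counts = {"accept": 0, "confirm": 0, "reject": 0, "false_accept": 0}
    for result, confidence, is_correct in samples:
        action = decide(result, confidence)
        counts[action] += 1
        if action == "accept" and not is_correct:
            counts["false_accept"] += 1
    return counts
```

Running `summarize` over a transcribed tuning corpus with different threshold values gives a rough view of the trade-off between confirmations and false accepts, which is the kind of question a dialog simulation is meant to answer more thoroughly.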

These are just some of the topics I’ll be talking about. In the meantime, I’d be interested to hear about your ideas or experiences on these, or any other topic related to speech application tuning.

July 17th, 2009

by Dominique Boucher

Join us at SpeechTEK 2009

New York is the place to be on August 24-26! It’s the host of SpeechTEK 2009, the world’s biggest speech technology conference and exhibition!

Again this year, Nu Echo will have a very strong presence there, with:

  • a SpeechTEK University course by Yves Normandin on advanced speech application tuning topics (session STKU-6 at 9 am on Thursday);
  • a presentation by yours truly on speech grammar coverage analysis (session C302 at 11:45 am on Tuesday);
  • a booth where we will demonstrate NuGram IDE, our flagship Eclipse-based development environment for speech recognition grammars, and NuBot, a full-featured automated application testing tool for both inbound and outbound IVRs;
  • a number of very exciting announcements; and
  • of course, several of us from the Nu Echo team, who are looking forward to meeting you all!

Don’t wait any longer! Register online and use the registration code VIPNUE to get a 25% discount on the conference pass or a free exhibit hall pass.

Come see us at booth 513!

SpeechTEK 2009

On August 27, I will be giving a SpeechTEK University course entitled Advanced Speech Application Tuning Topics.

This course will provide a synthesis of the speech application tuning methodology and techniques that we have been using - and continuously enhancing - over the past several years at Nu Echo. In essence, I will be describing the foundations (technical and methodological) of our tuning practice, which has proven so effective at delivering applications with very high success rates.

Even for those of you with significant tuning experience, I believe we will be able to provide a novel and, quite possibly, surprising perspective on this very challenging problem.

Here is the abstract, as it appears in the SpeechTEK program:

This course will teach participants a rigorous, data-driven speech application tuning methodology that will enable them to build robust speech applications that effectively deal with how real users actually behave, not how we would like them to behave. Topics include utterance and dialogue-level performance metrics, managing out-of-grammar utterances, techniques to effectively identify and address performance problems, dealing with multitoken utterances, tuning phonetic dictionaries, computing enhanced confidence scores, setting confidence thresholds, and running dialogue simulations. The presentation will be illustrated by numerous examples and interactive demonstrations using field data from real-life applications.

I am looking forward to seeing you there. And if you can’t make it to the course, please come see us at booth 513. I would be happy to give you a demonstration of some of the tuning tools we are using daily in our speech practice.

SpeechTEK 2009

I will be speaking at SpeechTEK 2009 in August!

Here is the abstract of my talk, entitled The Art and Science of Speech Grammar Coverage Analysis, as it appears on the conference site:

As the speech recognition grammars required by today’s applications become increasingly complex, identifying and fixing grammar coverage problems can become quite challenging. Using real-life examples, this session will provide an overview of some of the best practices and techniques for effective speech grammar debugging and coverage analysis. In particular, we will showcase tools no speech scientist should live without: an interactive sentence explorer and a sophisticated, highly customizable sentence generator.

The talk will mostly focus on the kinds of problems we face when developing and maintaining speech recognition grammars for real applications (not toy problems). It will clearly demonstrate that effective tools can really help develop grammars faster, while ensuring greater quality and fewer maintenance hassles.

Also, this talk will be an opportunity for me to demonstrate NuGram IDE’s improved sentence explorer and its sophisticated sentence generation tool. This is indeed very cool stuff (I can’t wait to demo it)!

See you there!

July 6th, 2009

by Dominique Boucher

NuGram IDE HOWTO: handling .gram ABNF grammars

One of our customers recently started using NuGram IDE for the development and maintenance of grammars targeting the IBM speech recognition engine. As is often the case with the IBM engine (and others as well), their ABNF grammars use the “.gram” extension.

NuGram IDE’s default extension for ABNF grammars, however, is “.abnf”, which means that it’s not configured out of the box to work with “.gram” ABNF files. The reason is that the “.gram” extension is often used for binary, compiled grammars.

This can easily be fixed in three simple steps.

Step 1 - Adding a new file association

The NuGram IDE plugin adds a new content type for ABNF grammars to the set of content types already supported by Eclipse. To associate “.gram” files with the ABNF content type, open the preferences and select the page “General > Content Types”. Then select the “Text > ABNF Grammar File” content type. Finally, click on “Add…” and type “*.gram”. You should see something like:

Content Types Preference Page

That’s it! Opening a “.gram” file will now properly launch the ABNF editor.

Step 2 - Setting the extension for newly created grammars

Now, if you want to create a new “.gram” grammar using the New ABNF Grammar wizard, you have to set the default extension in the preferences:

Default extension for new ABNF grammars

Step 3 - Configuring the project-specific translation rules

To completely support “.gram” grammars, the project translation rules must also be configured appropriately. That’s because all the grammar tools must also know how to handle references to external grammars whose extension is “.gram”. By default, references to “.gram” grammars are translated into references to “.grxml” grammars (the reason for this behavior will be explained in a separate post). You thus have to make sure that references to “.gram” grammars are not translated and can be used as is.

To do that, select your grammar project in the Navigator view and open its property page (by pressing Alt-Enter or right-clicking on it and selecting “Properties”). Under the “Grammar Development > External Rule References” category, deselect the rule named “.gram to .grxml” applied at “Runtime”:

Disable .gram -> .grxml translation rule
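
To illustrate what disabling this rule changes, here is a small Python sketch of an extension translation rule applied to an external grammar reference. This is not NuGram IDE’s implementation: the `translate_reference` function, the rule table, and the example reference `cities.gram#city` are all hypothetical, and the sketch only models the behavior described above.

```python
def translate_reference(ref: str, rules: dict) -> str:
    """Rewrite the extension of an external grammar reference.

    `ref` looks like "path/to/grammar.ext#rulename" (the "#rulename" part
    is optional). `rules` maps a source extension to a target extension
    and contains only the rules that are currently enabled.
    """
    uri, sep, rule_name = ref.partition("#")
    for source_ext, target_ext in rules.items():
        if uri.endswith(source_ext):
            # Swap the extension and keep the rule name untouched.
            return uri[: -len(source_ext)] + target_ext + sep + rule_name
    # No enabled rule matches: the reference is used as is.
    return ref


# Default behavior described above: ".gram" references become ".grxml".
default_rules = {".gram": ".grxml"}

# After deselecting the ".gram to .grxml" rule, no rule applies.
rules_after_step_3 = {}
```

With `default_rules`, `translate_reference("cities.gram#city", default_rules)` returns `"cities.grxml#city"`; with `rules_after_step_3`, the same reference is returned unchanged, which is the outcome Step 3 aims for.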

Final note

Although I focused exclusively on grammars with the “.gram” extension, the above steps equally apply to any other extension. So go ahead, adapt NuGram IDE to your own needs and habits!