January 25th, 2010

by Dominique Boucher

Voice APIs: back to basics

We definitely live in interesting times. After years of pushing hard on VoiceXML (2.0 and 2.1), the industry now regularly comes up with new approaches that depart significantly from the newly proposed VoiceXML 3.0. And these approaches sometimes come from the very companies that are working hard on the VoiceXML standardization effort.

For instance, last week Voxeo announced a new interface to its Tropo platform, called Tropo WebAPI. To build a communications application, one simply has to write a web service/application producing JSON documents. These documents contain simple instructions for the communications platform like: play this prompt, ask a question, transfer the call, etc. Very simple instructions, indeed. The results are then sent back to the application on the server side, which processes them and decides what to do next.
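To make the idea concrete, here is a minimal sketch in Python of what such a web application might compute on each request. The instruction names (`say`, `ask`, `transfer`, `hangup`), the field names, and the phone number are all hypothetical and chosen for illustration only; they are not Tropo WebAPI's actual JSON schema.

```python
import json

def next_instructions(result=None):
    """Return a JSON document telling the platform what to do next.

    `result` is the (hypothetical) outcome of the previous step, as posted
    back by the platform; None means the call has just started.
    """
    if result is None:
        # First request of the call: greet the caller and ask a question.
        steps = [
            {"say": "Welcome to the demo line."},
            {"ask": {"prompt": "Sales or support?",
                     "choices": ["sales", "support"]}},
        ]
    elif result.get("answer") == "sales":
        # The caller chose sales: hand the call over to a (made-up) number.
        steps = [{"transfer": {"to": "+15550100"}}]
    else:
        # Any other answer: apologize and end the call.
        steps = [{"say": "Sorry, nobody is available right now."},
                 {"hangup": None}]
    return json.dumps({"instructions": steps})
```

In a real deployment this function would presumably sit behind an HTTP endpoint that the platform calls at the start of the call and again after each result. All the dialog logic stays in ordinary server-side code.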

This approach reminds me of TwiML, Twilio’s own markup language for implementing voice applications, and (to a certain extent) FastAGI, the Asterisk way of developing server-side voice applications (the preferred way of deploying applications on the Cloudvox platform).

What do these approaches have in common? Well, they all offer a much simpler programming model than VoiceXML. VoiceXML has its form-filling algorithm, which tries to fill the slots of a form automatically. VoiceXML applications can also contain a fair amount of scripting (in ECMAScript), with many scoping rules for variables. The language also provides some exception mechanisms (with catch and throw elements), a root document for storing data, etc. No wonder most development environments targeting VoiceXML platforms only make use of a limited subset of VoiceXML.

In fact, the new approaches are not programming models, they essentially provide low-level instructions for the various voice platforms, much like the instruction set of a virtual machine. It's up to the users of the platform to implement their own programming model on top of these instruction sets. And this is a very attractive offer, as it will most certainly ignite the development of new application programming environments and frameworks, some of which will be platform agnostic.
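As an illustration of what building a programming model on top of such an instruction set could look like, the Python sketch below implements a drastically simplified slot-filling helper on top of a single hypothetical `ask` instruction. It is loosely inspired by the idea of filling slots in a form, but it is not the VoiceXML form-filling algorithm, and the instruction format and the names `next_step`, `slots` and `filled` are assumptions made for this example.

```python
def next_step(slots, filled):
    """Pick the next low-level instruction for a form.

    `slots` maps slot names to prompts, in the order they should be asked;
    `filled` maps slot names to the values collected so far.
    """
    for name, prompt in slots.items():
        if name not in filled:
            # Ask for the first slot that has no value yet.
            return {"ask": {"slot": name, "prompt": prompt}}
    # Every slot has a value: the form is complete.
    return {"done": dict(filled)}


# Example form with two slots (dicts keep insertion order in Python 3.7+).
booking = {"city": "Which city?", "date": "Which date?"}
```

Calling `next_step(booking, {})` yields an `ask` for the city, `next_step(booking, {"city": "Montreal"})` yields an `ask` for the date, and once both values are present the helper returns a `done` record. The application, not the platform, owns this logic.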

We lived through a somewhat similar period at the end of the last century. There were many non-interoperable proprietary IVR platforms, and the industry came up with a solution: VoiceXML. Will we see something similar happen with these new approaches? I doubt it. I think that all these approaches are sufficiently similar that a good abstraction layer on the application side can suffice to support them all easily. In the '90s, by contrast, porting an application to a new platform was simply impossible without a complete rewrite.
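A minimal sketch of such an application-side abstraction layer, in Python, might look like the following. The dialog turn is described once in a platform-neutral form and then rendered for two hypothetical targets, one JSON-based and one markup-based. Neither output format is the actual Tropo WebAPI or TwiML schema; the element and key names are assumptions made for this example.

```python
import json
import xml.etree.ElementTree as ET

# Platform-neutral description of a dialog turn: (verb, text) pairs.
TURN = [("say", "Welcome."), ("ask", "Sales or support?")]

def to_json_platform(turn):
    """Render the turn for a hypothetical JSON-based platform."""
    return json.dumps([{verb: text} for verb, text in turn])

def to_markup_platform(turn):
    """Render the turn for a hypothetical XML-markup-based platform."""
    root = ET.Element("response")
    for verb, text in turn:
        # One child element per instruction, with the prompt as its text.
        ET.SubElement(root, verb).text = text
    return ET.tostring(root, encoding="unicode")
```

The application only ever manipulates `TURN`-style data. Supporting another platform would then mean writing one more small rendering function rather than rewriting the application.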

Strangely, the programming languages community lived through something similar a few years ago. From around 1997 to the start of the century, the craze for Java almost killed research in the field of object-oriented programming language design not targeting Java or the JVM. Then, in 2003 or so, some leading researchers consciously decided that it was time to start a post-Java era. And it's at about that time that many programming languages started flourishing and that we saw a greater acceptance of dynamic/scripting languages (on the JVM or not). This period also coincided with the rise of Web 2.0 and a new culture of entrepreneurship, thanks to Paul Graham's Y Combinator.

I think we are living through something similar today in the communications industry, though a few years later. We see young entrepreneurs and new startups with innovative ideas enter the market. By the way, a few of them presented their ideas at StartupCamp Telephony last week, an event sponsored by Twilio and PhoneTag as part of the ITExpo conference.

The years to come promise to be very exciting.

7 Responses to “Voice APIs: back to basics”

Voice APIs: back to basics | VXML Solutions January 25th, 2010 - 11:36 am

[...] from Nu Echo Blog Tags: :, APIs, back, basics, Nu Echo Blog, [...]

nshm January 25th, 2010 - 2:35 pm

Thanks for the nice post. I think VoiceXML adepts like Paolo Baggia, Director of International Standards at Loquendo, promote the point that an IVR system should be simple, like an ATM:

http://www.youtube.com/watch?v=64S_b7An3p4

For speech interfaces, it's simply not true. Speech itself is not algorithmic; it can't be described with VoiceXML or with more modern tools like Ruby on Rails interfaces. Speech dialog has a lot of cross-references and assumptions. So the IVR should be able to map speech to semantic concepts. And this mapping could probably be described with examples, not with a programming language.

Innovation abound in voice APIs | insideCTI January 25th, 2010 - 2:44 pm

[...] Boucher of Nu Echo makes a good observation in the world of voice APIs: In fact, the new approaches are not programming models, they [...]

Dominique Boucher January 25th, 2010 - 7:36 pm

@nshm Thanks for your comment. I agree with you in principle, but most IVR systems simply try to automate business-oriented tasks, for which a directed dialog can be quite efficient. And VoiceXML or Tropo-like systems are well suited to that. And even then, it's not that easy to make those applications perform well. They need a lot of tuning and optimization.

Jim Rush January 28th, 2010 - 8:22 am

Let me take a different spin on this. Why do we care about standards? In theory, you gain portability and tools with which you might get gains in safety and efficiency.

VoiceXML really didn’t deliver all that well. I’ve had a chance to review apps from different organizations. Few stay fully within the standard. Many will require significant rewrites when migrating platforms. With the exception of some enterprises that maintain multiple platforms, most developers don’t have even the most basic understanding of what features they can count on and which ones they can’t (ECMAScript being one of the challenges here). In practice, IVR lifetimes are 5-15 years. By the end of that cycle, most applications are rewritten for a variety of reasons.

On the tools side, the market showed that there wasn’t a sufficient business in writing VoiceXML tools. All the independent application dev tools had to be acquired by larger companies and are slowly moving away from being open or generally supported. There are a few other tools floating around with different value propositions, but nobody is making a significantly sized business from it.

Don’t even get me started on the state of speech recognition *grin*

The SMB and Enterprise telecom market has always been fairly fragmented. I suspect it will stay that way for some time to come. There's just not enough money in it, given the complexities involved. Each attempt to simplify has been derailed, with the worst example of that being VoIP. To date, it looks like VoIP has taken the worst of the TDM and internet standards and created a very complex and messy playing field.

I think the direction that Voxeo and others are taking is a good thing. They are simply building things that they believe their customers want. If they are right, it will sell.

Voxeo Developers Corner » Weekend link dump February 14th, 2010 - 1:49 pm

[...] Dominique Boucher looks at the rise of API-oriented voice services (including Tropo) in Back to Basics [...]

Weekend link dump | VXML Solutions February 14th, 2010 - 2:36 pm

[...] Dominique Boucher looks at the rise of API-oriented voice services (including Tropo) in Back to Basics [...]
