Sound Strategy: Leveraging Mobile Speech Recognition

The arrival of Apple’s Siri personal assistant on the iPhone 4S last year raised the profile of speech recognition on mobile devices. And now, the technology is poised to become even more widely embedded.

Siri’s speech capabilities let users employ voice commands to handle tasks such as sending messages, initiating calls and setting up meetings. While Siri bakes speech recognition into the phone and its core functions, more recent industry moves let developers tap the technology.

To wit: In August, Nuance Communications launched Nina, a virtual assistant that adds speech capabilities to iOS and Android apps. Nina — which includes voice biometrics in addition to speech recognition — offers an open SDK that enables developers to integrate those capabilities into their mobile apps. Nuance also offers the Dragon Mobile SDK, which lets app developers voice-enable iOS (4.0 and higher), Android (2.1 and higher) and Windows Phone (7.5 at press time) apps. In November, Nuance stated that more than 13,000 developers had used the SDK.

In another vote for voice, Intel’s Perceptual Computing SDK 2013 Beta, released in October as a free download, lets developers add speech recognition to apps along with other features such as 2D/3D object tracking. The SDK is geared toward apps running on second- and third-generation Intel Core processor-based Ultrabooks and PCs. [Disclosure: Intel is the sponsor of this content.]

Third-party moves such as these complement those of the platform makers: Apple’s previously mentioned Siri and Google’s Android SDK, which lets developers integrate speech.
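On Android, for instance, the platform SDK exposes its built-in recognizer through a simple Intent-based API. The sketch below is a minimal illustration, assuming an Activity and a hypothetical VoiceSearchActivity class and SPEECH_REQUEST_CODE constant (names not from the original article), that launches the recognizer and reads back the transcribed text:

```java
import android.app.Activity;
import android.content.Intent;
import android.speech.RecognizerIntent;
import java.util.ArrayList;

public class VoiceSearchActivity extends Activity {
    // Arbitrary request code used to match the recognizer's result to this call
    private static final int SPEECH_REQUEST_CODE = 100;

    // Launch the platform's built-in speech recognizer dialog
    private void startVoiceRecognition() {
        Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
        // Free-form language model suits open-ended spoken queries
        intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
                RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
        intent.putExtra(RecognizerIntent.EXTRA_PROMPT, "Speak now");
        startActivityForResult(intent, SPEECH_REQUEST_CODE);
    }

    // Receive the recognizer's transcription candidates
    @Override
    protected void onActivityResult(int requestCode, int resultCode, Intent data) {
        super.onActivityResult(requestCode, resultCode, data);
        if (requestCode == SPEECH_REQUEST_CODE && resultCode == RESULT_OK) {
            ArrayList<String> matches =
                    data.getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS);
            if (matches != null && !matches.isEmpty()) {
                String spokenText = matches.get(0); // highest-confidence result
                // ... hand spokenText to the app's search or command handler
            }
        }
    }
}
```

Because the heavy lifting happens in the platform’s recognizer activity, an app gets basic voice input without bundling any speech engine of its own — which is why richer SDKs such as Nuance’s target use cases beyond this built-in command set.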

The Growth of Speech Recognition in Mobile Apps

Overall, the stage has been set for the expansion of speech recognition on mobile devices. Chris Silva, mobile analyst at Altimeter Group, a research and advisory firm, credits Apple and Google for setting the current pace. Apple, he says, sweetened the pot with Siri, noting that Google also provides voice recognition capabilities for native phone functions.

“Apple and Google are the major forces that are not just advancing voice recognition technology, but advancing voice as a means to interact with our phones, PCs and tablets,” Silva says.

Microsoft has also had a strong role in voice technology, Silva notes, citing the company’s effort to unify Windows Phone and its Xbox gaming console as an example. The company provides an app that turns a phone into a voice-enabled remote control for browsing media content on an Xbox.

Spurring Speech Recognition Development

App developers will be the ones to broaden the scope of voice-enabled mobile apps beyond a phone’s basic capabilities. But they may need more than SDKs to do so.

Adding speech recognition to apps remains a complex task once developers get outside of the typical set of commands understood by devices, Silva says. He expects to see libraries of voice commands and actions as well as voice-centric middleware that will help developers take advantage of the technology.

Silva suggests that the evolution of push notifications on mobile platforms offers a parallel for voice recognition. Most developers didn’t pursue push notification capabilities right away when they became available for iOS and Android. However, Urban Airship stepped into that particular gap, enabling developers to build push messaging for iOS, BlackBerry or Android devices via an API.

“It took Airship to provide a set of libraries and tools to harness the new interaction methods and the same thing will happen with voice,” Silva says.

The Current Mobile Speech Recognition Landscape

That said, companies are beginning to bring speech recognition capabilities to more and more mobile apps. Ask.com, for instance, integrates Nuance’s speech recognition technology into its iOS and Android apps. This linkage lets users “use their voice to ask and answer questions as well as comment on answers,” says Erik Collier, vice president of product at Ask.com.

Specifically, Ask.com uses Nuance’s Dragon Mobile SDK for iOS. For Android, the company uses the built-in functions provided in the Android SDK. Collier says both the Nuance and Android integrations are straightforward, noting the company hasn’t faced any issues implementing them. “We’ve seen good traction with our users leveraging the voice recognition feature and we’ll continue to monitor for signals of further enhancements.”

Indeed, this nascent technology sector will keep industry executives and market analysts anticipating developments. For Silva, the future involves an unfolding of events that will eventually lead to what he terms the “sentient world.”

Silva sees this progression beginning with wider support for voice across handsets and tablets, continuing with the rise of voice recognition tools and middleware for developers, and, finally, moving to the emergence of autonomous interaction between users and devices. In that final phase, he says, sensors on mobile platforms will sample the air for context and automatically present users with information.

“It will take a while to get there, but I think voice is one of the first dominoes to fall in a journey to a much more natural interaction with our technology,” Silva says.