Sound Strategy: Leveraging Mobile Speech Recognition

The arrival of Apple’s Siri personal assistant on the iPhone 4S last year raised the profile of speech recognition on mobile devices. And now, the technology is poised to become even more widely embedded.

Siri’s speech capabilities let users employ voice commands to handle tasks such as sending messages, initiating calls and setting up meetings. While Siri bakes speech recognition into the phone and its core functions, more recent industry moves let developers tap the technology.

To wit: In August, Nuance Communications launched Nina, a virtual assistant that adds speech capabilities to iOS and Android apps. Nina -- which includes voice biometrics in addition to speech recognition -- offers an open SDK that enables developers to integrate those capabilities into their mobile apps. Nuance also offers the Dragon Mobile SDK, which lets app developers voice-enable iOS (4.0 and higher), Android (2.1 and higher) and Windows Phone (7.5 at press time) apps. In November, Nuance stated that more than 13,000 developers had used the SDK.

In another vote for voice, Intel’s Perceptual Computing SDK 2013 Beta, released in October as a free download, lets developers add speech recognition to apps along with other features such as 2D/3D object tracking. The SDK is geared toward apps running on second- and third-generation Intel Core processor-based Ultrabooks and PCs. [Disclosure: Intel is the sponsor of this content.]

Third-party moves such as these complement those of the platform makers: Apple’s previously mentioned Siri and Google’s Android SDK, which lets developers integrate speech.
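
On Android, for instance, the platform’s built-in recognizer is available to any app through a stock intent. The sketch below is a minimal example, not a complete app: the activity and its handleQuery() method are hypothetical, and it simply launches the system speech dialog and reads back the best transcription.

```java
import android.app.Activity;
import android.content.Intent;
import android.os.Bundle;
import android.speech.RecognizerIntent;
import java.util.ArrayList;

public class VoiceSearchActivity extends Activity {
    private static final int VOICE_REQUEST = 1;  // arbitrary request code

    // Launch the platform's built-in speech recognition dialog.
    private void startVoiceInput() {
        Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
        intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
                RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
        intent.putExtra(RecognizerIntent.EXTRA_PROMPT, "Speak now");
        startActivityForResult(intent, VOICE_REQUEST);
    }

    @Override
    protected void onActivityResult(int requestCode, int resultCode, Intent data) {
        if (requestCode == VOICE_REQUEST && resultCode == RESULT_OK) {
            // Candidate transcriptions, best match first.
            ArrayList<String> matches =
                    data.getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS);
            if (matches != null && !matches.isEmpty()) {
                handleQuery(matches.get(0));  // hypothetical app-specific handler
            }
        }
        super.onActivityResult(requestCode, resultCode, data);
    }

    private void handleQuery(String spokenText) { /* app-specific */ }
}
```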

The Growth of Speech Recognition in Mobile Apps
Overall, the stage has been set for the expansion of speech recognition on mobile devices. Chris Silva, mobile analyst at Altimeter Group, a research and advisory firm, credits Apple and Google for setting the current pace. Apple, he says, sweetened the pot with Siri, noting that Google also provides voice recognition capabilities for native phone functions.

“Apple and Google are the major forces that are not just advancing voice recognition technology, but advancing voice as a means to interact with our phones, PCs and tablets,” Silva says.

Microsoft has also had a strong role in voice technology, Silva notes, citing the company’s effort to unify Windows Phone and its Xbox gaming console as an example. The company provides an app that turns a phone into a voice-enabled remote control for browsing media content on an Xbox.

Spurring Speech Recognition Development
App developers will be the ones to broaden the scope of voice-enabled mobile apps beyond a phone’s basic capabilities. But they may need more than SDKs to do so.

Adding speech recognition to apps remains a complex task once developers get outside of the typical set of commands understood by devices, Silva says. He expects to see libraries of voice commands and actions as well as voice-centric middleware that will help developers take advantage of the technology.

Silva suggests that the evolution of push notifications on mobile platforms parallels that of voice recognition. Most developers didn’t pursue push notification capabilities right away when they became available for iOS and Android. However, Urban Airship stepped into that particular gap, enabling developers to build push messaging for iOS, BlackBerry or Android devices via an API.

“It took Airship to provide a set of libraries and tools to harness the new interaction methods and the same thing will happen with voice,” Silva says.

The Current Mobile Speech Recognition Landscape
That said, companies are beginning to bring speech recognition capabilities to more and more mobile apps. Ask.com, for instance, integrates Nuance’s speech recognition technology into its iOS and Android apps. This linkage lets users “use their voice to ask and answer questions as well as comment on answers,” says Erik Collier, vice president of product at Ask.com.

Specifically, Ask.com uses Nuance’s Dragon Mobile SDK for iOS. For Android, the company uses the built-in functions provided in the Android SDK. Collier says both Nuance and Android integrations are straightforward, noting the company hasn’t faced any issues implementing them. “We’ve seen good traction with our users leveraging the voice recognition feature and we’ll continue to monitor for signals of further enhancements.”
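
For apps that need recognition inside their own UI rather than the stock system dialog, the Android SDK also exposes a listener-based API. What follows is a minimal sketch of that approach -- not Ask.com’s actual implementation -- and it assumes the app already holds the RECORD_AUDIO permission.

```java
import android.content.Context;
import android.content.Intent;
import android.os.Bundle;
import android.speech.RecognitionListener;
import android.speech.RecognizerIntent;
import android.speech.SpeechRecognizer;
import java.util.ArrayList;

public class InAppRecognizer {
    private final SpeechRecognizer recognizer;

    public InAppRecognizer(Context context) {
        recognizer = SpeechRecognizer.createSpeechRecognizer(context);
        recognizer.setRecognitionListener(new RecognitionListener() {
            @Override public void onResults(Bundle results) {
                ArrayList<String> matches = results.getStringArrayList(
                        SpeechRecognizer.RESULTS_RECOGNITION);
                // Pass the top transcription to the app's own search pipeline.
            }
            @Override public void onError(int error) {
                // Retry, or fall back to typed input.
            }
            // Remaining callbacks are required by the interface but unused here.
            @Override public void onReadyForSpeech(Bundle params) {}
            @Override public void onBeginningOfSpeech() {}
            @Override public void onRmsChanged(float rmsdB) {}
            @Override public void onBufferReceived(byte[] buffer) {}
            @Override public void onEndOfSpeech() {}
            @Override public void onPartialResults(Bundle partialResults) {}
            @Override public void onEvent(int eventType, Bundle params) {}
        });
    }

    // Start listening without leaving the app's current screen.
    public void listen() {
        Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
        recognizer.startListening(intent);
    }
}
```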

Indeed, this nascent technology sector will keep industry executives and market analysts anticipating developments. For Silva, the future involves an unfolding of events that will eventually lead to what he terms the “sentient world.”

Silva sees this progression beginning with a wider support for voice across handsets and tablets, continuing with the rise of voice recognition tools and middleware for developers, and, finally, moving to the emergence of autonomous interaction between users and devices. In the latter phase, he says, sensors on mobile platforms will sample the air for context and automatically present users with information.

“It will take a while to get there, but I think voice is one of the first dominos to fall in a journey to a much more natural interaction with our technology,” Silva says.

Mobile Apps and Accessibility

When it comes to making information technology accessible to people with disabilities, websites have received much of the attention.

In 1998, Congress amended the Rehabilitation Act of 1973, directing federal agencies to make IT resources accessible to both government employees and the public. A considerable chunk of the work under the amendments -- typically referred to as Section 508 -- involves getting government websites to comply. Soon after Section 508, the World Wide Web Consortium published its Web Content Accessibility Guidelines, a set of recommendations for improving the accessibility of web content.

Recently, mobile app development has also started coming into the accessibility discussion. Developers and accessibility experts now say that the general approaches used in the web world can also apply to the rapidly expanding field of mobile devices and apps. Mobile OS makers -- including Apple, RIM/BlackBerry and Google -- even offer specific guidance on developing accessible mobile apps.

Forms of Accessibility and How to Integrate Them

Accessibility in app design may take a number of forms. For blind and low-vision users, assistive technologies include screen readers. Screen reader software, such as Apple’s VoiceOver, translates the information appearing on a display to speech. A screen reader may also drive a braille display, which raises dots through holes in a keyboard-like device to permit reading.

Examples of accommodations for deaf and hearing-impaired users include captioning services such as Purple Communications’ ClearCaptions, which debuted in 2011. In the mobile category, the service is available for Android devices and iPhones.

Beyond those purpose-built assistive technologies, mobile app developers can widen the range of mainstream apps that people with disabilities can use. And the task of building accessible apps doesn’t have to be tremendously time consuming, notes Doug Brashear, mobile practice director at NavigationArts, a web and application design and development consulting firm. That’s particularly the case when the mobile OS has accessibility features baked in.

“Surprisingly, the current crop of mobile devices, particularly iPhones, has more accessibility features built into the operating system than you’d ever expect,” Brashear says. “A small amount of additional design and development time -- over what is normally required -- can yield a highly usable and accessible app.” Apple iOS’ accessibility features, for example, can get developers 75 percent of the way there, according to Brashear.

Crista Earl, director of web operations at the American Foundation for the Blind, also notes Apple’s accessibility features. Among the major capabilities: VoiceOver and Zoom. VoiceOver, she notes, originated as a screen reader for the Mac platform and later migrated to iOS. Zoom lets users magnify an app’s entire screen as opposed to individual elements, according to Apple. Earl also says that Android, as open source software, enables app makers to develop accessible apps or apps geared toward niche markets.

Accessible App Design Tips

Many accessibility principles for websites also readily apply to mobile development. Section 508 guidelines, for instance, call for text labels to accompany images and navigational controls such as buttons. Screen readers can’t interpret a button without the supplemental text. “Put an explicit label on your controls,” Earl advises.
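
On Android, Earl’s advice comes down to a single call (or the android:contentDescription attribute in a layout file). A minimal sketch, assuming a hypothetical layout with an image-only play button:

```java
import android.app.Activity;
import android.os.Bundle;
import android.widget.ImageButton;

public class PlayerActivity extends Activity {
    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.player);  // hypothetical layout resource

        // An image-only control is opaque to a screen reader without a label:
        // it would be announced only as "button." An explicit description
        // tells the user what the control actually does.
        ImageButton playButton = (ImageButton) findViewById(R.id.play_button);
        playButton.setContentDescription("Play episode");
    }
}
```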

Similarly, web accessibility guidelines state that information shouldn’t be conveyed through color alone -- as in the case of distinguishing various subway lines on a map. Text labels provide an alternative method for conveying information here as well.

Much of what Brashear’s company would do to build a mobile app’s user interface, he says, would be the same steps it would take to create a website that complies with Section 508 or the Americans with Disabilities Act. But some elements of accessible development don’t carry over from websites to apps.

“There are a whole set of things specific to mobile because of the screen size and the fact they are touchable,” Brashear says. He suggests developers adhere to the standard UI elements for a given platform, which he says greatly aids the intuitiveness of an app. The idea is to let users leverage the experience they have had with other apps. The more customized the app, “the harder it is going to be, especially for a sight-challenged person, to understand,” Brashear says.

Developers should also enable landscape viewing as an accessibility practice, suggests Brashear, who notes that some apps lock the orientation to portrait. He says landscape mode is helpful in providing a bigger view overall and for facilitating the use of a virtual keyboard.
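
In Android terms, that means not forcing android:screenOrientation="portrait" in the manifest or its programmatic equivalent. A minimal sketch of the programmatic side, with a hypothetical activity name:

```java
import android.app.Activity;
import android.content.pm.ActivityInfo;
import android.os.Bundle;

public class ReaderActivity extends Activity {
    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);

        // Some apps lock rotation, which takes the wider landscape view away:
        // setRequestedOrientation(ActivityInfo.SCREEN_ORIENTATION_PORTRAIT);

        // Following the device sensor keeps both orientations available, so
        // users can rotate for a bigger view and a wider virtual keyboard.
        setRequestedOrientation(ActivityInfo.SCREEN_ORIENTATION_SENSOR);
    }
}
```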

Brashear also cites the following mobile app accessibility recommendations:

• Keep the need to enter text to a minimum, since small or virtual keyboards can be difficult to use.

• Locate actions in your app away from areas of the screen that perform other functions.

• Provide large finger targets for on-screen buttons or links (a minimal sketch follows this list).
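
On Android, one way to honor that last point without redesigning a layout is a TouchDelegate, which enlarges a view’s touchable area beyond its drawn bounds. A minimal sketch, assuming the small control and its parent come from an existing layout:

```java
import android.graphics.Rect;
import android.view.TouchDelegate;
import android.view.View;

public final class TouchTargets {
    // Expand a small button's touchable area without changing its visual size.
    public static void enlargeTouchTarget(final View smallButton, final int extraPx) {
        final View parent = (View) smallButton.getParent();
        // Post so the hit rectangle is measured after layout has completed.
        parent.post(new Runnable() {
            @Override public void run() {
                Rect hitRect = new Rect();
                smallButton.getHitRect(hitRect);
                hitRect.inset(-extraPx, -extraPx);  // grow the hit area on all sides
                parent.setTouchDelegate(new TouchDelegate(hitRect, smallButton));
            }
        });
    }
}
```

Platform guidelines generally put the comfortable minimum at roughly 44 to 48 density-independent pixels per side, so a delegate like this is most useful for controls that fall short of that.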

Know Your Audience

Understanding users is central to any app project. When developing for accessibility, app makers need to “understand the nature of the challenges involved,” Brashear says.

To that end, he advises developers to read, research and learn from people with accessibility needs. He points to forums such as AppleVis, a website designed for blind and low-vision users of iPhones, iPads and other Apple products.

Consulting disabled users is also important as the app moves through the development cycle. “When testing is being done, work with people who have the disabilities that you want to serve,” says Nancy Massey, president of MasseyNet.com Inc., a company that consults on accessibility and Section 508 issues.

Developers tend to make accessibility more complicated than it needs to be, says Massey, who adds that there’s ample crossover between general usability and accessibility in the mobile technology field. An app built with a clear and simple design that’s attractive to users may go a long way toward meeting accessibility goals, she says. “What makes something user friendly often makes it accessible.”

Has UI and UX Innovation Plateaued?

Different strokes for different folks. That’s trite but true when it comes to how people interact with smartphones and tablets. It also sums up a big challenge for app developers, OS vendors and device manufacturers: designing a UI that each one believes is the ideal way to interact with a device while accepting that many users will be confused by it or will prefer an alternative.

Case in point: Steve Jobs famously dissed the stylus as a lousy substitute for the finger. But if most iPad -- and tablet -- users agreed, there wouldn’t be such a healthy market for Bluetooth keyboards. It’s easy to assume people buy these add-ons because they don’t like typing on glass, but that’s not the only reason.

“On many devices and within many apps, having a soft keyboard means not having the full real estate of the screen available,” says Daria Loi, Intel user experience (UX) innovation manager. “I have a never-ending list of users who report frustrations with their soft keyboard covering part of the screen.” [Disclosure: Intel is the sponsor of this content.]

UI and UX preferences also vary by culture, giving developers, OS vendors and device manufacturers another set of variables to accommodate. “I recently interviewed users in Japan who love using the stylus on their tablet or convertible as it enables touch without fingerprints,” Loi says. “Users in India told me that they love the hand for some applications but the stylus for others -- in particular more complex apps such as Illustrator, Photoshop, 3D Studio Max and so on.”

One UI to Rule Them All?
Whether it’s touch, speech control or even eye tracking, a break-the-mold UI has to be intuitive enough so that users aren’t daunted by a steep learning curve. With Metro, Microsoft takes that challenge to another level. Its new UI spans multiple device types and thus multiple use-case scenarios.

“All meaningful change requires learning,” says Casey McGee, senior marketing manager for Microsoft’s Windows Phone division. “The key is to expose people to as many relatable scenarios as possible and make learning an enjoyable and rewarding process of discovery. Microsoft is doing that by using similar UI constructs across several consumer products, phones, PCs and game consoles.”

Metro is noteworthy in part because if consumers and business users embrace it, then developers can leverage their work across multiple device types. “Windows Phone and Windows 8 in particular are both part of one comprehensive set of platform offerings from Microsoft, all based on a common set of tools, platform technologies and a consistent Metro UI,” McGee says. “The two use the same familiar Metro UI, and with Windows Phone 8, are now built on the same shared Windows core. This means that developers will be able to leverage much of their work writing applications and games for one to deliver experiences to the other.”

Metro gives developers a single framework for designing user experiences, with a set of controls for each form factor. “Developers building games based on DirectX will be able to reuse substantial amounts of their code in delivering games for both Windows and Windows Phone,” McGee says. “Developers building applications using XAML/.NET will be able to reuse substantial amounts of their business logic code across Windows and Windows Phone.”

Speak Up for a Better UX? Or Look Down?

The growing selection of speech-controlled apps and UIs -- including Google Now, Siri and Samsung’s S Voice -- shows that some developers and vendors believe that voice is a viable alternative to the finger, at least for some tasks. When people can simply say what they want, the theory goes, it’s less daunting and confusing than having to learn and remember that, say, two fingers swiped in a circle zooms in on the page.

But simply adding speech-control features doesn’t automatically guarantee an intuitive user experience. In fact, when implemented poorly, it can make the user experience worse.

One common pitfall is assuming that users will stick to industry lingo rather than using vernacular terms. For example, a travel app that expects users to say “depart” instead of “leave” might frustrate them by responding with: “Not recognized. Try again.” That’s also an example of the difference between simple speech recognition and what’s known as “natural language understanding”: The former looks for a match in its database of terminology, while the latter tries to understand the user’s intent.
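
Full natural language understanding is a hard problem, but even a simple synonym table can blunt the terminology trap in the travel example. A minimal sketch -- all names hypothetical -- that folds vernacular phrasings into one canonical command before matching:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Map vernacular phrasings to a canonical command so that "leave",
// "depart" and "take off" all reach the same handler.
public class CommandNormalizer {
    private static final Map<String, String> SYNONYMS = new HashMap<String, String>();
    static {
        for (String phrase : Arrays.asList("depart", "leave", "take off")) {
            SYNONYMS.put(phrase, "DEPART");
        }
        for (String phrase : Arrays.asList("arrive", "get in", "land")) {
            SYNONYMS.put(phrase, "ARRIVE");
        }
    }

    // Return the canonical command for a transcribed phrase, or null if unknown.
    public static String normalize(String spoken) {
        return SYNONYMS.get(spoken.toLowerCase().trim());
    }
}
```

A production system would match against whole utterances and degrade gracefully when nothing matches, rather than replying “Not recognized. Try again.”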

“The correct answer for voice is to get more and more intelligent like humans so that you don’t have to get the right word,” says Don Norman, co-founder of the Nielsen Norman Group, a consultancy that specializes in UX.

Eye tracking is another potential UI. It could be a good fit for messier settings -- turning pages in a cookbook app, for instance, without coating the tablet or smartphone screen with flour. But like voice and touch, eye tracking will see myriad implementations as the industry casts about for the best approach.

“Eye tracking is an interesting thing,” Loi says. “I can see usefulness and some interesting usages. My only concern is that we, the industry, might fall in love blindly with it and start to believe it can do anything. Do you remember the initial attitude toward gestural controls? All those unlikely UIs to be navigated through unlikely new gestural languages? Sometimes the industry gets very excited about a new technology, and the first reaction is to squeeze it into every device and context, without carefully considering why and what for exactly.”

In the case of voice, the user experience could get worse before it gets better simply because there’s a growing selection of solutions that make it relatively easy for developers to speech-enable their apps. That freedom to experiment means users will have to contend with a wide variety of speech UI designs.

“I consider this an exciting, wonderful period,” Norman says. “But for the everyday person trying to get stuff done, it will be more and more frustrating.”
