Introducing Amazon Echo - Adding voice recognition to your Java programs

Want to add voice recognition to your Java apps? You can do it with Amazon Echo.

This article is the first in a two-part series. In this article, Barry Burd introduces Amazon Echo. In the second article, you will learn how to write code for Echo, adding voice recognition to a Java program.

Amazon Echo is a new entry among intelligent personal assistants. Like mobile phone assistants such as Siri, Google Now and Cortana, Echo answers questions, follows commands, plays music and does dozens of other things when a user talks to it. But unlike the mobile phone assistants, Echo is hands-free. The Echo device is a black cylindrical tower, about 25 cm tall, that lives in your living room and waits for you to activate it by saying its wake word, either "Alexa" or "Amazon."

Cool as it is, the Echo device is only one example of the use of Amazon's Alexa Voice Service. With the Alexa Voice Service, a developer can add voice recognition capabilities to any device that has compatible hardware. According to the voice service's developer preview page, any device with a processor, a microphone, a speaker and a network connection has compatible hardware.

In a few years, every coffee pot will have minimally compatible hardware, but don't expect lively conversation from your coffee pot in the very near future. For a first-rate user experience, a hands-free device should be able to hear your voice from 10 feet away, interpret most of the words that you commonly utter and be interruptible. The hardware to perform such magic isn't cheap. A reliable source told me that, at a price of $180, Amazon is probably subsidizing the cost of the Echo device for consumers.

Alexa, are you self-aware?

My first reaction to the Echo device was to marvel at its speech recognition capabilities. True, these capabilities aren't new; mobile phone assistants like Siri also recognize speech. But abilities of this kind never cease to amaze me. I've been following progress in this field since the 1960s, when I watched Captain Kirk talking to the ship's computer and heard predictions that full speech recognition was right around the corner.

Unlike so many problems that computer professionals deal with, understanding speech requires true artificial intelligence. You can't solve nine-tenths of this problem with a big algorithm and then patch up the edge cases with some good guessing. Speech recognition requires less-than-obvious heuristics from the get-go. As a consumer, I don't notice progress in speech recognition from one year to the next. But taken over several decades, the progress is impressive.

Look, Ma! No hands!

One surprising advantage of Echo is its hands-free triggering. To make a request, you don't take your phone out of your pocket. You just say "Alexa" or "Amazon" from anywhere within earshot of the device. If I had never tried using the Echo device, and someone told me about this hands-free feature, my first reaction would be skepticism. How difficult is it to take a phone out of your pocket? How spoiled have we become? But when you perform simple actions many times, tiny differences matter. Think about an action that requires one click too many on a frequently visited website.

If you're not in the same room as your Echo, Amazon has an answer: for $30 extra, you can buy a Bluetooth remote. In my house, we keep the Echo device upstairs and the remote downstairs. If I'm downstairs and I want to tell Echo to "add toothpaste to the shopping list," I press the button on the remote and speak the command. It's not hands-free, but I'm not ready to buy a second Echo for my home. When I'm at the supermarket, the shopping list shows up on my smartphone. There's one minor annoyance: when my phone isn't connected to my household's wireless network, the phone's Echo app complains before showing me the shopping list. Occasionally, the app doesn't launch and I have to try a second time.

Privacy concerns

The Echo device sits in your house connected to the cloud, listening 24/7 for its wake word, "Alexa." Who's to say that it's not also listening for utterances such as "I'm thinking about buying a Toyota," or "My bank account's password is 'swordfish,'" or even "I disagree with the government's military policy"?

These are valid concerns, and I respect the people who take them seriously. But, for better or worse, I'm not one of those people. I have no qualms about having Echo in my home. I consider my private life to be fairly boring, and I would be flattered if someone thought I was worth any individual surveillance effort.

Developing for the Echo device

Echo's software development kit (SDK) became publicly available on June 25. Using the SDK, you can add new capabilities -- new Alexa skills -- to the Echo device. To use the SDK, you need an account on Amazon's developer portal and a place to host a cloud-based service. The developer account is free, and you can make 1 million requests per month to Amazon Web Services Lambda for free. As a developer, I don't think I could talk fast enough to test my Alexa skill 1 million times in a month.
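A quick back-of-the-envelope calculation, assuming a 30-day month, shows how generous that free allowance is for testing:

```latex
\frac{1{,}000{,}000\ \text{requests}}{30 \times 24 \times 60\ \text{min}}
  = \frac{1{,}000{,}000\ \text{requests}}{43{,}200\ \text{min}}
  \approx 23.1\ \text{requests per minute}
```

That is roughly one spoken request every 2.6 seconds, around the clock, for the whole month.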

AWS Lambda takes almost all the pain out of responding to cloud-based events. Lambda manages the compute resources for you automatically, so all you have to do is define the form of a request and the code for the response. You don't have to create a virtual machine.
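To make the "define the request, write the response" model concrete, here is a minimal Java sketch. `SkillRequest`, `SkillResponse`, `SkillHandler` and the intent name `AddToShoppingListIntent` are hypothetical stand-ins, not classes or names from the Alexa Skills Kit. A real Lambda function in Java would implement AWS's `RequestHandler<I, O>` interface, whose `handleRequest` method also receives a `Context` object; the sketch drops that dependency so it runs with the JDK alone (Java 16 or later, for records).

```java
import java.util.Map;

/**
 * Sketch of the Lambda-style "request in, response out" model.
 * SkillRequest, SkillResponse and SkillHandler are hypothetical stand-ins,
 * not types from the Alexa Skills Kit or the AWS Lambda runtime library.
 */
public class ShoppingListSkill {

    // Hypothetical request shape: the intent Alexa recognized plus its slot values.
    record SkillRequest(String intentName, Map<String, String> slots) {}

    // Hypothetical response shape: the text Alexa should speak back.
    record SkillResponse(String speech, boolean endSession) {}

    // Single-method interface in the spirit of Lambda's RequestHandler<I, O>,
    // minus the Context parameter, so the sketch has no external dependencies.
    interface SkillHandler {
        SkillResponse handle(SkillRequest request);
    }

    // The only logic the developer supplies: map a request to a response.
    static final SkillHandler HANDLER = request -> {
        if ("AddToShoppingListIntent".equals(request.intentName())) {
            String item = request.slots().getOrDefault("Item", "that");
            return new SkillResponse("Added " + item + " to your shopping list.", true);
        }
        // Unknown intent: ask again and keep the session open.
        return new SkillResponse("Sorry, I didn't understand that.", false);
    };

    public static void main(String[] args) {
        SkillRequest req = new SkillRequest("AddToShoppingListIntent", Map.of("Item", "toothpaste"));
        System.out.println(HANDLER.handle(req).speech());
    }
}
```

Running `main` prints `Added toothpaste to your shopping list.` Everything outside `HANDLER` is plumbing that Lambda and the SDK would normally provide for you.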

Echo isn't wedded to Amazon's Lambda. If you want to endure the work of spinning up your own Web service, you have the option of hosting your Alexa skill on AWS Elastic Beanstalk, or on your own Internet-accessible endpoint.
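For the self-hosted route, the skeleton of an endpoint might look like the sketch below, which uses the JDK's built-in `com.sun.net.httpserver.HttpServer`. The port 8080, the `/alexa` path and the canned reply are assumptions chosen for illustration. The sketch deliberately leaves out what Amazon requires of a real self-hosted endpoint, notably HTTPS with a trusted certificate and verification that each request actually comes from Alexa; check Amazon's current documentation before exposing anything like this to the Internet.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

/** Hypothetical minimal skill endpoint: plain HTTP, no signature checks. */
public class SelfHostedSkillEndpoint {

    // Canned reply in the Alexa response format. A real skill would parse
    // requestJson and branch on the intent; this sketch ignores it.
    static String buildReply(String requestJson) {
        return "{\"version\":\"1.0\",\"response\":{"
                + "\"outputSpeech\":{\"type\":\"PlainText\",\"text\":\"Hello from my own server.\"},"
                + "\"shouldEndSession\":true}}";
    }

    static void handle(HttpExchange exchange) throws IOException {
        // Skill requests arrive as HTTP POST; reject anything else.
        if (!"POST".equals(exchange.getRequestMethod())) {
            exchange.sendResponseHeaders(405, -1); // -1 means no response body
            exchange.close();
            return;
        }
        String body;
        try (InputStream in = exchange.getRequestBody()) {
            body = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
        byte[] reply = buildReply(body).getBytes(StandardCharsets.UTF_8);
        exchange.getResponseHeaders().set("Content-Type", "application/json;charset=UTF-8");
        exchange.sendResponseHeaders(200, reply.length);
        try (OutputStream out = exchange.getResponseBody()) {
            out.write(reply);
        }
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/alexa", SelfHostedSkillEndpoint::handle);
        server.start();
        System.out.println("Listening on http://localhost:8080/alexa");
    }
}
```

With the server running, `curl -X POST -d '{}' http://localhost:8080/alexa` should return the JSON reply. Compared with Lambda, every concern here (hosting, TLS, scaling, uptime) becomes your job, which is the "work" the paragraph above refers to.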

You can create an Alexa skill using either Java or Node.js. I'm most comfortable with Java, but for Echo development, I tried both Java and Node.js. Believe me, Node.js is much easier.

You can develop for Echo without a device. The developer website has a section where you can enter a JSON request and see Alexa's JSON response. But this is boring. Besides, without real voice interaction, you don't understand the user experience. So, if you can, get your hands on an Echo device. Create a free Amazon developer account and register the device with your developer account. The device can be registered with only one Amazon account at a time. If you prefer, you can start by registering the Echo with your Amazon customer account. Later, you can change the registration back and forth between your customer and developer accounts.
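The JSON that comes back in that test area follows the Alexa response format: a `version` field plus a `response` object holding `outputSpeech` and `shouldEndSession`. The sketch below builds such a response by hand using only the JDK (Java 14 or later, for switch arrows). The class and method names are hypothetical, and a production skill would more likely rely on a JSON library or Amazon's SDK than on string concatenation.

```java
/**
 * Hand-rolled builder for a plain-text Alexa response, for illustration only.
 * AlexaJsonResponse, escapeJson and plainTextResponse are hypothetical names.
 */
public class AlexaJsonResponse {

    // Escape the characters JSON does not allow raw inside a string literal.
    static String escapeJson(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> {
                    if (c < 0x20) {
                        // Other control characters become four-digit hex escapes.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
                }
            }
        }
        return sb.toString();
    }

    // Wrap speech text in the envelope: version, outputSpeech, shouldEndSession.
    static String plainTextResponse(String text, boolean endSession) {
        return "{\"version\":\"1.0\",\"response\":{"
                + "\"outputSpeech\":{\"type\":\"PlainText\",\"text\":\"" + escapeJson(text) + "\"},"
                + "\"shouldEndSession\":" + endSession + "}}";
    }

    public static void main(String[] args) {
        System.out.println(plainTextResponse("Added toothpaste to your shopping list.", true));
    }
}
```

Running `main` prints `{"version":"1.0","response":{"outputSpeech":{"type":"PlainText","text":"Added toothpaste to your shopping list."},"shouldEndSession":true}}`, which is the kind of payload you can compare against the developer website's output while you work without a device.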

To learn how to develop for Echo, go to Part II of this article and follow its tutorial: How to add voice recognition features to the Echo device

Follow Cameron McKenzie on Twitter: @potemcam
Follow Barry Burd too: @allmycode

Books penned by Barry Burd:

Java For Dummies 
Android Application Development All-in-One For Dummies 
Beginning Programming with Java For Dummies 
Java Programming for Android Developers For Dummies
