
How to add voice recognition features to the Echo device

Have you thought about taking Amazon's Echo device for a test drive? Expert Barry Burd walks you through the process of developing apps for Echo.

This article is the second in a two-part series. In it, Barry Burd shows you how to program for Amazon Echo. For an overview of Amazon Echo and how it works, see the first article, Introducing Amazon Echo.

To start developing for Amazon Echo, first sign in to the AWS Management Console. When you do, you're taken to a page with a long list of Amazon Web Services (Figure 1). In the upper right corner, you'll see the name of a region, such as N. Virginia, N. California or Tokyo. No matter where you live, click that name and, in the resulting drop-down box, set the region to N. Virginia. N. Virginia is currently the only region that handles Lambda requests for the Alexa Skills Kit.

Figure 1: A list of Amazon Web services.

On the same page, in the huge list of AWS services, click the item labeled Lambda. In Figure 1, Lambda is the second item from the top in the leftmost column.

The next webpage that you see lists any Lambda functions that you've created (Figure 2).

Click the page's Get Started Now button or Create a Lambda Function button.

Figure 2: A list of Lambda functions.

You create a Lambda function in four steps, each one performed on a separate webpage.

  • In Step 1, you select a blueprint for your Lambda function. The first page lists some possible blueprints (Figure 3). Click the alexa-skills-kit-color-expert option. Doing so creates code to help you start developing your sample Alexa skill. Click Next to proceed to the Step 2 page.
  • On the Step 2 page, you select a source for events that trigger the execution of your Lambda function (Figure 4). Select Alexa Skills Kit, and then click Next.
  • On the Step 3 page, you supply information about your new Lambda function (Figure 5). For the Runtime, select Node.js. For the Code Entry Type, leave the default Edit Code Inline button checked.

Figure 3: Blueprints to choose from.

I suggest making a small change in the text area containing the code -- a change that will be noticeable when you test the new function. For example, if you scroll down in the text area, you'll see:

function getWelcomeResponse(callback) {
    var sessionAttributes = {};
    var cardTitle = "Welcome";
    var speechOutput = "Welcome to the Alexa Skills Kit sample, "
                + "Please tell me your favorite color by saying, "
                + "my favorite color is red";
    var repromptText = "Please tell me your favorite color by saying, "
                + "my favorite color is red";
    var shouldEndSession = false;

Change "Welcome to the Alexa Skills Kit sample" to "Welcome to the future of voice enabled technology". Or change "Please tell me your favorite color" to "Please tell me your innermost secrets". When you test the skill on a real Echo device, the change in wording assures you that Echo is running the function that you created.

Figure 4: Selecting a source for events.

On the Step 3 page, accept the default values in the Handler, Memory and Timeout fields. In the required Role field, select Basic Execution Role -- which seems to be the same as lambda_basic_execution.

Figure 5: Defining a new Lambda function.

Note: When you select a Role, your browser will try to create a pop-up (Figure 6). At this point, you must have pop-ups enabled in your browser. Otherwise, you can't get past this Role selection point. In the pop-up, simply click Allow.

Figure 6: Be sure to enable pop-ups.

On the Step 4 page, you confirm the choices that you made on the previous three pages (Figure 7). Click Create Function.

Figure 7: Confirm choices you've made.

Figure 8 shows you a page that describes your new Lambda function. Notice that the function has one event source; namely, Alexa. Notice also that the function has been assigned a unique Amazon Resource Name (ARN), beginning with arn:aws:lambda:us-east-1. Copy that ARN for future reference.
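If you keep the copied ARN in a script or a configuration file, a quick check that it names the expected region can catch paste errors early. The following is only a sketch, not part of Amazon's tooling: parseLambdaArn is a hypothetical helper, and the account number and function name in the sample ARN are placeholders.

```javascript
// Split a Lambda function ARN into its colon-separated parts.
// Layout assumed: arn:partition:service:region:account-id:function:name
function parseLambdaArn(arn) {
    var parts = String(arn).split(":");
    if (parts.length < 7 || parts[0] !== "arn" || parts[2] !== "lambda") {
        throw new Error("Not a Lambda function ARN: " + arn);
    }
    return {
        partition: parts[1],
        service: parts[2],
        region: parts[3],
        accountId: parts[4],
        functionName: parts[6]
    };
}

// Placeholder ARN: the account number and function name are made up.
var exampleArn = "arn:aws:lambda:us-east-1:123456789012:function:MyColorFunction";
var parsed = parseLambdaArn(exampleArn);

console.log(parsed.region);        // us-east-1
console.log(parsed.functionName);  // MyColorFunction
```

A region other than us-east-1 in the parsed result would suggest that the function was created before the region was switched to N. Virginia.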

Figure 8: Describing a new Lambda function.

Your next task is to register the new Lambda function as a skill on the developer portal. To do so, go to Amazon's developer site. Select Apps and Services at the top of the page, and Alexa in the row of options immediately below the top row (Figure 9). In the body of the page you see two choices: Alexa Skills Kit and Alexa Voice Service. Click the Get Started button for the Alexa Skills Kit.

Figure 9: Registering a new Lambda function.

The next page lists any Alexa skills that you've already registered (Figure 10). If you're new to Amazon Echo, the list is empty. The Lambda function that you just finished creating isn't yet on the list. To add that function to the list, click the page's Add a New Skill button.

Figure 10: A list of Alexa skills already registered.

The next page, titled Create a New Alexa Skill, has four parts. Notice the box containing four items on the left side of Figure 11. When you practice developing toy Alexa skills, you do only three of the four parts:

  • In the first part, you supply information about your new Alexa skill. One piece of information is the Invocation Name -- a phrase that an Echo user utters when asking the device to perform the skill. For example, with the Invocation Name in Figure 11, the user starts by saying "Alexa, ask about favorite colors." As Figure 11 also shows, select the Lambda ARN option -- as opposed to HTTPS -- for the new skill's Endpoint. In the Endpoint text field, paste the ARN string that you copied from the page shown in Figure 8. When you've filled in all the Skill Information fields, click Next.
  • In the second part, you declare the skill's Intent Schema and Sample Utterances (Figure 12). In the Intent Schema's text area, enter the following JSON code:
{  
   "intents":[  
      {  
         "intent":"MyColorIsIntent",
         "slots":[  
            {  
               "name":"Color",
               "type":"LITERAL"
            }
         ]
      },
      {  
         "intent":"WhatsMyColorIntent",
         "slots":[  

         ]
      }
   ]
}

Figure 11: Creating a new Alexa skill.

In Alexa's world, an intent is like a method call, and an intent schema is like a collection of method signatures. Each intent that's declared in the schema describes the intent's name and the intent's slots. The intent's slots are like a method's parameters. In case you're wondering, Amazon's documentation makes it clear that an Alexa intent is not related to the similarly named Android SDK intent.
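To make the method-signature analogy concrete, here is a small sketch that prints each intent in the schema as if it were a method signature. The describeIntents function is a hypothetical helper written for this illustration; it isn't part of the Alexa Skills Kit.

```javascript
// The intent schema from this article, written as a JavaScript object.
var intentSchema = {
    intents: [
        { intent: "MyColorIsIntent", slots: [{ name: "Color", type: "LITERAL" }] },
        { intent: "WhatsMyColorIntent", slots: [] }
    ]
};

// Render each intent as "Name(slot: TYPE, ...)", the way a method
// signature lists a method's name and its parameters.
function describeIntents(schema) {
    return schema.intents.map(function (entry) {
        var params = (entry.slots || []).map(function (slot) {
            return slot.name + ": " + slot.type;
        });
        return entry.intent + "(" + params.join(", ") + ")";
    });
}

var signatures = describeIntents(intentSchema);
console.log(signatures.join("\n"));
// MyColorIsIntent(Color: LITERAL)
// WhatsMyColorIntent()
```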

Figure 12: Declaring a skill's Intent Schema and Sample Utterances.

This example's JSON code declares two intents: one named MyColorIsIntent, with a LITERAL type slot, and another named WhatsMyColorIntent, with no slots. Alternatives to LITERAL include NUMBER and DATE. With a NUMBER slot, when the user says "ten," the code receives the value 10. With a LITERAL slot, when the user says "ten," the code receives the word "ten."
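The difference between slot types matters when your code reads a slot's value. The sketch below shows one way a handler might interpret a value according to its declared type. It rests on assumptions: readSlot and the Count slot are hypothetical, and the conversion with Number() is applied defensively in case the value reaches your code as text inside the JSON request.

```javascript
// Interpret a slot value according to the type declared in the intent schema.
function readSlot(slot, declaredType) {
    if (!slot || slot.value === undefined || slot.value === null) {
        return null;                      // the slot was not filled
    }
    if (declaredType === "NUMBER") {
        var n = Number(slot.value);       // "10" and 10 both become 10
        return isNaN(n) ? null : n;
    }
    return String(slot.value);            // LITERAL: keep the spoken words
}

// "Count" is a made-up slot name used only for this example.
console.log(readSlot({ name: "Count", value: "10" }, "NUMBER"));    // 10
console.log(readSlot({ name: "Color", value: "ten" }, "LITERAL"));  // ten
```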

In the Sample Utterances text area, enter the following lines:

MyColorIsIntent my color is {dark brown|Color}
MyColorIsIntent my color is {green|Color}
MyColorIsIntent my favorite color is {red|Color}
MyColorIsIntent my favorite color is {navy blue|Color}
WhatsMyColorIntent whats my color
WhatsMyColorIntent what is my color
WhatsMyColorIntent say my color
WhatsMyColorIntent tell me my color
WhatsMyColorIntent whats my favorite color
WhatsMyColorIntent what is my favorite color
WhatsMyColorIntent say my favorite color
WhatsMyColorIntent tell me my favorite color
WhatsMyColorIntent tell me what my favorite color is

These lines tell Alexa what kinds of sentences to expect from the user. For example, to trigger the MyColorIsIntent, the user might say "My color is green" or "My favorite color is light grey." In the Sample Utterances text area, words like red and navy blue are only placeholders for words that the user might say.
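Each Sample Utterances line has a simple shape: an intent name, then a phrase, with any slot written as {sample words|SlotName}. The following sketch takes one such line apart. It only illustrates the format used in this article; parseUtterance is a hypothetical helper, and Alexa's own parsing may differ.

```javascript
// Break one Sample Utterances line into its intent name, phrase and slots.
function parseUtterance(line) {
    var firstSpace = line.indexOf(" ");
    if (firstSpace === -1) {
        throw new Error("Expected an intent name followed by a phrase: " + line);
    }
    var intent = line.slice(0, firstSpace);
    var phrase = line.slice(firstSpace + 1);

    // Collect every {sample words|SlotName} placeholder in the phrase.
    var slots = [];
    var slotPattern = /\{([^|}]+)\|([^}]+)\}/g;
    var match;
    while ((match = slotPattern.exec(phrase)) !== null) {
        slots.push({ sample: match[1], slot: match[2] });
    }
    return { intent: intent, phrase: phrase, slots: slots };
}

var parsedLine = parseUtterance("MyColorIsIntent my color is {dark brown|Color}");
console.log(parsedLine.intent);           // MyColorIsIntent
console.log(parsedLine.slots[0].sample);  // dark brown
console.log(parsedLine.slots[0].slot);    // Color
```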

When you've filled in the Intent Schema and Sample Utterances areas, click Next.

  • In the third part, you test your skill on your Echo device (Figure 13). With your device registered to your Amazon developer account, you can have the following conversation:

You: Alexa, ask about favorite colors.

Alexa: Welcome to the Alexa Skills Kit sample. Please tell me your favorite color by saying "my favorite color is red."

You: My favorite color is magenta.

Alexa: I now know your favorite color is magenta. You can ask me your favorite color by saying, "What's my favorite color?"

You: What's my favorite color?

Alexa: Your favorite color is magenta. Goodbye.

Figure 13: Testing your skill with Echo.

Notice that Alexa accepts color names that aren't explicitly in the Sample Utterances text. For example, if I say "My favorite color is refrigerator," Echo says "I now know your favorite color is refrigerator." Echo also accepts the three-word phrase "book cat flashlight" as my favorite color.

To see the code behind the conversation about colors, look over the following JavaScript code. For presentation in this article, I distilled this code mercilessly from the original Lambda Function Code -- the code whose first 18 lines appear in Figure 5.

// Distilled from the blueprint: the calls that pass each response
// to callback are omitted here.
function getWelcomeResponse(callback) {
    var sessionAttributes = {};
    var speechOutput = "Welcome to the Alexa Skills Kit sample, "
                + "Please tell me your favorite color by saying, "
                + "my favorite color is red";
}

// Route each incoming intent to the function that handles it.
function onIntent(intentRequest, session, callback) {
    var intent = intentRequest.intent,
        intentName = intentRequest.intent.name;

    if ("MyColorIsIntent" === intentName) {
        setColorInSession(intent, session, callback);
    } else if ("WhatsMyColorIntent" === intentName) {
        getColorFromSession(intent, session, callback);
    }
}

// Read the Color slot and compose the confirmation that Alexa speaks.
function setColorInSession(intent, session, callback) {
    var favoriteColorSlot = intent.slots.Color;
    var favoriteColor;
    var speechOutput = "";

    if (favoriteColorSlot) {
        favoriteColor = favoriteColorSlot.value;
        speechOutput = "I now know your favorite color is "
                + favoriteColor + ". You can ask me "
                + "your favorite color by saying, what's my favorite color?";
    }
}

// Look up the color stored in the session and compose the reply.
function getColorFromSession(intent, session, callback) {
    var favoriteColor;
    var speechOutput = "";

    if (session.attributes) {
        favoriteColor = session.attributes.favoriteColor;
    }

    if (favoriteColor) {
        speechOutput = "Your favorite color is "
                + favoriteColor + ", goodbye";
    }
}
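The distilled code leaves out the step that hands each response back through callback. The following sketch fills in that step so that the two-turn conversation can be simulated locally. It is a simplified illustration, not the blueprint's actual code: buildReply is a hypothetical helper, the fallback sentences are invented, and the sketch assumes that the session attributes returned in one response come back in the next request's session.

```javascript
// Hypothetical helper: wrap speech text in the response shape Alexa expects.
function buildReply(sessionAttributes, speechOutput, shouldEndSession) {
    return {
        version: "1.0",
        sessionAttributes: sessionAttributes,
        response: {
            outputSpeech: { type: "PlainText", text: speechOutput },
            shouldEndSession: shouldEndSession
        }
    };
}

function setColorInSession(intent, session, callback) {
    var slot = intent.slots && intent.slots.Color;
    var attributes = {};
    var speechOutput = "I'm not sure what your favorite color is.";
    if (slot && slot.value) {
        attributes.favoriteColor = slot.value;   // remembered for the next turn
        speechOutput = "I now know your favorite color is " + slot.value + ".";
    }
    callback(buildReply(attributes, speechOutput, false));
}

function getColorFromSession(intent, session, callback) {
    var favoriteColor = session.attributes && session.attributes.favoriteColor;
    var speechOutput = favoriteColor
        ? "Your favorite color is " + favoriteColor + ", goodbye"
        : "I'm not sure what your favorite color is.";
    callback(buildReply({}, speechOutput, Boolean(favoriteColor)));
}

// Simulate two turns, feeding the first reply's attributes into the second.
var firstReply, secondReply;
setColorInSession(
    { name: "MyColorIsIntent", slots: { Color: { name: "Color", value: "magenta" } } },
    { attributes: {} },
    function (reply) { firstReply = reply; });
getColorFromSession(
    { name: "WhatsMyColorIntent", slots: {} },
    { attributes: firstReply.sessionAttributes },
    function (reply) { secondReply = reply; });

console.log(secondReply.response.outputSpeech.text);
// Your favorite color is magenta, goodbye
```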

You might also want to see one of the request/response pairs involved in your conversation with Echo. To do so, return to the page in Figure 2 and click the newly created MyColorFunction row. When you do, you'll see a page like the one Figure 14 shows. On that page, select Configure Sample Event from the Actions drop-down list.

Figure 14: Viewing request/response pairs.

In response to your selection, the website shows you a pop-up like the one Figure 15 shows. This pop-up contains the code for a request:

{
  "session": {
    "new": false,
    "sessionId": "session1234",
    "attributes": {},
    "user": {
      "userId": null
    },
    "application": {
      "applicationId": "amzn1.echo-sdk-ams.app.[unique-value-here]"
    }
  },
  "version": "1.0",
  "request": {
    "intent": {
      "slots": {
        "Color": {
          "name": "Color",
          "value": "blue"
        }
      },
      "name": "MyColorIsIntent"
    },
    "type": "IntentRequest",
    "requestId": "request5678"
  }
}

When you say "My favorite color is blue," Echo sends this JSON code to the cloud. In the pop-up Figure 15 shows, you can change blue to magenta, to refrigerator, or even to gibberish text.
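If you'd rather generate several test requests than edit the pop-up by hand, you can copy the sample event and swap in a different color. The sketch below shows one way to do that. The withColor function is a hypothetical helper, and the event is a trimmed copy of the request shown above.

```javascript
// A trimmed copy of the sample request from the Configure Sample Event pop-up.
var sampleEvent = {
    session: {
        "new": false,
        sessionId: "session1234",
        attributes: {}
    },
    version: "1.0",
    request: {
        intent: {
            slots: { Color: { name: "Color", value: "blue" } },
            name: "MyColorIsIntent"
        },
        type: "IntentRequest",
        requestId: "request5678"
    }
};

// Return a deep copy of the event with a different value in the Color slot.
// The JSON round trip leaves the original event untouched.
function withColor(event, color) {
    var copy = JSON.parse(JSON.stringify(event));
    copy.request.intent.slots.Color.value = color;
    return copy;
}

var magentaEvent = withColor(sampleEvent, "magenta");
console.log(magentaEvent.request.intent.slots.Color.value);  // magenta
console.log(sampleEvent.request.intent.slots.Color.value);   // blue
```

You can paste the output of JSON.stringify(magentaEvent, null, 2) into the pop-up in place of the default request.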

Figure 15: Altering response/request code.

When you click the Submit button in Figure 15, you see a response of the following kind from the cloud:

{
  "version": "1.0",
  "sessionAttributes": {
    "favoriteColor": "blue"
  },
  "response": {
    "outputSpeech": {
      "type": "PlainText",
      "text": "I now know your favorite color is blue. You can ask me your favorite color by saying, what's my favorite color?"
    },
    "card": {
      "type": "Simple",
      "title": "SessionSpeechlet - MyColorIsIntent",
      "content": "SessionSpeechlet - I now know your favorite color is blue. You can ask me your favorite color by saying, what's my favorite color?"
    },
    "reprompt": {
      "outputSpeech": {
        "type": "PlainText",
        "text": "You can ask me your favorite color by saying, what's my favorite color?"
      }
    },
    "shouldEndSession": false
  }
}

When you talk to your Echo device, the device plays a text value from this JSON response on its speaker.
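To see which parts of the response matter in a conversation, you can pull out the spoken text, the reprompt and the end-of-session flag. The following sketch does that for a response shaped like the one above. The summarizeReply function is a hypothetical helper, and the sample reply is a shortened version of the JSON shown earlier.

```javascript
// Pick out the fields of a response that shape the conversation.
function summarizeReply(reply) {
    var body = reply.response || {};
    var speech = body.outputSpeech || {};
    var reprompt = (body.reprompt && body.reprompt.outputSpeech) || {};
    return {
        spoken: speech.text || null,          // what the device says right away
        reprompt: reprompt.text || null,      // the text in the reprompt field
        endsSession: body.shouldEndSession === true
    };
}

// Shortened version of the response shown in this article.
var sampleReply = {
    version: "1.0",
    sessionAttributes: { favoriteColor: "blue" },
    response: {
        outputSpeech: { type: "PlainText", text: "I now know your favorite color is blue." },
        reprompt: {
            outputSpeech: { type: "PlainText", text: "You can ask me your favorite color." }
        },
        shouldEndSession: false
    }
};

var summary = summarizeReply(sampleReply);
console.log(summary.spoken);       // I now know your favorite color is blue.
console.log(summary.endsSession);  // false
```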

Baby steps

It's not very useful to have a device tell me my favorite color, especially if I named the color a few seconds earlier. But simple Hello World examples don't have to be useful. Instead, they have to be dirt simple, and they have to demonstrate the structure of the solution to more complex problems. The color skill that's available on Amazon's AWS site does just that.

Every developer knows the feeling that a particular app might never run. So, it's a special sense of elation when the app actually runs. That's the feeling I had when I got this simple skill working on my home Alexa device. I talked to the device, the device exchanged information with the cloud, and then the device talked back. My wife tested the skill and, at the end of the session, felt compelled to say "thank you" to Alexa.

For a developer, the next steps are to generalize from the simple "What's my favorite color?" code, and to try some alternatives to the Lambda/Node.js combination. For example, you can develop for Echo in Java instead of Node.js. You can host your skill as a Web service on your own Internet-accessible endpoint.

Who knows? Eventually, you might be saying "Alexa, increase speed to Warp 10."

What has your experience been with voice recognition applications? Let us know.

Follow Cameron McKenzie on Twitter: @potemcam
Follow Barry too: @allmycode

Books penned by Barry Burd:

Java For Dummies 
Android Application Development All-in-One For Dummies 
Beginning Programming with Java For Dummies 
Java Programming for Android Developers For Dummies
