Creating an Alexa Custom Skill on a Web Service

The short version: there is no short version. Implementing server side functionality inevitably involves a fair amount of legwork. Also, an advance warning: I’m a novice at the programming language I chose to implement my web service in – Java. For this reason, I’m only going to post code snippets for problems you need to solve. There may be better ways of doing this. Finally, I’m leaving my Skill permanently in development mode: I don’t intend to publish. While it’s something I find useful, this was primarily a learning exercise for me. If you’ve ended up here via Google, hopefully you’ll find something you can re-purpose.

I’ll come to the Amazon side of things in a bit. First up, you will need a server to host your web service. You have a couple of options here [as an alternative to Lambda]: first, you can host on your home network. If you have DHCP, the typical process is to configure your broadband router for DDNS, and have a port forwarding rule on to your server. I decided to take another approach, simply because I wanted to try it out. I’ve paid for a vm with a hosting outfit called OVH. They have a really nice setup, so I’ve spent less than £10 for a server for 3 months. I’ll probably keep it on permanently as I like the flexibility.

I decided to implement my web service as a servlet. There are plenty of tutorials on how to set up the stub methods [and all the various libraries and files structures you will need when you come to export a War file]. I used Eclipse as my IDE. My testing cycle – which I imagine is pretty common – was to run tomcat on my laptop where I was doing the development. I could then build and deploy to that locally, and then test the interface by POSTing data to it with the Chrome plugin Postman. When I had something I wanted to deploy to my vm, I exported the WAR file, scp’ed it to the OVH server’s web apps directory, removed the old War [and expanded directory] and restarted.

A couple of foundational steps on the remote server. The OVH setup is a vanilla install of your operating system of choice – I went with Debian. As well as installing your server software [tomcat] and its requirements [Java], you’ll need to create a user for it. As Amazon requires the server to run on a privileged port, you’ll need to take this into account. As you really won’t want to run the server as root, I googled around for a snippet of C code which I could adapt and compile, which could use setuid to start my server as another user. This is something to take care over if you intend to run your service in anger, and you have no choice: you have got to run your service on port 443.

The next step on the server side, related to that privileged port, is TLS. Because I tried a ‘real’ CA, which I thought was on Amazon’s supported list [Let’s Encrypt], I also went to the trouble of registering a name for my vm’s IP. If you’re going to stay in developer mode, you may not need to do this. When I realised I was getting TLS handshake errors, I changed the config and went with an openssl self-signed certificate.

Turning to the configuration set up on the Amazon side, it’s generally pretty well documented, but there are a couple of exceptions to this. First, in the Skill Information, you have to provide an Invocation Name. I know of one other developer who made the same mistake as me because of the way the debugging pop-up information is worded: I assumed that the invocation string needed to start with ‘ask’, or equivalent. It doesn’t. If you just want your skill to be invoked with ‘widget counter’, you just have those two words, and then kick it off on your Echo with ‘Alexa, ask widget counter’, followed by one of your sample utterances.

This seems incredibly obvious, but there is a warning on the testing page which states, ‘Please complete the Interaction Model tab to start testing this skill’, I went off on a wild goose chase thinking I’d done something else wrong. That warning is arguably one of the misleading features of the whole orchestration interface that Amazon provides. It stays there permanently, no matter what you do.

There are some good examples of the Intent Schema, and other parts of the interaction model already out there but one more can’t hurt. Before I get into it, a quick explanation of what my skill does. My server side code ‘screen scrapes’ a train operator’s web site every 10 minutes. I then parse the rendered HTML looking for specific information about service conditions. That information forms the basis of the response.

Intent Schema:

{
   "intents":[
      {
         "intent":"TrainTimes",
         "slots":[
            {
               "name":"TrainServices",
               "type":"LIST_OF_SERVICES"
            }
         ]
      }
    ]
}

The custom slot looks like:

LIST_OF_SERVICES Peterborough | Luton | Cambridge | Brighton

and then some sample utterances:

TrainTimes how is {TrainServices}
TrainTimes how is the {TrainServices} service
TrainTimes how is the {TrainServices} line

[Brief aside: why ‘train times’, when it’s not timing related? This is a vestige of trying to navigate the mistake I made with the invocation name. I have a strong accent and so was trying all sorts, so at one point it was called ‘big dog’ :)].

The rest of the Amazon interface is pretty straightforward, such as the uploading of self signed certificate [if that’s the approach you choose to take]. One final step before moving away from the interface is the testing. The example question, ‘how is the peterborough service’ gets transformed into the following blob of JSON:

{
  "session": {
    "sessionId": "SessionId.[GUID here]",
    "application": {
      "applicationId": "amzn1.ask.skill.[another GUID here]"
    },
    "attributes": {},
    "user": {
      "userId": "amzn1.ask.account.[long account string here]"
    },
    "new": true
  },
  "request": {
    "type": "IntentRequest",
    "requestId": "EdwRequestId.[one more GUID here]",
    "locale": "en-GB",
    "timestamp": "2017-02-06T13:43:37Z",
    "intent": {
      "name": "TrainTimes",
      "slots": {
        "TrainServices": {
          "name": "TrainServices",
          "value": "Peterborough"
        }
      }
    }
  },
  "version": "1.0"
}

You will obviously need to check that your web service is able to parse this. As per my comments on my testing cycle, I copied this into Postman as the raw content of a POST.

I’ll briefly go into some more detail about how my implementation works, because when you get into the business end of parsing the input JSON, you need to map it back to some data to respond with.

The parsing of the train company website as a skill isn’t my first bite at the implementation. I started by trying to write a widget for Android. It’s not as simple as just grabbing the HTML, though: the site uses Ajax to render the results. I ended up using HtmlUnit. Because it’s pretty slow, I implemented a class outside the servlet which uses a combination of a ServletContextListener, and then a ScheduledExectuorService. I got the example code from here to work from. The one gotcha is that your web.xml needs to refer to the class.

Next we get on to the hacky part of passing the information on to the servlet. What I did – having the listener write to a flat file which the servlet reads – is a reflection of my novice levels with Java.

So, returning to the list of general problems to solve: having received the POST data, your service then needs to get hold of the data, and then parse the JSON. For the first part, I tried a few different options and then settled on the getBody implementation I found here.

The parsing of the JSON itself: it’s the first time I’ve ever done this and it feels way more complicated than it should be. I tried a couple of different parsing libraries, but ended up using JSON.simple.

Going back to the JSON data from the Amazon test interface, you have two consecutive objects, the session and the request. For an industrial strength implementation, there are things you need to do with the session data [I’ve also skipped over the TLS client certificate validation, which Amazon stipulates. While perfectly sensible for security purposes, it’s not necessary to get your development implementation up and running.

So the request itself contains a series of embedded objects that you need to unpack, in turn, until I finally get to the slots data. Your implementation will vary, but mine looks like this:

        JSONParser parser = new JSONParser();
        try {
            JSONObject obj = (JSONObject) parser.parse(postData);
            JSONObject jrequest = (JSONObject) obj.get("request");
            log.warning("request: " + jrequest.toString());
            JSONObject intent = (JSONObject) jrequest.get("intent");
            log.warning("intent: " + intent.toString());
            JSONObject slots = (JSONObject) intent.get("slots");
            log.warning("slots: " + slots.toString());
            JSONObject trainSvces = (JSONObject) slots.get("TrainServices");
            log.warning("trainSvces: " + trainSvces.toString());
            String trainSvceVal = (String) trainSvces.get("value");
            log.warning("trainSvceVal: " + trainSvceVal);
            resultString = trainSvceVal;
        } catch (ParseException e) {
            log.warning("ParseException error: ");
            ...
        }

Again, for a ‘proper’ implementation, you are going to have to test each of the sub-objects. My code is taking a leap of faith :).

Having lost the will to live with the parsing, I have to admit that, having figured out which piece of the train-related data the web service needs to respond with, I then print the JSON straight out in the required format.

        response.setContentType("application/json;charset=UTF-8");
        PrintWriter out = response.getWriter();
        out.println("{");
        out.println("  \"response\": {");
        out.println("      \"outputSpeech\": {");
        out.println("         \"type\": \"PlainText\",");
        out.println("         \"text\": \"" + resultsString + "\"");
        out.println("    },");
        out.println("    \"shouldEndSession\": true");
        out.println("    }");
        out.println("}");

So, not exactly a thing of beauty, but it works.

If I have any better ideas for future skills to implement, I’ll start with the certificate that Amazon needs. There’s an outfit called StartSSL which it looks like they accept, but which requires email based validation on an address corresponding to the domain. I started looking at setting up PostFix for this – one for another day.

The-Plot.com

Photography etc…

Creating an Alexa Custom Skill on a Web Service