More natural text to speech with SSML and Amazon Connect

Within Amazon Connect we can build engaging contact flows that use Amazon Polly to prompt callers with text-to-speech utterances. Amazon Polly produces natural-sounding speech using deep learning technologies. This is not your old-school and often cringe-worthy “robot” voice.

With that said, let’s look at a few scenarios where we can delight callers by tweaking how Polly speaks certain key items. To do this, we will use Speech Synthesis Markup Language (SSML). Don’t worry, the acronym is probably the hardest part of SSML.

Your account number is fifty-one thousand eight hundred thirty-nine…

Let’s say our contact flow uses Lambda to look up the caller’s account number and we want to confirm that we found the right one. For our first attempt we set the prompt value in a Get customer input node to “Is your account number $.Attributes.customerNumber?” For a caller whose account ID is 51839, the prompt comes out as “Is your account number fifty-one thousand eight hundred thirty-nine?” We’d like the caller to hear each digit pronounced separately.

At this point we could enter a long cycle of tweaking the Get customer input node, saving and publishing the contact flow and then calling back in to test. Instead, we can go over to the Polly console for our AWS account (https://console.aws.amazon.com/polly/home/SynthesizeSpeech) and have a much tighter testing loop.

Once we’re at the Polly console, we select the “SSML” tab and copy and paste our prompt. Polly doesn’t know about our contact attributes here, so we’ll replace $.Attributes.customerNumber with 51839. We can’t quite press “Listen to speech” yet to hear the result though. SSML is similar to XML and requires an enclosing parent speak tag. So our full input is “<speak>Is your account number 51839?</speak>”
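
If clicking through the console gets tedious, the same test loop can be scripted. Here is a minimal Python sketch, assuming boto3 is installed and AWS credentials with Polly access are configured. The helper names check_ssml and synthesize and the voice “Joanna” are illustrative choices, not anything Connect requires.

```python
import xml.etree.ElementTree as ET


def check_ssml(ssml: str) -> str:
    """Return the prompt unchanged, or raise ValueError if it is not
    well-formed XML enclosed in a <speak> tag."""
    try:
        root = ET.fromstring(ssml)
    except ET.ParseError as err:
        raise ValueError(f"not well-formed XML: {err}") from err
    if root.tag != "speak":
        raise ValueError("SSML must be enclosed in a <speak> tag")
    return ssml


def synthesize(polly_client, ssml: str, voice_id: str = "Joanna") -> bytes:
    """Send SSML to Polly and return the MP3 audio bytes."""
    response = polly_client.synthesize_speech(
        Text=check_ssml(ssml),
        TextType="ssml",     # the text is SSML, not plain text
        OutputFormat="mp3",
        VoiceId=voice_id,    # assumed voice; use the one from your contact flow
    )
    return response["AudioStream"].read()


# Usage (makes a real AWS call, so it is left commented out):
#   import boto3
#   audio = synthesize(boto3.client("polly"),
#                      "<speak>Is your account number 51839?</speak>")
#   with open("prompt.mp3", "wb") as f:
#       f.write(audio)
```

The local check catches a missing speak tag or a mismatched closing tag before any request is sent. It only checks that the XML is well formed; it does not know which tags Polly supports.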

SSML lets us specify that each character in a string be read out individually using the say-as tag with an attribute interpret-as of “characters”. There is a similar attribute value of “digits”, but let’s stick to “characters” to handle alpha-numeric account codes as well.

Go ahead and change the input string to include the say-as tag and we end up with: “<speak>Is your account number <say-as interpret-as="characters">51839</say-as>?</speak>”

Much better, right? Now copy that input string into the Get customer input node, swap 51839 back to $.Attributes.customerNumber, and make sure to select the “SSML” option from the “Interpret as” dropdown.
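
If the prompt is assembled in code rather than typed by hand, the say-as wrapping can be sketched in a few lines of Python. The function names spell_out and account_prompt are hypothetical, and the sketch assumes the value may contain characters such as “&” that need XML escaping.

```python
from xml.sax.saxutils import escape


def spell_out(value: str) -> str:
    """Wrap a value so each character is read out individually."""
    return f'<say-as interpret-as="characters">{escape(value)}</say-as>'


def account_prompt(account_number: str) -> str:
    """Build the full confirmation prompt, enclosed in <speak>."""
    return f"<speak>Is your account number {spell_out(account_number)}?</speak>"


# A literal value for testing in the Polly console:
print(account_prompt("51839"))
# The contact attribute reference for the Get customer input node:
print(account_prompt("$.Attributes.customerNumber"))
```

The first line printed is the string to paste into Polly, the second is the one for the contact flow.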

Dramatic pauses

Let’s say we want to present the caller with a menu of options via DTMF or, even better, a Lex bot. Our first attempt at a prompt is “How can we assist you today? Would you like to check your most recent order, create a new order or speak to an agent?” Hurry back to the Polly console and take a listen. It would be nice to have a uniform pause between each option.

Option one is the Oxford comma: add a comma after “create a new order”, before the “or”. If the pause from the comma still seems too short, option two is to use SSML to insert pauses exactly as long as we want.

For that we use the break tag, whose time attribute sets the length of the pause, here in milliseconds. To pause for 400 milliseconds between items, we get: “<speak>How can we assist you today? Would you like to check your most recent order <break time="400ms"/> create a new order <break time="400ms"/> or speak to an agent?</speak>”
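
When the menu changes often, typing the break tags by hand gets error-prone. A small Python sketch can generate the prompt from a list of options. The function name menu_prompt and its parameters are made up for this example, and the “Would you like to” wording is simply carried over from the prompt above.

```python
from xml.sax.saxutils import escape


def menu_prompt(intro: str, options: list[str], pause_ms: int = 400) -> str:
    """Join menu options with a uniform SSML pause between them."""
    if not options:
        raise ValueError("at least one menu option is required")
    pause = f' <break time="{pause_ms}ms"/> '
    *head, last = [escape(option) for option in options]
    # Put "or" in front of the final option when there is more than one.
    spoken = pause.join(head + [f"or {last}"]) if head else last
    return f"<speak>{escape(intro)} Would you like to {spoken}?</speak>"


print(menu_prompt(
    "How can we assist you today?",
    ["check your most recent order", "create a new order", "speak to an agent"],
))
```

Changing pause_ms in one place keeps every pause the same length, which is the whole point of using break over commas.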

Candy controversy

For our last example, we want to cycle through some promotions as a caller is in queue. Today we’re offering some free candy with large orders. In our customer queue flow we have a prompt “If you place a large order with us today, we will include a free box of our classic caramel candy at no charge to you”. Delicious.

I forgot to mention our company is based in southern Wisconsin, and we have pretty strong opinions on how to pronounce the word caramel (https://english.stackexchange.com/questions/372583/why-do-north-americans-pronounce-caramel-as-carmel). We drop that middle “a” and so should our contact center.

SSML and Polly have us covered. We can use the phoneme tag to supply a phonetic pronunciation. Phonetic alphabets are tricky, so I got some help from a transcription site online (http://lingorado.com/ipa/) to get the International Phonetic Alphabet version of “carmel”. Our updated prompt looks like: “<speak>If you place a large order with us today, we will include a free box of our classic <phoneme alphabet="ipa" ph="kɑrˈmɛl">caramel</phoneme> candy at no charge to you</speak>” and our callers are hearing it the way we like.
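
If several promotions mention the same word, the phoneme tag can be applied from a small lookup table instead of being edited into each prompt. This is only a sketch in Python: the names phoneme and apply_pronunciations are hypothetical, and matching is on exact, case-sensitive whole words.

```python
import re
from xml.sax.saxutils import escape, quoteattr


def phoneme(word: str, ipa: str) -> str:
    """Wrap a word in a phoneme tag carrying its IPA pronunciation."""
    return f'<phoneme alphabet="ipa" ph={quoteattr(ipa)}>{escape(word)}</phoneme>'


def apply_pronunciations(text: str, ipa_by_word: dict[str, str]) -> str:
    """Return SSML in which every listed word gets its custom pronunciation."""
    # Splitting on a capturing group keeps both the words and the
    # punctuation/whitespace between them, so the text is rebuilt intact.
    parts = re.split(r"(\w+)", text)
    ssml = "".join(
        phoneme(part, ipa_by_word[part]) if part in ipa_by_word else escape(part)
        for part in parts
    )
    return f"<speak>{ssml}</speak>"


print(apply_pronunciations(
    "a free box of our classic caramel candy",
    {"caramel": "kɑrˈmɛl"},
))
```

As with the other prompts, it is worth pasting the result into the Polly console to hear it before it goes into the customer queue flow.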

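For full documentation on SSML, check out the Alexa Skills Kit reference at: https://developer.amazon.com/docs/custom-skills/speech-synthesis-markup-language-ssml-reference.html. That page describes Alexa; the SSML tags that Polly itself supports are listed in the Amazon Polly Developer Guide.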

Thanks for reading. Any questions, comments or corrections are greatly appreciated. To learn more about what we can do with Amazon Connect, check out “Helping You Get the Most Out of Amazon Connect”.

Thoughts on “More natural text to speech with SSML and Amazon Connect”

  1. Hi Peter:
    Thank you for sharing this information.
    I have created a Lex bot, and in our Amazon Connect flow the Get customer input node has the “SSML” option selected and the text is wrapped in SSML markup, as in your post.
    It’s working, but my Lambda function always receives “outputDialogMode”: “Text”, so I can’t send my answer back to the client using SSML because it is ignored; I can only send plain text messages to the client. However, when I use another web application that calls Lex through the AWS JavaScript Lex PostContent API, I receive “outputDialogMode”: “Voice”, the SSML speech works correctly, and I’m able to use SSML tags and other attributes in Polly.
    I would really appreciate any help.
    Do I need to set something more?
    Thank you
    Tania

  2. Peter Miller Post author

    Tania,
    Not sure I understand your problem. There’s a problem with the JSON getting passed into your Lambda from the Amazon Connect contact flow? When testing out Polly and TTS through the AWS console it outputs text, yes, but from a contact flow it will be voice.

  3. Thank you Peter for the prompt reply.
    My contact flow calls the Lex bot (using the SSML option), the Lex bot calls the Lambda (as the initialization and validation code hook), and in the JSON my Lambda receives, outputDialogMode is “Text”.

    On the other hand:
    When I call the Lex bot from my browser using postContent, the Lex bot calls my Lambda and in the JSON outputDialogMode is “Voice”.

    I saw this link describing my problem:
    https://forums.aws.amazon.com/thread.jspa?threadID=266302&tstart=0
    and it looks like this is a limitation of the contact flow integration, but then I don’t understand why the Get customer input node has the SSML option.

  4. Peter Miller Post author

    I’d keep asking for this feature on the forums and through your Amazon contact(s), and hopefully it will be added in a future release.

Peter Miller

Peter Miller is a Solutions Architect at Perficient focused on call center solutions including Amazon Connect