Within Amazon Connect we can build engaging contact flows that use Amazon Polly to prompt callers with text-to-speech utterances. Amazon Polly produces natural-sounding speech using deep learning technologies. This is not your old-school, often cringe-worthy “robot” voice.
With that said, let’s look at a few scenarios where we can delight callers by tweaking how Polly speaks certain key items. To do this, we will use Speech Synthesis Markup Language (SSML). Don’t worry, the acronym is probably the hardest part of SSML.
Your account number is fifty-one thousand eight hundred thirty-nine…
Let’s say our contact flow uses Lambda to look up the caller’s account number and we want to confirm that we found the right one. For our first attempt, we set the prompt value in a Get customer input node to “Is your account number $.Attributes.customerNumber?” If a caller’s account number is 51839, the caller is prompted with “Is your account number fifty-one thousand eight hundred thirty-nine?” We’d like the caller to hear each of those digits pronounced separately.
At this point we could enter a long cycle of tweaking the Get customer input node, saving and publishing the contact flow and then calling back in to test. Instead, we can go over to the Polly console for our AWS account (https://console.aws.amazon.com/polly/home/SynthesizeSpeech) and have a much tighter testing loop.
Once we’re at the Polly console, we select the “SSML” tab and copy and paste our prompt. Polly doesn’t know about our contact attributes here, so we’ll replace $.Attributes.customerNumber with 51839. We can’t quite press “Listen to speech” yet to hear the result though. SSML is similar to XML and requires an enclosing parent speak tag. So our full input is “<speak>Is your account number 51839?</speak>”
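If we would rather script this step than paste by hand, here is a minimal sketch of the same substitution and wrapping in Python. The helper names render_prompt and polly_request are made up for illustration, and the voice “Joanna” is just an assumed default; the returned dictionary uses the parameter names of Polly’s SynthesizeSpeech API.

```python
import re
import xml.etree.ElementTree as ET

# Matches contact attribute references such as $.Attributes.customerNumber
ATTRIBUTE_REF = re.compile(r"\$\.Attributes\.(\w+)")


def render_prompt(template, attributes):
    """Swap $.Attributes.<name> references for sample values and wrap in <speak>."""
    body = ATTRIBUTE_REF.sub(lambda m: attributes[m.group(1)], template)
    return "<speak>" + body + "</speak>"


def polly_request(ssml, voice_id="Joanna"):
    """Build SynthesizeSpeech keyword arguments after a well-formedness check."""
    root = ET.fromstring(ssml)  # raises ParseError on malformed markup
    if root.tag != "speak":
        raise ValueError("SSML must be enclosed in a <speak> tag")
    return {
        "Text": ssml,
        "TextType": "ssml",  # tells Polly to interpret the text as SSML
        "VoiceId": voice_id,  # assumed voice, pick whichever you use in Connect
        "OutputFormat": "mp3",
    }


demo_ssml = render_prompt(
    "Is your account number $.Attributes.customerNumber?",
    {"customerNumber": "51839"},
)
print(demo_ssml)
```

Assuming boto3 is installed and AWS credentials are configured, passing the result along as boto3.client("polly").synthesize_speech(**polly_request(demo_ssml)) should return the audio, though the console is still the quickest way to just listen.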
SSML lets us specify that each character in a string be read out individually using the say-as tag with an attribute interpret-as of “characters”. There is a similar attribute value of “digits”, but let’s stick to “characters” to handle alpha-numeric account codes as well.
Go ahead and change the input string to include the say-as tag and we end up with: “<speak>Is your account number <say-as interpret-as="characters">51839</say-as>?</speak>”
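If the account number arrives from code rather than being typed by hand, a small helper can build this prompt for us. This is only a sketch: spell_out and account_prompt are hypothetical names, and the escaping step assumes account codes could contain characters such as “&” that are not valid inside XML. Note that the attribute value is wrapped in straight double quotes; curly quotes will not parse.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET


def spell_out(value):
    """Wrap a value in say-as so each character is read individually."""
    return '<say-as interpret-as="characters">' + escape(value) + "</say-as>"


def account_prompt(account_number):
    """Build the full confirmation prompt, including the enclosing speak tag."""
    return "<speak>Is your account number " + spell_out(account_number) + "?</speak>"


account_ssml = account_prompt("51839")
ET.fromstring(account_ssml)  # raises ParseError if the markup is not well-formed
print(account_ssml)
```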
Much better, right? Now just take that input string and copy it into the Get customer input node, making sure to select the “SSML” option from the “Interpret as” dropdown.
Let’s say we want to present the caller with a menu of options via DTMF or, even better, a Lex bot. Our first attempt at a prompt is “How can we assist you today? Would you like to check your most recent order, create a new order or speak to an agent?” Hurry back to the Polly console and take a listen. It might be nice to have a uniform pause between each option.
Option one is the Oxford comma: add a comma after “create a new order” and before the “or”. If the pause for the comma still seems a bit fast, option two is to use SSML to insert pauses exactly as long as we want.
For that we use the break tag with an attribute time with the pause value in milliseconds. So to pause for under half a second per item, we get: “<speak>How can we assist you today? Would you like to check your most recent order <break time="400ms"/> create a new order <break time="400ms"/> or speak to an agent?</speak>”
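Menus tend to change, so it can be handy to generate the prompt from a list of options rather than editing break tags by hand. Here is a rough sketch of that idea; menu_prompt is a made-up helper, it assumes at least two options, and the 400 ms default simply mirrors the value above.

```python
import xml.etree.ElementTree as ET


def menu_prompt(intro, options, pause_ms=400):
    """Join menu options with a fixed-length SSML break between them."""
    pause = ' <break time="%dms"/> ' % pause_ms
    *leading, last = options  # the final option gets an "or" in front of it
    listed = pause.join(leading + ["or " + last])
    return "<speak>%s Would you like to %s?</speak>" % (intro, listed)


menu_ssml = menu_prompt(
    "How can we assist you today?",
    ["check your most recent order", "create a new order", "speak to an agent"],
)
ET.fromstring(menu_ssml)  # raises ParseError if the markup is not well-formed
print(menu_ssml)
```

Changing pause_ms in one place then keeps every pause in the menu the same length.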
For our last example, we want to cycle through some promotions as a caller is in queue. Today we’re offering some free candy with large orders. In our customer queue flow we have a prompt “If you place a large order with us today, we will include a free box of our classic caramel candy at no charge to you”. Delicious.
I forgot to mention our company is based in southern Wisconsin, and we have pretty strong opinions on how to pronounce the word caramel (https://english.stackexchange.com/questions/372583/why-do-north-americans-pronounce-caramel-as-carmel). We drop that middle “a” and so should our contact center.
SSML and Polly have us covered. We can use the phoneme tag to supply a phonetic pronunciation. Phonetic alphabets are tricky, so I got some help from a transcription site online (http://lingorado.com/ipa/) to get the International Phonetic Alphabet version of “carmel”. Our updated prompt looks like: “<speak>If you place a large order with us today, we will include a free box of our classic <phoneme alphabet="ipa" ph="kɑrˈmɛl">caramel</phoneme> candy at no charge to you</speak>” and our callers are hearing it the way we like.
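If several prompts mention the same word, we may want to apply the pronunciation in one place. A minimal sketch, assuming a hypothetical helper named with_pronunciation that wraps the first occurrence of a word in a phoneme tag:

```python
from xml.sax.saxutils import escape, quoteattr
import xml.etree.ElementTree as ET


def with_pronunciation(text, word, ipa):
    """Wrap the first occurrence of word in a phoneme tag carrying an IPA pronunciation."""
    # quoteattr adds the surrounding straight quotes and escapes the value
    tag = "<phoneme alphabet=\"ipa\" ph=%s>%s</phoneme>" % (quoteattr(ipa), escape(word))
    return "<speak>" + escape(text).replace(escape(word), tag, 1) + "</speak>"


promo_ssml = with_pronunciation(
    "If you place a large order with us today, we will include a free box "
    "of our classic caramel candy at no charge to you",
    "caramel",
    "kɑrˈmɛl",
)
ET.fromstring(promo_ssml)  # raises ParseError if the markup is not well-formed
print(promo_ssml)
```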
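For more on SSML, check out the SSML reference on the Amazon developer site at: https://developer.amazon.com/docs/custom-skills/speech-synthesis-markup-language-ssml-reference.html. Note that this page is written for Alexa skills, so not every tag there necessarily behaves the same way in Polly; the Amazon Polly Developer Guide lists the SSML tags that Polly supports.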
Thanks for reading. Any questions, comments or corrections are greatly appreciated. To learn more about what we can do with Amazon Connect, check out “Helping You Get the Most Out of Amazon Connect”.