A Short Intro to SSML in Amazon Connect

Introduction to SSML tags – what are they?

We’ve all had the experience of phoning a contact center and hearing a robotic, impersonal voice on the other end. SSML (which stands for ‘Speech Synthesis Markup Language’) tags can be added to your Amazon Connect contact flows to customize your speech prompts and give them a more human, realistic touch. We do this by inserting specific elements that adjust aspects of speech, such as speed, volume, and pronunciation.

Basic SSML format:

The basic SSML format is as follows:

<speak>Thank you for calling the Acme Solutions Center. </speak>

To add an SSML speech block, drag a ‘Play Prompt’ block from the ‘Interact’ menu on the left side of your Amazon Connect contact flow. Click on it to open it for editing. Select ‘Text to Speech’, enter the words you want spoken, wrapped in the <speak> tags shown above, and then select ‘SSML’ from the ‘Interpret As’ menu. Press ‘Save’.
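If the prompt text is assembled in code rather than typed into the block (for example, in a Lambda function that hands a string back to the flow), the same wrapping can be sketched in Python. The helper name to_ssml is hypothetical and not part of any AWS SDK. It escapes XML-reserved characters, on the assumption that a stray ‘&’ or ‘<’ in the text would otherwise make the markup invalid.

```python
from xml.sax.saxutils import escape


def to_ssml(text: str) -> str:
    """Wrap plain text in <speak> tags, escaping &, < and > first."""
    return f"<speak>{escape(text.strip())}</speak>"


print(to_ssml("Thank you for calling the Acme Solutions Center. "))
```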


Elements and attributes:

SSML markup is built from elements, each of which changes how the text it encloses is spoken. Some examples of elements include ‘break’, ‘say-as’, and ‘prosody’.

Elements can have attributes, which serve to further modify the element. For example, an attribute might result in the text being read back at a slower pace, or at a higher pitch:

<speak><prosody pitch="high">Thank you for calling the Acme Solutions Center.</prosody></speak>

In the example above, prosody is the element being inserted, pitch is the attribute being set, and high is the value of that attribute. This means we want the text to be read back in a high-pitched voice.
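The element, attribute, and value relationship can be sketched as a small Python helper. The function name prosody is hypothetical. It simply turns keyword arguments into attributes on a <prosody> element, and it does not check whether a given attribute or value is one the speech engine accepts.

```python
from xml.sax.saxutils import escape, quoteattr


def prosody(text: str, **attrs: str) -> str:
    """Build a <prosody> element; each keyword argument becomes an attribute."""
    attr_str = "".join(f" {name}={quoteattr(value)}" for name, value in attrs.items())
    return f"<prosody{attr_str}>{escape(text)}</prosody>"


element = prosody("Thank you for calling the Acme Solutions Center.", pitch="high")
print(f"<speak>{element}</speak>")
```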

To insert pauses using SSML:

Inserting pauses of appropriate length helps humanize a caller’s interaction with your contact center. To insert pauses in SSML, you can use the <break> element. The length of the pause is defined numerically, in seconds or milliseconds. In the example below, we’ve used a break of 2 seconds (which we write as ‘2s’) and another one of 100 milliseconds (which we write as ‘100ms’).

<speak>Thank you for calling the Acme Solutions Center. <break time="2s"/> In order for us to route your call properly<break time="100ms"/>please listen to the following menu options</speak>
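When prompts are generated in code, a small helper can catch a malformed duration before it reaches the flow. This is a minimal sketch: the name pause is hypothetical, and the pattern only accepts whole numbers followed by ‘s’ or ‘ms’, which is the notation used above.

```python
import re

# Whole numbers followed by "s" or "ms", e.g. "2s" or "100ms".
_BREAK_TIME = re.compile(r"\d+(ms|s)")


def pause(time: str) -> str:
    """Return a <break> element, rejecting malformed durations."""
    if not _BREAK_TIME.fullmatch(time):
        raise ValueError(f"expected a duration like '2s' or '100ms', got {time!r}")
    return f'<break time="{time}"/>'


prompt = (
    "<speak>Thank you for calling the Acme Solutions Center. "
    f"{pause('2s')} In order for us to route your call properly"
    f"{pause('100ms')}please listen to the following menu options</speak>"
)
print(prompt)
```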

How to emphasize certain words:

You can use the prosody element and the volume attribute to modify how a word is spoken:

<prosody volume="XXX">

If we wanted a specific word or sentence within an SSML tag to be emphasized more loudly, we’d write the following:

<speak>Thank you for calling the <prosody volume="loud">Acme Solutions Center.</prosody></speak>

Alternatively, you can reduce the volume of specific words or sentences by setting the volume attribute to ‘soft’.
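As a rough sketch, the volume choice can be validated in Python before the markup is built. The helper name with_volume is hypothetical. The set of levels below is the list of named volume values from the SSML specification, and relative decibel values are deliberately left out of this example.

```python
from xml.sax.saxutils import escape

# Named volume levels defined by the SSML specification.
VOLUME_LEVELS = {"silent", "x-soft", "soft", "medium", "loud", "x-loud"}


def with_volume(text: str, level: str) -> str:
    """Wrap text in a <prosody> element with a named volume level."""
    if level not in VOLUME_LEVELS:
        raise ValueError(f"unknown volume level: {level!r}")
    return f'<prosody volume="{level}">{escape(text)}</prosody>'


loud_part = with_volume("Acme Solutions Center.", "loud")
print(f"<speak>Thank you for calling the {loud_part}</speak>")
```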

How to pronounce time and date using SSML:

A big challenge with text-to-speech is enabling unique characters, words, and numbers to be read in a particular context. An example of this is having times and dates read back in their correct formats. SSML can help with this.

To help read back unique numbers and characters, we will use the <say-as> element. You can modify this element with the interpret-as attribute.

The basic model for using the <say-as> element is:

<say-as interpret-as="XXXXX"></say-as>


Let’s look at time first. Time can exist in many different formats. Let’s say we have a scenario where we want the delivery time of a customer order to be 8 a.m. It can be written in a few different ways:


8 AM

8 o’clock

Eight o’clock

08:00 a.m.

If we only wanted to read back a static time, then we could hardcode it directly into our Amazon Connect flow. We wouldn’t need to worry about using SSML and the <say-as> element. All of the formats above are recognized and read back correctly.

However, in a real-life use case, something like a delivery time will probably change from order to order. It would likely be returned from a Lambda function, potentially via a DynamoDB table, and would have to be stored as a contact attribute and referenced in a format like $External.attributeTimeName. The value returned is what would then need to be interpreted. But if we use the ‘Text’ read back option, Amazon Connect would read back exactly what’s written, ‘$External.attributeTimeName’, instead of the time value.

Luckily, we can use SSML to make sure the actual time value is read back. You would write the following (remember to select the ‘SSML’ option under the ‘Interpret As’ menu before saving):

<speak> You can expect your delivery at <say-as interpret-as="time" format="hms24">$External.attributeTimeName</say-as></speak>

In this case, we would hear 8 a.m.
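For context, the Lambda side of this might look roughly like the Python sketch below. Everything specific in it is an assumption made for illustration: the ORDERS dictionary stands in for a DynamoDB lookup, the orderId parameter is hypothetical, and the ‘08:00’ time string is a guess at a value the time interpretation can handle. The only detail taken from the example above is the returned key, attributeTimeName.

```python
# Hypothetical in-memory stand-in for a DynamoDB table of orders.
ORDERS = {"1001": "08:00"}


def lambda_handler(event, context):
    """Return the delivery time for an order as a flat key/value response."""
    # Parameters passed from the contact flow arrive under Details.Parameters.
    params = event.get("Details", {}).get("Parameters", {})
    order_id = params.get("orderId", "")  # hypothetical parameter name
    delivery_time = ORDERS.get(order_id, "")
    # Kept flat (simple key/value pairs) so the flow can reference the key by name.
    return {"attributeTimeName": delivery_time}


print(lambda_handler({"Details": {"Parameters": {"orderId": "1001"}}}, None))
```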


Now, let’s take a look at a date. We can use 05-22-1975 as an example, with the value being returned to us from a Lambda function. As with the time example above, if we don’t ‘tell’ Amazon Connect how to read back the value, it will simply read back ‘$External.attributeDateName’. Instead, we would write:

<say-as interpret-as="date" format="mdy">$External.attributeDateName</say-as>

In the above example, the date will be interpreted in the “mdy” (month-day-year) format, which matches the order of the fields in 05-22-1975.

Similar to time, there are many different formats that dates can be read back in. Some of these include:

dmy: Day-month-year

mdy: Month-day-year

ym: Year-month

yyyymmdd: Year-month-day
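Whichever format you choose, the order of the fields in the value has to match it. The Python sketch below pairs a few format names with the strftime pattern that produces a matching value. The names DATE_PATTERNS and date_say_as are hypothetical, and the hyphen separator is an assumption carried over from the 05-22-1975 example.

```python
from datetime import date

# Each say-as date format paired with a strftime pattern in the same field order.
DATE_PATTERNS = {
    "dmy": "%d-%m-%Y",
    "mdy": "%m-%d-%Y",
    "ym": "%Y-%m",
    "yyyymmdd": "%Y%m%d",
}


def date_say_as(value: date, fmt: str) -> str:
    """Render a date as a <say-as> element whose text matches the format."""
    text = value.strftime(DATE_PATTERNS[fmt])
    return f'<say-as interpret-as="date" format="{fmt}">{text}</say-as>'


print(date_say_as(date(1975, 5, 22), "mdy"))
```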

We will be diving into date and time, and how to return a value from a Lambda function, in an upcoming tutorial, so watch out for that.

How to read back digits a customer entered on a keypad:

Continuing on with our exploration of how to interpret and read back numbers, let’s think of a real-life scenario: entering digits using the keypad and having them read back to check you entered them correctly. For this example, the digits we are entering are: 6-9-0-4-5-2-8.

If we write that into a ‘Play Prompt’ block as 6904528, it will be read as a single number: ‘six million, nine hundred and four thousand, five hundred and twenty-eight.’ That’s not what we want.

To read back a sequence of numbers individually, we can set the interpret-as attribute to ‘telephone’. Here’s what we would write instead:

<speak>The number you entered is <say-as interpret-as="telephone">6904528</say-as></speak>

If we put this into our Amazon Connect flow, it will read back the numbers as: ‘6-9-0-4-5-2-8’. You can then follow the digits with something like ‘If this is correct, press 1. If not, press 2 to re-enter your number.’
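If the read-back prompt is built in code, it is worth checking that the input really is a string of digits before wrapping it. This is a minimal sketch with a hypothetical helper name, read_back_digits. How the keypad input reaches your code is outside its scope.

```python
def read_back_digits(digits: str) -> str:
    """Build the read-back prompt, rejecting anything that is not plain digits."""
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"expected only the digits 0-9, got {digits!r}")
    return (
        "<speak>The number you entered is "
        f'<say-as interpret-as="telephone">{digits}</say-as></speak>'
    )


print(read_back_digits("6904528"))
```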


This post is just the beginning of what you can do with SSML tags in Amazon Connect. It’s worth noting that a lot of the examples and notation can be used with Amazon Lex as well. A full list of tags, elements and attributes with instructions on how to use them can be found in Amazon’s great guide.

For information on how Perficient can help you optimize your contact center using Amazon Connect, please get in touch with us.

Amita Parikh
