So CRN had an article about Amazon Web Services going down this past Monday. While that is certainly news, especially to a small number of people, I want to focus on total impact rather than on the big scary word "outage." Our reality in the cloud is that high availability cannot be achieved in the same way you would achieve it in your own datacenter. Indeed, you could make the argument that you cannot get even 98% uptime from the cloud. But let's take a look at some of the numbers, courtesy of CRN, and do some back-of-the-napkin calculations. Let's also look at just one Amazon datacenter.
- October 22, 2012 – three-hour outage in the Northern VA data center
- June 14, 2012 – six-hour outage in the Northern VA data center
- April 21, 2011 – "several hour" outage in the Northern VA data center
So let's calculate this just by total downtime, as we would for a normal datacenter. That is three outages in 18 months and one day (April 21, 2011 through October 22, 2012), which comes to 550 days, or 13,200 hours. If we assume that the April 21 outage lasted six hours, we get 15 hours of downtime. 15/13,200 ≈ 0.00114, or about 0.11% downtime. That equates to roughly 99.89% uptime.
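Date arithmetic like this is easy to get wrong by hand, so a small Python sketch can derive the window from the outage dates themselves. The six-hour figure for April 21, 2011 is the same assumption made above, and counting every outage hour as full downtime is the naive model used in this post, not how AWS measures availability.

```python
from datetime import date

# Northern VA outages listed above, mapped to their duration in hours.
# The April 21, 2011 outage was reported only as "several hours";
# six hours is an assumption, not a reported figure.
OUTAGE_HOURS = {
    date(2011, 4, 21): 6,  # assumed
    date(2012, 6, 14): 6,
    date(2012, 10, 22): 3,
}


def window_hours(start: date, end: date) -> int:
    """Hours between the first and last outage dates."""
    return (end - start).days * 24


def uptime_percent(downtime_hours: float, total_hours: float) -> float:
    """Naive uptime: share of the window with no outage, as a percentage."""
    return 100.0 * (1.0 - downtime_hours / total_hours)


start, end = min(OUTAGE_HOURS), max(OUTAGE_HOURS)
total = window_hours(start, end)
down = sum(OUTAGE_HOURS.values())
print(f"window: {total} h, downtime: {down} h")
print(f"downtime: {100 * down / total:.3f}%, uptime: {uptime_percent(down, total):.3f}%")
```

Swapping in a different guess for the April outage only means changing one dictionary entry.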
Now let's further take into account that the data center as a whole, along with the thousands of hosted sites using machine images, storage, database and other services, never completely went down. That means that sites like Pinterest saw outages but still had higher uptime than the datacenter-wide figure above. Others may not have seen any downtime at all. For argument's sake, let's say the average uptime is probably 99.99%. Let's call that our baseline.
I realize that high availability is about more than hardware and core services. Many of these sites rely on a good set of core services for storage, data retrieval and authentication, so while it's not a perfect comparison, AWS covers more than just hardware. However, that doesn't take into account software issues in the individual sites, which can cause outages outside the purview of AWS. Still, if a company builds its site on this cloud AND its core services, runs multiple images, and deploys them in more than just Amazon's Virginia datacenter, then uptime could easily start from a base of 99.99% and climb even higher.
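Why spreading a site across more than one datacenter pushes it above the 99.99% baseline can be sketched with the textbook parallel-availability formula, 1 - (1 - a)^n. This is a simplified model, not anything AWS publishes: it assumes each deployment fails independently and that the site is up whenever at least one copy is up. Correlated failures (a shared control plane, a bad release pushed everywhere) violate that assumption, so treat the numbers as an optimistic bound. `combined_availability` is an illustrative helper.

```python
HOURS_PER_YEAR = 365 * 24  # ignores leap years


def combined_availability(per_site: float, copies: int) -> float:
    """Availability of `copies` independent deployments, where the service is
    up if at least one copy is up: 1 - (1 - a)^n.
    Assumes failures are independent, which real regional outages may not be."""
    if not 0.0 <= per_site <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if copies < 1:
        raise ValueError("need at least one deployment")
    return 1.0 - (1.0 - per_site) ** copies


# Starting from the assumed 99.99% baseline for a single deployment.
for n in (1, 2, 3):
    a = combined_availability(0.9999, n)
    minutes_down = (1.0 - a) * HOURS_PER_YEAR * 60
    print(f"{n} deployment(s): {a:.8%} available, ~{minutes_down:.4f} min/year down")
```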
I'm beginning to think that my original supposition, that using cloud-based services means accepting lower uptime, is incorrect. Most of my clients are happy with 99.9% uptime.
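For a sense of scale, an uptime target can be turned into a downtime budget. This minimal Python sketch assumes a 365-day year; `downtime_budget_hours` is just an illustrative helper.

```python
HOURS_PER_YEAR = 365 * 24  # assumes a 365-day year


def downtime_budget_hours(uptime_percent: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Hours of downtime allowed in `period_hours` at the given uptime percentage."""
    return period_hours * (1.0 - uptime_percent / 100.0)


for target in (99.9, 99.99):
    print(f"{target}% uptime allows {downtime_budget_hours(target):.2f} h/year of downtime")
```

At 99.9% the budget is about 8.76 hours a year; at 99.99% it is under an hour.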
Here's my bottom line: while the headline makes for good fun, it looks like Amazon's cloud services bring a higher level of stability than previously thought. Of course, this is all just back-of-the-napkin math. What have I missed?