Have you ever wondered how Microsoft provides an SLA of 99.95% for Azure? That figure is quite high: it means the Azure service can be down for only about 4 hours and 23 minutes over an entire year. How does Microsoft guarantee this level of service?
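To see where the 4 hours and 23 minutes comes from, here is a minimal sketch of the arithmetic. The function name `allowed_downtime` is made up for illustration, and the sketch assumes the SLA is measured over an average year of 365.25 days; an actual SLA may be measured over a different period, such as a month.

```python
HOURS_PER_YEAR = 365.25 * 24  # assumption: average year length, including leap years


def allowed_downtime(sla_percent: float) -> tuple[int, int]:
    """Return the yearly downtime budget as (hours, minutes) for a given SLA."""
    downtime_hours = HOURS_PER_YEAR * (1 - sla_percent / 100)
    total_minutes = round(downtime_hours * 60)
    return divmod(total_minutes, 60)


hours, minutes = allowed_downtime(99.95)
print(f"{hours} h {minutes} min")  # prints "4 h 23 min"
```

The same function shows how quickly the budget grows as the SLA drops: at 99.9% the allowance roughly doubles.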
Fault Domains
A fault domain is a physical point of failure. Think of a rack of servers that shares a single power source and network switch. If that power source fails, every server in the rack goes offline. If the switch dies, every server in the rack goes offline. You get the picture.
When you deploy multiple instances, Azure automatically spreads them across fault domains. This ensures that if you have 2 instances of a service, they are not in the same fault domain. If one rack loses power, you are covered and your service will remain up. Keep in mind that fault domains sit inside a single datacenter, so they protect against hardware failures rather than the loss of an entire building.
Upgrade Domains
Whereas fault domains are a physical separation, upgrade domains are a logical separation. Upgrade domains exist so that when Microsoft rolls out a new software feature or bug fix, the upgrade domains are updated one at a time rather than all at once. This ensures that if you have at least 2 instances, your service will not go down as the result of an upgrade.
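The reasoning can be checked with a small simulation. This is only a sketch under the assumption that exactly one upgrade domain is offline at any moment during a rollout; the function `min_available_during_upgrade` is hypothetical and not part of any Azure SDK.

```python
def min_available_during_upgrade(placement: list[int]) -> int:
    """Return the fewest instances still running while any one upgrade domain is offline.

    placement[i] is the upgrade domain that instance i lives in.
    Expects at least one instance.
    """
    return min(
        sum(1 for domain in placement if domain != upgrading)
        for upgrading in set(placement)
    )


print(min_available_during_upgrade([0]))     # one instance: prints 0
print(min_available_during_upgrade([0, 1]))  # two instances, two domains: prints 1
```

With a single instance there is a moment when nothing is running. With two instances in separate upgrade domains, at least one is always available.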
By default, an Azure service has 5 upgrade domains, numbered 0 through 4. When you create a new service instance, Azure automatically places it in the next upgrade domain, wrapping back to domain 0 after domain 4. If you have more than 5 instances, 7 for example, upgrade domains 0-1 will have 2 instances each and upgrade domains 2-4 will have 1 instance each.
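The placement described above behaves like a round-robin assignment. The sketch below illustrates that pattern, assuming 5 upgrade domains; `assign_upgrade_domains` is an illustrative name, not an Azure API, and the real allocator may differ in its details.

```python
from collections import Counter


def assign_upgrade_domains(instance_count: int, domain_count: int = 5) -> list[int]:
    """Assign each instance to an upgrade domain in round-robin order."""
    return [i % domain_count for i in range(instance_count)]


placement = assign_upgrade_domains(7)
per_domain = Counter(placement)
for domain in sorted(per_domain):
    print(f"upgrade domain {domain}: {per_domain[domain]} instance(s)")
```

For 7 instances this gives 2 instances in each of domains 0 and 1, and 1 instance in each of domains 2 through 4, matching the example above.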
Moral of the story: always deploy a minimum of 2 instances of your service to qualify for the Microsoft SLA of 99.95%. For more information, contact us at Perficient and one of our certified Azure consultants can help you envision your Azure solution today!