As a Databricks Champion working for Perficient’s Data Solutions team, I spend most of my time installing and managing Databricks on Azure and AWS. The decision on which cloud provider to use is typically outside my scope since the organization has already made it. However, there are occasions when the client uses both hyperscalers or has not yet moved to the cloud. It is helpful in those situations to advise the client on the advantages and disadvantages of one platform over another from a Databricks perspective. I’m aware that I am skipping over the Google Cloud Platform, but I want to focus on the questions I am actually asked rather than questions that could be asked. I am also not advocating for one cloud provider over another. I am limiting myself to the question of AWS versus Azure from a Databricks perspective.
Advantages of Databricks on Azure
Databricks is a first-party service on Azure, which means it enjoys deep integration with the Microsoft ecosystem. Identity management in Azure Databricks is integrated with Microsoft Entra ID (formerly Azure Active Directory, or Azure AD), which can save time and effort in an area I have found difficult in large, regulated organizations. The same applies to networking, Private Link, and Azure compliance frameworks. The value of this integration is amplified if the client also uses some combination of Azure Data Lake Storage (ADLS), Azure Synapse Analytics, or Power BI, since Databricks works with these products on Azure with little extra setup. FinOps gets a boost on Azure for companies with a Microsoft Azure Consumption Commitment (MACC), as Azure Databricks costs can be applied against that commitment. On the cost-management side, Azure Spot VMs can reduce costs for workloads that tolerate interruption. Azure Databricks and ADLS Gen2/Blob Storage are optimized for high throughput, which improves I/O performance.
Disadvantages of Databricks on Azure
The tight integration with the Microsoft ecosystem cuts both ways. Azure Databricks depends on Microsoft Entra ID (formerly Azure AD), Azure role-based access control (RBAC), and network security groups (NSGs). These dependencies require additional and sometimes complex configuration if you want a hybrid or multi-cloud approach. Some advanced networking configurations require enterprise licensing or additional manual configuration in the Azure Marketplace.
Advantages of Databricks on AWS
Azure focuses on seamless integration with Databricks, on the assumption that the organization is a committed Microsoft shop. AWS instead offers more dials to tune, trading some simplicity for greater flexibility. It has more instance types than Azure, including more options for GPU and memory-optimized workloads, along with Spot Instance options and scalable S3 storage, which can result in better cost and performance optimization. AWS also has a more flexible spot pricing model than Azure. VPC peering, Transit Gateway, and more granular IAM security controls make AWS a stronger choice for organizations with advanced security requirements or a commitment to multi-cloud or hybrid Databricks deployments. Finally, many advanced features are released on AWS before Azure; Photon is a good example.
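To make the spot capacity discussion concrete, here is a minimal sketch of how a cluster definition might differ between the two clouds. It builds request bodies in the shape of the Databricks Clusters API, using the aws_attributes and azure_attributes blocks. The cluster name, runtime version, node types, and worker count are illustrative assumptions of mine, and the field names should be checked against the current API reference before you rely on them.

```python
import json


def cluster_spec(cloud: str, workers: int = 4) -> dict:
    """Build a cluster request body that prefers spot capacity.

    All concrete values (name, runtime, node types) are illustrative
    assumptions, not recommendations.
    """
    spec = {
        "cluster_name": f"spot-demo-{cloud}",  # hypothetical name
        "spark_version": "13.3.x-scala2.12",   # assumed runtime version
        "num_workers": workers,
    }
    if cloud == "aws":
        spec["node_type_id"] = "i3.xlarge"  # assumed EC2 instance type
        spec["aws_attributes"] = {
            "first_on_demand": 1,  # keep the driver on on-demand capacity
            "availability": "SPOT_WITH_FALLBACK",
            # AWS expresses the bid as a percentage of the on-demand price
            "spot_bid_price_percent": 100,
        }
    elif cloud == "azure":
        spec["node_type_id"] = "Standard_DS3_v2"  # assumed Azure VM size
        spec["azure_attributes"] = {
            "first_on_demand": 1,
            "availability": "SPOT_WITH_FALLBACK_AZURE",
            # Azure expresses the bid as a maximum price; -1 means
            # "up to the on-demand price"
            "spot_bid_max_price": -1,
        }
    else:
        raise ValueError(f"unsupported cloud: {cloud!r}")
    return spec


if __name__ == "__main__":
    for cloud in ("aws", "azure"):
        print(json.dumps(cluster_spec(cloud), indent=2))
```

The two bodies are otherwise identical, which is the point: the workload definition ports easily, while the cloud-specific block is where the cost tuning happens.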
Disadvantages of Databricks on AWS
AWS charges for cross-region data transfers, and S3 read/write requests can become costly, especially for data-intensive workloads. Together these can result in higher networking and storage costs. AWS also has weaker native BI integration when you compare Tableau on AWS with Power BI on Azure.
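A rough way to size the transfer and request charges described above is a back-of-the-envelope estimate like the sketch below. The function and its parameters are hypothetical, and the rates in the usage example are placeholders rather than actual AWS prices; substitute the figures from your own pricing agreement and region.

```python
def monthly_data_cost(
    gb_transferred: float,
    rate_per_gb: float,
    requests: int,
    rate_per_1000_requests: float,
) -> float:
    """Estimate monthly cross-region transfer plus storage request charges.

    All rates are supplied by the caller; nothing here encodes real pricing.
    """
    transfer_cost = gb_transferred * rate_per_gb
    request_cost = (requests / 1000) * rate_per_1000_requests
    return transfer_cost + request_cost


if __name__ == "__main__":
    # Placeholder inputs for illustration only.
    estimate = monthly_data_cost(
        gb_transferred=10_000,
        rate_per_gb=0.02,
        requests=50_000_000,
        rate_per_1000_requests=0.0004,
    )
    print(f"Estimated monthly cost: ${estimate:,.2f}")
```

Even a simple model like this helps show whether a workload's cost is dominated by data volume or by request count, which in turn suggests whether to co-locate data and compute or to compact small files.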
Conclusion
Databricks is a strong cloud data platform on all the major cloud providers. If your organization has already committed to a particular cloud provider, Databricks will work there. However, I have been asked about the differences between AWS and Azure often enough that I wanted to get all my thoughts down in one place. I also recommend a multi-cloud strategy for most of our client organizations for disaster recovery and business continuity purposes.
Contact us to discuss the pros and cons of your planned or proposed Databricks implementation. We can help you navigate the technical complexities that affect security, cost, and BI integrations.