As a Databricks Champion working for Perficient’s Data Solutions team, I spend most of my time installing and managing Databricks on Azure and AWS. The decision on which cloud provider to use is typically outside my scope since it has already been made by the organization. However, there are occasions where the client is already using both hyperscalers or has not yet moved to the cloud. It is helpful in those situations to be able to advise the client on the advantages and disadvantages of one platform over the other from a Databricks perspective. I’m aware that I am skipping over Google Cloud Platform, but I want to focus on the questions I am actually asked rather than questions that could be asked. I am also not advocating for one cloud provider over another. I am limiting myself to the question of AWS versus Azure from a Databricks perspective.
Advantages of Databricks on Azure
Databricks is a first-party service on Azure, which means it enjoys deep integration with the Microsoft ecosystem. Identity management in Databricks is integrated with Azure Active Directory (AAD, now Microsoft Entra ID) authentication, which can save time and effort in an area that I have found can be difficult in large, regulated organizations. The same is true of the deep integration with networking, Private Link, and Azure’s compliance frameworks. The value of this integration is amplified if the client also uses some combination of Azure Data Lake Storage (ADLS), Azure Synapse Analytics, or Power BI. The Databricks integration with these products on Azure is seamless. FinOps gets a boost on Azure for companies with a Microsoft Azure Consumption Commitment (MACC), as Databricks costs can be applied against that commitment. On the topic of cost management, Azure Spot VMs can be used in some situations to reduce cost. Azure Databricks and ADLS Gen2/Blob Storage are optimized for high throughput, which reduces latency and improves I/O performance.
Disadvantages of Databricks on Azure
Databricks and Azure are tightly integrated as long as you stay within the Microsoft ecosystem. Azure Databricks uses Azure AD, role-based access control (RBAC), and network security groups (NSGs). These dependencies require additional and sometimes complex configuration if you want to take a hybrid or multi-cloud approach. Some of these advanced networking configurations require enterprise licensing or additional manual configuration in the Azure Marketplace.
Advantages of Databricks on AWS
Azure is focused on seamless integration with Databricks under the assumption that the organization is a committed Microsoft shop. AWS gives you more dials to tune, which means more configuration work but also greater flexibility. AWS offers a broader selection of EC2 instance types than Azure, including more options for GPU and memory-optimized workloads, along with Spot Instance options and scalable S3 storage, which can result in better cost and performance optimization. AWS also has a more flexible spot pricing model than Azure. VPC peering, Transit Gateway, and more granular IAM security controls make AWS a stronger choice for organizations with advanced security requirements and/or organizations committed to multi-cloud or hybrid Databricks deployments. Many advanced features are released on AWS before Azure. Photon is a good example.
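To make the spot capacity options on the two clouds more concrete, here is a minimal sketch of how a cluster specification might differ between AWS and Azure. It assumes the `aws_attributes` and `azure_attributes` blocks of the Databricks Clusters API; the helper name `spot_cluster_spec`, the cluster name, the node types, and the runtime version string are illustrative placeholders rather than recommendations, so check them against the API reference for your workspace.

```python
import json


def spot_cluster_spec(cloud: str, workers: int = 4) -> dict:
    """Build a cluster spec fragment that prefers spot capacity.

    Hypothetical helper for illustration only. The first node (the driver)
    stays on-demand and the remaining nodes request spot capacity, falling
    back to on-demand when spot is unavailable.
    """
    spec = {
        "cluster_name": f"spot-demo-{cloud}",   # placeholder name
        "spark_version": "15.4.x-scala2.12",    # placeholder runtime version
        "num_workers": workers,
    }
    if cloud == "aws":
        spec["node_type_id"] = "i3.xlarge"      # example EC2 instance type
        spec["aws_attributes"] = {
            "first_on_demand": 1,
            "availability": "SPOT_WITH_FALLBACK",
            # Maximum spot price as a percentage of the on-demand price.
            "spot_bid_price_percent": 100,
        }
    elif cloud == "azure":
        spec["node_type_id"] = "Standard_DS3_v2"  # example Azure VM size
        spec["azure_attributes"] = {
            "first_on_demand": 1,
            "availability": "SPOT_WITH_FALLBACK_AZURE",
            # -1 means "pay up to the on-demand price" rather than a fixed cap.
            "spot_bid_max_price": -1,
        }
    else:
        raise ValueError(f"unsupported cloud: {cloud!r}")
    return spec


if __name__ == "__main__":
    for cloud in ("aws", "azure"):
        print(json.dumps(spot_cluster_spec(cloud), indent=2))
```

The two fragments are deliberately near-identical: the practical differences show up in the choice of instance types and in how the spot price ceiling is expressed, not in the overall shape of the request.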
Disadvantages of Databricks on AWS
AWS charges for cross-region data transfers, and S3 read/write operations can become costly, especially for data-intensive workloads. Together these can result in higher networking and storage costs. AWS also has weaker native BI integration when you compare Tableau on AWS with Power BI on Azure.
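A rough back-of-the-envelope model can help when discussing these charges with a client. The sketch below is only an illustration: the function name and the rates in `EXAMPLE_RATES` are assumed placeholder values, not current AWS prices, and should be replaced with figures from the pricing pages for the regions involved.

```python
# Placeholder rates for illustration only; these are NOT quoted AWS prices.
EXAMPLE_RATES = {
    "cross_region_per_gb": 0.02,   # USD per GB moved between regions
    "put_per_1000": 0.005,         # USD per 1,000 write (PUT-class) requests
    "get_per_1000": 0.0004,        # USD per 1,000 read (GET-class) requests
}


def monthly_transfer_and_request_cost(
    cross_region_gb: float,
    put_requests: int,
    get_requests: int,
    rates: dict = EXAMPLE_RATES,
) -> dict:
    """Estimate monthly cross-region transfer and S3 request charges in USD."""
    transfer = cross_region_gb * rates["cross_region_per_gb"]
    puts = (put_requests / 1000) * rates["put_per_1000"]
    gets = (get_requests / 1000) * rates["get_per_1000"]
    return {
        "transfer": transfer,
        "put_requests": puts,
        "get_requests": gets,
        "total": transfer + puts + gets,
    }


if __name__ == "__main__":
    estimate = monthly_transfer_and_request_cost(
        cross_region_gb=1000,
        put_requests=1_000_000,
        get_requests=10_000_000,
    )
    for item, usd in estimate.items():
        print(f"{item}: ${usd:,.2f}")
```

Even with placeholder numbers, a model like this makes it easier to see which term dominates for a given workload, for example whether many small files (request counts) or cross-region replication (transfer volume) is driving the bill.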
Conclusion
Databricks is a strong data platform on all the major cloud providers. If your organization has already committed to a particular cloud provider, Databricks will work. However, I have been asked about the differences between AWS and Azure often enough that I wanted to get all of my thoughts down in one place. I also recommend a multi-cloud strategy for most of our client organizations for Disaster Recovery and Business Continuity purposes.
Contact us to discuss the pros and cons of your planned or proposed Databricks implementation so we can help you navigate the technical complexities that affect security, cost and BI integrations.