Databricks supports compliance with a wide range of standards and regulations to meet the needs of highly regulated industries, including:
- HIPAA (Health Insurance Portability and Accountability Act)
- PCI-DSS (Payment Card Industry Data Security Standard)
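- FedRAMP (Federal Risk and Authorization Management Program) High and Moderate
- DoD IL5 (Department of Defense Impact Level 5)
- IRAP (Infosec Registered Assessors Program, Australia)
- GDPR (General Data Protection Regulation, EU)
- CCPA (California Consumer Privacy Act)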
However, I was surprised to read that Databricks Serverless workloads are not covered under PCI-DSS (see the "Databricks PCI DSS Compliance" page) and became curious about why. After some research, I have arrived at an explanation that convinces me, and I would like to share it here.
To begin with, let’s look at the different Databricks SQL warehouse types and their capabilities.
[Figure: side-by-side comparison of Pro SQL Warehouse, Classic SQL Warehouse, and Serverless SQL Warehouse]
Databricks SQL (Classic/Pro)
- Classic and Pro warehouses run on compute resources in the customer's own cloud account.
- Workload data is processed by those compute resources, which are managed by the customer.
- Customers have more control over, and visibility into, the compute resources.
- Data being processed stays within the network boundary of the customer's cloud account.
Databricks SQL (Serverless)
- Serverless warehouses run on compute resources in the Databricks account.
- Serverless compute uses a multi-tenant architecture, in which compute resources are shared across different customers.
- Compute is fully managed by Databricks, so customers have less control over, and visibility into, the networking and compute resources.
- Data from different customers' workloads is processed on compute within the Databricks account.
- In exchange for that reduced control, customers benefit from the capabilities Serverless warehouses offer, such as rapid startup and automatic scaling.
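To make the difference between the three types concrete, here is a minimal Python sketch that only builds the request bodies one might send to the Databricks SQL Warehouses API (`POST /api/2.0/sql/warehouses`). It assumes the `warehouse_type` and `enable_serverless_compute` fields work as I understand them, with a Serverless warehouse being a Pro warehouse that has serverless compute enabled. The function name `warehouse_request` and the default `cluster_size` are my own illustrative choices, and nothing here calls the API; please check the current API reference before relying on it.

```python
import json

# Sketch only: builds create-warehouse request bodies for the Databricks
# SQL Warehouses API (POST /api/2.0/sql/warehouses). Field names are as
# I understand the API; verify against the current reference before use.
# warehouse_request is a hypothetical helper, not part of any SDK.

def warehouse_request(name: str, kind: str, cluster_size: str = "Small") -> dict:
    """Return a request body for kind in {'classic', 'pro', 'serverless'}."""
    kinds = {
        # Classic and Pro run on compute in the customer's cloud account.
        "classic": ("CLASSIC", False),
        "pro": ("PRO", False),
        # Assumption: Serverless is a Pro warehouse with serverless compute
        # enabled, so the compute runs in the Databricks account.
        "serverless": ("PRO", True),
    }
    if kind not in kinds:
        raise ValueError(f"unknown warehouse kind: {kind!r}")
    warehouse_type, serverless = kinds[kind]
    return {
        "name": name,
        "cluster_size": cluster_size,
        "warehouse_type": warehouse_type,
        "enable_serverless_compute": serverless,
    }


if __name__ == "__main__":
    # Print one JSON body per warehouse type for comparison.
    for kind in ("classic", "pro", "serverless"):
        print(json.dumps(warehouse_request(f"demo-{kind}", kind)))
```

The only thing that separates the Serverless body from the Pro one is the serverless flag, which is exactly the switch that moves compute out of the customer's account.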
Final View
- PCI-DSS requires strict isolation of environments that handle cardholder data, which is difficult to guarantee in a shared, multi-tenant setup.
- It mandates restricted and monitored network access to systems that handle payment data.
- It requires fine-grained access control and auditing, which is more feasible in dedicated or customer-managed environments.
- Databricks recommends using Classic or Pro compute with dedicated VPCs, private networking, and enhanced security controls for PCI-DSS-compliant workloads.
- Databricks is also working to add stronger isolation boundaries within Serverless compute.
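One way to act on this recommendation is a periodic check of the warehouses in any workspace that will touch cardholder data. The Python sketch below assumes each item looks like an entry in the `warehouses` list returned by `GET /api/2.0/sql/warehouses` and reads only the `name` and `enable_serverless_compute` fields. The helper `non_pci_warehouses` and the sample inventory are hypothetical. It does not fetch anything itself, and passing it says nothing about the networking and auditing controls listed above.

```python
from typing import Iterable

# Sketch only: flags SQL warehouses that should not process cardholder data,
# following the reasoning above (Serverless is not covered for PCI-DSS).
# Assumption: items resemble entries of the "warehouses" list returned by
# GET /api/2.0/sql/warehouses; only "name" and "enable_serverless_compute"
# are read. non_pci_warehouses is a hypothetical helper.

def non_pci_warehouses(warehouses: Iterable[dict]) -> list[str]:
    """Return the names of warehouses that run on serverless compute."""
    return [
        w.get("name", "<unnamed>")
        for w in warehouses
        if w.get("enable_serverless_compute", False)
    ]


if __name__ == "__main__":
    sample = [  # hypothetical inventory for illustration
        {"name": "finance-pro", "warehouse_type": "PRO", "enable_serverless_compute": False},
        {"name": "adhoc-serverless", "warehouse_type": "PRO", "enable_serverless_compute": True},
        {"name": "legacy-classic", "warehouse_type": "CLASSIC", "enable_serverless_compute": False},
    ]
    print(non_pci_warehouses(sample))  # ['adhoc-serverless']
```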