Why Databricks SQL Serverless is not PCI-DSS compliant

Databricks supports a wide range of compliance standards to meet the needs of highly regulated industries, including:

  • HIPAA (Health Insurance Portability and Accountability Act)
  • PCI-DSS (Payment Card Industry Data Security Standard)
  • FedRAMP High & Moderate
  • DoD IL5
  • IRAP (Australia)
  • GDPR (EU)
  • CCPA (California)

However, I was surprised to read that Databricks Serverless workloads are not covered for PCI-DSS (Databricks PCI DSS Compliance | Databricks), and I became curious about the reason. After some research I arrived at an explanation that convinces me, and I would like to share it here.

To begin with, let’s look at the different Databricks SQL Warehouse types and their capabilities:

Pro SQL Warehouse
    • Supports Photon and Predictive IO
    • Does not support Intelligent Workload Management (IWM)
    • Compute resources reside in the customer's cloud account
    • Less responsive to query demand than a Serverless warehouse
    • Cannot auto-scale rapidly, and startup takes ~2-4 min
    • Suitable when you need custom-defined networking and want to connect to databases within your own network

Classic SQL Warehouse
    • Supports Photon
    • Does not support Predictive IO or Intelligent Workload Management
    • Compute resources reside in the customer's cloud account
    • Provides entry-level performance and is less performant than Pro and Serverless SQL Warehouses
    • Cannot auto-scale rapidly, and startup takes ~4 min
    • Suitable for running interactive, exploratory queries where entry-level performance is enough

Serverless SQL Warehouse
    • Supports Photon, Predictive IO, and Intelligent Workload Management
    • Compute resources reside in the Databricks cloud account
    • Highly responsive to query demand
    • Rapid auto-scaling and a rapid startup time of 4-6 seconds
    • Suitable for time-sensitive ETL, Business Intelligence, and exploratory analysis use cases
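
To make the comparison easier to reason about, here is a minimal Python sketch that encodes it as data and filters on where the compute runs. The class name `WarehouseType`, its field names, and the helper `types_in_customer_account` are my own illustrative choices, and the values are simply transcribed from the comparison above; nothing here is read from a Databricks API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WarehouseType:
    """One row of the comparison (hypothetical structure, for illustration only)."""
    name: str
    photon: bool
    predictive_io: bool
    intelligent_workload_management: bool
    compute_location: str  # "customer" or "databricks" cloud account


# Values transcribed by hand from the comparison in this article.
WAREHOUSE_TYPES = [
    WarehouseType("Pro", True, True, False, "customer"),
    WarehouseType("Classic", True, False, False, "customer"),
    WarehouseType("Serverless", True, True, True, "databricks"),
]


def types_in_customer_account(types=WAREHOUSE_TYPES):
    """Return the names of warehouse types whose compute runs in the customer's cloud account."""
    return [t.name for t in types if t.compute_location == "customer"]


print(types_in_customer_account())  # ['Pro', 'Classic']
```

The compute location is the property that matters for the rest of this article, which is why the sketch filters on it rather than on features such as Photon.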

Databricks SQL (Classic/Pro)

  • Databricks SQL (Classic/Pro) warehouses use compute resources in the customer's cloud account.
  • Workloads are processed by compute resources that the customer manages.
  • Customers have more control over, and more visibility into, the compute resources.
  • The data being processed stays within the network boundary of the customer's cloud account.

Databricks SQL (Serverless)

  • Databricks SQL (Serverless) warehouses use compute resources in the Databricks cloud account.
  • Serverless compute operates on a multi-tenant architecture, where compute resources are shared across different customers.
  • Compute resources are managed entirely by Databricks, so customers have less control over, and less ability to monitor, the networking and compute resources.
  • Data from different workloads is processed on compute resources inside the Databricks account.
  • Although customers have less control over the compute, they benefit from the capabilities that Serverless warehouses offer, such as rapid startup and rapid auto-scaling.

Final View

  • PCI-DSS requires strict isolation of environments that handle cardholder data, which is difficult to guarantee in a shared, multi-tenant setup.
  • It mandates restricted and monitored network access, especially for systems that handle payment data.
  • It requires fine-grained control and auditing, which is more feasible in dedicated or customer-managed environments.
  • Databricks recommends using Classic or Pro compute with dedicated VPCs, private networking, and enhanced security controls for PCI-DSS-compliant workloads.
  • Databricks is also working to add more isolation boundaries within Serverless compute.
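
As a rough illustration of the recommendation above, the following Python sketch builds a request body for creating a Pro warehouse with Serverless switched off, and refuses to build one that would put a cardholder-data workload on Serverless. The field names (`warehouse_type`, `enable_serverless_compute`, `cluster_size`, `auto_stop_mins`) reflect my understanding of the Databricks SQL Warehouses REST API and should be checked against the current documentation. The function `build_warehouse_request`, the `handles_cardholder_data` flag, and the example values are hypothetical; the guard is my own convention, not a Databricks feature. The sketch only builds the body and does not call any API.

```python
import json


def build_warehouse_request(name, warehouse_type, serverless, handles_cardholder_data):
    """Build a JSON-serializable body for a create-warehouse request.

    Raises ValueError if a workload flagged as handling cardholder data
    would be placed on Serverless compute (an illustrative policy check).
    """
    if warehouse_type not in ("PRO", "CLASSIC"):
        raise ValueError(f"unknown warehouse type: {warehouse_type}")
    if serverless and handles_cardholder_data:
        raise ValueError("cardholder-data workloads should not use Serverless compute")
    return {
        "name": name,
        "cluster_size": "Small",        # assumed example size
        "warehouse_type": warehouse_type,
        "enable_serverless_compute": serverless,
        "auto_stop_mins": 10,           # assumed example value
    }


# Example: a Pro warehouse in the customer account for a PCI-scoped workload.
body = build_warehouse_request(
    "pci-reporting", "PRO", serverless=False, handles_cardholder_data=True
)
print(json.dumps(body, indent=2))
```

A check like this could run in a deployment pipeline before any warehouse is created, so that a PCI-scoped workload never ends up on shared compute by accident.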


Saravanan Ponnaiah

Saravanan Ponnaiah is a solutions architect working on cloud and data projects. He has 20 years of experience and deep expertise in cloud and data platforms, having worked in the banking and financial, insurance, healthcare, and digital forensics domains. He has architected enterprise data solutions with Databricks and implemented scalable GenAI solutions.
