Modern data platforms like Databricks enable organizations to process massive volumes of batch and streaming data—but scaling reliably requires more than just compute power. It demands data observability: the ability to monitor, validate, and trace data through its lifecycle.
This blog compares two powerful tools—Delta Live Tables and Great Expectations—that bring observability to life in different but complementary ways. Delta Live Tables (DLTs) provide built-in enforcement and lineage within Databricks pipelines, while Great Expectations (GX) offers deep validation and anomaly detection.
In my experience, Delta Live Tables and Great Expectations are better together: they form a robust observability stack, enabling teams to deliver trusted, production-grade data pipelines across batch and streaming workflows.
I’m not normally a fan of taking both sides of an argument, so let’s look at our core responsibilities as data engineers from the ground up and let the requirements lead us to the solutions.
A data asset is a managed, valuable dataset. A valuable dataset is not just data—it is data with purpose, invested in through processes and controls, and justified by the business value provided. A managed dataset is actively governed, monitored, and maintained to ensure it delivers sustained value to stakeholders.
Fundamentally, a data asset is considered managed when it is under governance. Proper data governance, such as that provided by Unity Catalog, has at least the following characteristics:
Governance Characteristic | Key Questions |
---|---|
Ownership & Stewardship | Who is responsible for maintaining the data asset? Who can answer questions about it? |
Access Control | Who can read, write, or modify this data? Are permissions aligned with roles and rules? |
Lineage | Where does this data come from? What transformations has it gone through? |
Compliance & Privacy | Is sensitive data (e.g., PII, PHI) addressed? Are retention and masking policies enforced? |
Auditability | Can we trace who accessed or modified the data and when? |
Unity Catalog is the foundation for a well-managed lakehouse. I have written about migrating to Unity Catalog and highlighted some bonus features. If you have not migrated yet, I recommend getting started immediately and then focusing on data value.
The business will communicate the value of a data asset primarily through a Service Level Agreement (SLA). The SLA defines agreed-upon expectations around reliability, performance, and quality.
Reliability describes the resiliency, freshness, and correctness of a data asset.
Performance describes the efficiency of data from ingestion to processing to consumption.
Quality describes the degree to which data meets defined rules, expectations, and standards.
These business expectations are fulfilled by IT through People, Processes, and Technology.
For a price.
Service Level Objectives (SLOs) break each SLA down into operational domains that can be met progressively, which helps align cost to value within your budget. The dials being tuned are the Software Development Lifecycle (SDLC) and the Databricks Medallion Architecture: SLAs define commitments for reliability, performance, and quality, and SLOs enforce those commitments throughout the data lifecycle. Each Medallion layer strengthens one or more of these domains, and across the SDLC, IT teams progressively validate and enforce these guarantees to deliver production-grade data assets.
The workspace is the primary environment for working with data assets in Unity Catalog. Value typically increases across the environments from left (Dev) to right (Prod).
SLA Domain | Dev | Test | Prod |
---|---|---|---|
Reliability | Monitor source connectivity and pipeline triggers | Validate pipeline scheduling, retries | SLAs ensure on-time delivery for consumers |
Performance | Baseline performance benchmarks | Load testing, profiling | Optimize for SLAs: query latency, data delivery speed |
Quality | Create GX/DQX test suites | Enforce checks with alerts | Blocking rules and alerting on quality failures |
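To make the Quality row concrete, here is a minimal sketch of how a single expectation suite could warn in Dev and Test but block in Prod. It assumes the classic (pre-1.0) Great Expectations Python API; the sample data, column names, and `ENV` flag are hypothetical.

```python
# Minimal sketch: one expectation suite, progressively enforced across environments.
# Assumes the classic (pre-1.0) Great Expectations API; data and names are hypothetical.
import pandas as pd
import great_expectations as ge

ENV = "dev"  # hypothetical environment flag: "dev", "test", or "prod"

# Sample data standing in for an orders dataset
orders_sample = pd.DataFrame({"order_id": [1, 2, None], "amount": [10.0, -5.0, 20.0]})

orders = ge.from_pandas(orders_sample)
orders.expect_column_values_to_not_be_null("order_id")
orders.expect_column_values_to_be_between("amount", min_value=0)

results = orders.validate()
if not results.success:
    if ENV == "prod":
        # Blocking rule: fail the run in production
        raise ValueError("Data quality expectations failed")
    # Non-blocking in Dev/Test: surface a warning or alert instead
    print(f"Warning ({ENV}): one or more expectations failed")
```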
The catalog is the primary unit of data isolation in the Databricks data governance model. Value typically increases across the layers from left (Bronze) to right (Gold).
SLA Domain | Bronze (Raw) | Silver (Cleaned) | Gold (Curated) |
---|---|---|---|
Reliability | Data lands on time; raw source integrity is monitored | DLT jobs run consistently; schema evolution is managed | Timely delivery of business-critical data |
Performance | Ingest processes optimized for load handling | Transformations are performant; no bottlenecks | Dashboards and queries load quickly |
Quality | Basic data profiling and source rule checks | DQ rules (e.g., null checks, constraints) enforced | Golden datasets meet business expectations for data quality |
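As one illustration of the Silver column, here is a minimal sketch of a DLT table that enforces data quality rules declaratively with expectations; the `bronze_orders` source and column names are hypothetical.

```python
# Minimal sketch of a Silver-layer DLT table with declarative expectations.
# Runs inside a Delta Live Tables pipeline; source and column names are hypothetical.
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Cleaned orders with basic data quality rules enforced")
@dlt.expect_or_drop("non_null_order_id", "order_id IS NOT NULL")  # drop violating rows
@dlt.expect("non_negative_amount", "amount >= 0")                 # record violations, keep rows
def silver_orders():
    return (
        dlt.read_stream("bronze_orders")  # hypothetical Bronze streaming source
        .withColumn("processed_at", F.current_timestamp())
    )
```

Swapping `@dlt.expect` for `@dlt.expect_or_fail` turns a rule into a blocking check, which maps naturally onto the Prod column of the SDLC table above.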
Data becomes a true asset as it progresses through these layers, accruing value while incurring costs to meet increasing SLA expectations.
We are back to where we started, with a little more context. Great Expectations (GX) focuses on data validation and profiling, while Delta Live Tables (DLT) handles schema enforcement and transformations. DLTs may not have GX’s sophisticated rule and profiling capabilities, but their native integration with Unity Catalog gives them consistent performance characteristics across both batch and streaming, whereas GX can struggle with streaming from a performance perspective.
The exercise of defining the progression of value across the SDLC and Medallion Architecture now pays dividends. DLTs stand out for end-to-end data management, with automatic lineage tracking and schema evolution. Great Expectations can then be run as a separate process for more advanced data quality checks and profiling, as sketched below. This could be incorporated into a more advanced CI/CD process or simply managed manually.
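For example, a scheduled job or CI/CD step could read a Silver table and run a GX validation as a quality gate before Gold consumption. This is a minimal sketch assuming the classic (pre-1.0) `SparkDFDataset` API; the `main.silver.orders` table and its columns are hypothetical.

```python
# Minimal sketch: Great Expectations run as a separate validation step against a Delta table.
# Assumes the classic (pre-1.0) SparkDFDataset API; table and column names are hypothetical.
from pyspark.sql import SparkSession
from great_expectations.dataset import SparkDFDataset

spark = SparkSession.builder.getOrCreate()  # already provided in a Databricks notebook or job

silver_df = spark.read.table("main.silver.orders")  # hypothetical Unity Catalog table
validator = SparkDFDataset(silver_df)

validator.expect_column_values_to_be_unique("order_id")
validator.expect_column_values_to_not_be_null("customer_id")
validator.expect_column_values_to_be_between("amount", min_value=0)

results = validator.validate()
if not results.success:
    # Fail the job so downstream Gold consumers are not served bad data
    raise RuntimeError("Great Expectations validation failed for main.silver.orders")
```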
The key is not to focus on a tool in isolation with the idea of picking a winner. I believe most developers could become cross-trained on both technologies, and neither should be outside the scope of a junior data engineer. People are not the problem. I wish DLTs were integrated with Great Expectations so I didn’t need two technologies, but a little Process goes a long way toward resolving that Technology issue.
Integrating Delta Live Tables and Great Expectations within the Software Development Lifecycle (SDLC) and the Medallion Architecture helps teams reduce operational costs while continuously delivering business value.
This hybrid approach supports robust data engineering practices and empowers organizations to scale with confidence, optimize their cloud spend, and maximize the return on data investments.
Contact us to learn more about how to empower your teams with the right tools, processes, and training to unlock Databricks’ full potential cost-consciously.