Modern data platforms like Databricks enable organizations to process massive volumes of batch and streaming data—but scaling reliably requires more than just compute power. It demands data observability: the ability to monitor, validate, and trace data through its lifecycle.
This blog compares two powerful tools—Delta Live Tables and Great Expectations—that bring observability to life in different but complementary ways. Delta Live Tables (DLTs) provide built-in enforcement and lineage within Databricks pipelines, while Great Expectations (GX) offers deep validation and anomaly detection.
In my experience, Delta Live Tables and Great Expectations are better together. Together, they form a robust observability stack, enabling teams to deliver trusted, production-grade data pipelines across batch and streaming workflows.
- Use Delta Live Tables to automate pipelines, enforce rules, and track lineage natively in Databricks.
- Use Great Expectations for in-depth validation, anomaly detection, and schema profiling.
I’m not a fan of taking both sides of an argument. Let’s look at our core responsibilities as data engineers from the ground up and follow the solutions where the requirements take us.
Data Asset
A data asset is a managed, valuable dataset. A valuable dataset is not just data—it is data with purpose, invested in through processes and controls, and justified by the business value provided. A managed dataset is actively governed, monitored, and maintained to ensure it delivers sustained value to stakeholders.
Data Asset Management
Fundamentally, a data asset is considered managed when it is under governance. Proper data governance, such as that provided by Unity Catalog, has at least these fundamental characteristics:
Characteristic | Key Questions |
---|---|
Ownership & Stewardship | Who is responsible for maintaining the data asset? Who can answer questions about it? |
Access Control | Who can read, write, or modify this data? Are permissions aligned with roles and rules? |
Lineage | Where does this data come from? What transformations has it gone through? |
Compliance & Privacy | Is sensitive data (e.g., PII, PHI) addressed? Are retention and masking policies enforced? |
Auditability | Can we trace who accessed or modified the data and when? |
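As a rough sketch, several of these characteristics map directly onto Unity Catalog commands and system tables. The catalog, table, and principal names below are placeholders, and the system tables referenced (system.access.audit, system.access.table_lineage) assume system tables are enabled in your workspace; exact field names can vary by release.

```python
# Hedged sketch: ownership, access control, tagging, and auditing for a
# Unity Catalog table. All object and principal names are placeholders.

# Ownership & Stewardship: assign an accountable owner (user or group).
spark.sql("ALTER TABLE main.sales.orders OWNER TO `data-platform-team`")

# Access Control: grant read access aligned to a role.
spark.sql("GRANT SELECT ON TABLE main.sales.orders TO `analysts`")

# Compliance & Privacy: tag sensitive columns so masking and retention
# policies can target them.
spark.sql("ALTER TABLE main.sales.orders ALTER COLUMN email SET TAGS ('pii' = 'true')")

# Auditability: inspect recent access events from the audit system table.
recent_access = spark.sql("""
    SELECT event_time, user_identity.email AS principal, action_name
    FROM system.access.audit
    WHERE request_params['full_name_arg'] = 'main.sales.orders'
    ORDER BY event_time DESC
    LIMIT 20
""")

# Lineage: upstream tables feeding the asset, captured automatically.
upstream = spark.sql("""
    SELECT source_table_full_name, target_table_full_name
    FROM system.access.table_lineage
    WHERE target_table_full_name = 'main.sales.orders'
""")
```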
Unity Catalog is the foundation for a well-managed lakehouse. I have written about migrating to Unity Catalog and highlighted some bonus features. If you have not migrated yet, I recommend getting started immediately and then focusing on data value.
Data Asset Valuation
The business will communicate the value of a data asset primarily through a Service Level Agreement (SLA). The SLA defines agreed-upon expectations around reliability, performance, and quality.
Reliability describes the resiliency, freshness, and correctness of a data asset.
- Freshness (Liveness) – How up-to-date the data is.
- Accuracy (Correctness) – How well the data aligns with expected values or business rules.
- Availability (Resiliency) – How robust the data pipeline is to failures and recovery.
Performance describes the efficiency of data from ingestion to processing to consumption.
- Latency – Time taken for data to travel from source to consumption (e.g., ingestion-to-dashboard delay).
- Throughput – Volume of data processed over time (e.g., rows/sec, MB/min).
- Responsiveness – How quickly queries and pipelines respond under load or concurrency.
Quality describes the degree to which data meets defined rules, expectations, and standards.
- Completeness – All required data is present (e.g., no missing rows or fields).
- Validity – Data conforms to defined formats, ranges, or types.
- Consistency – Data is uniform across systems and time (e.g., no contradictory values).
- Uniqueness – No unintended duplicates exist.
- Accuracy – Same definition as in reliability; it’s important enough to be listed twice!
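Most of the reliability and quality metrics above reduce to simple measurements. The sketch below assumes a placeholder Delta table with order_id, email, and updated_at columns and computes freshness, completeness, and uniqueness as plain numbers that can later be compared against SLA thresholds.

```python
from pyspark.sql import functions as F

# Placeholder dataset; swap in your own table (spark session assumed, as in Databricks).
df = spark.read.table("main.sales.orders")

# Freshness (liveness): minutes since the most recent record was updated.
freshness_min = df.agg(
    ((F.unix_timestamp(F.current_timestamp())
      - F.unix_timestamp(F.max("updated_at"))) / 60).alias("minutes_stale")
).first()["minutes_stale"]

# Completeness: fraction of rows with a non-null email.
completeness = df.agg(
    (F.count("email") / F.count(F.lit(1))).alias("pct_complete")
).first()["pct_complete"]

# Uniqueness: order_id values appearing more than once are unintended duplicates.
duplicate_keys = df.groupBy("order_id").count().filter("count > 1").count()

print(f"freshness={freshness_min:.1f} min, "
      f"completeness={completeness:.2%}, duplicate keys={duplicate_keys}")
```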
These business expectations are fulfilled by IT through People, Processes, and Technology.
For a price.
Servicing Data Assets
Service Level Objectives (SLOs) represent operational targets within each SLA domain and can be met progressively. This concept helps align cost to value within your budget. The dials being tuned here are the Software Development Lifecycle (SDLC) and the Databricks Medallion Architecture. SLAs define commitments for reliability, performance, and quality, and SLOs enforce those commitments throughout the data lifecycle. In the Medallion Architecture, each layer strengthens one or more of these domains; across the SDLC, IT teams progressively validate and enforce these guarantees to deliver production-grade data assets.
The workspace is the primary environment for working with data assets in Unity Catalog. Value typically increases across the environments from left to right (Dev → Test → Prod).
SLA Domain | Dev | Test | Prod |
---|---|---|---|
Reliability | Monitor source connectivity and pipeline triggers | Validate pipeline scheduling, retries | SLAs ensure on-time delivery for consumers |
Performance | Baseline performance benchmarks | Load testing, profiling | Optimize for SLAs: query latency, data delivery speed |
Quality | Create GX/DQX test suites | Enforce checks with alerts | Blocking rules and alerting on quality failures |
- In Dev, you prototype and measure against reliability and performance goals.
- In Test, you simulate production load and validate SLA thresholds.
- In Prod, you enforce SLAs and alert on violations with automated monitoring and remediation (GX, DQX, Airflow, Unity Catalog audits, etc.); a minimal gate is sketched below.
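As a hedged illustration of that progression, the same check can be advisory in Dev and blocking in Test and Prod. The metric, threshold, and environment flag below are placeholders; in practice a violation would route to your alerting stack rather than a log line or an exception.

```python
import logging

logger = logging.getLogger("sla_gates")

def enforce_freshness(minutes_stale: float, threshold_min: float, env: str) -> None:
    """Warn in Dev; fail the run in Test/Prod when the freshness SLO is missed."""
    if minutes_stale <= threshold_min:
        logger.info("Freshness OK: %.1f min (SLO %.1f min)", minutes_stale, threshold_min)
        return
    message = f"Freshness SLO missed: {minutes_stale:.1f} min > {threshold_min:.1f} min"
    if env == "dev":
        logger.warning(message)      # advisory only while prototyping
    else:
        raise RuntimeError(message)  # block the pipeline in Test/Prod

# Example: a Prod run comfortably inside a 60-minute freshness objective.
enforce_freshness(minutes_stale=42.0, threshold_min=60.0, env="prod")
```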
The catalog is the primary unit of data isolation in the Databricks data governance model. Value typically accrues as data moves through the layers, with Gold the most valuable.
SLA Domain | Bronze (Raw) | Silver (Cleaned) | Gold (Curated) |
---|---|---|---|
Reliability | Data lands on time; raw source integrity is monitored | DLT jobs run consistently; schema evolution is managed | Timely delivery of business-critical data |
Performance | Ingest processes optimized for load handling | Transformations are performant; no bottlenecks | Dashboards and queries load quickly |
Quality | Basic data profiling and source rule checks | DQ rules (e.g., null checks, constraints) enforced | Golden datasets meet business expectations for data quality |
- In Bronze, you focus on reliability and baseline quality.
- In Silver, you begin to emphasize quality and start optimizing performance.
- In Gold, you implement high reliability, optimized performance, and strong quality (see the DLT sketch after this list).
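Here is a minimal sketch of how expectations can tighten across those layers in a Delta Live Tables pipeline. The table names, source path, and rules are placeholders; the point is the progression from monitoring violations in Bronze (expect), to quarantining bad rows in Silver (expect_or_drop), to halting the update in Gold (expect_or_fail).

```python
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Bronze: land raw orders; only monitor quality.")
@dlt.expect("has_order_id", "order_id IS NOT NULL")           # record violations, keep rows
def orders_bronze():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/Volumes/main/raw/orders")                      # placeholder landing path
    )

@dlt.table(comment="Silver: cleaned orders; drop rows that break core rules.")
@dlt.expect_or_drop("valid_amount", "amount >= 0")             # quarantine bad rows
def orders_silver():
    return (
        dlt.read_stream("orders_bronze")
        .withColumn("order_date", F.to_date("order_ts"))
    )

@dlt.table(comment="Gold: business-critical aggregate; fail the update on violations.")
@dlt.expect_or_fail("no_null_dates", "order_date IS NOT NULL")  # hard stop in Gold
def orders_gold():
    return (
        dlt.read("orders_silver")
        .groupBy("order_date")
        .agg(F.sum("amount").alias("daily_revenue"))
    )
```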
Data becomes a true asset as it progresses through these layers, accruing value while incurring costs to meet increasing SLA expectations.
Delta Live Tables and Great Expectations
We are back to where we started, with a little more context. Great Expectations (GX) is focused on data validation and profiling, while Delta Live Tables (DLT) handles schema enforcement and transformations. While DLTs may not have GX's sophisticated rule and profiling capabilities, their native integration with Unity Catalog keeps their performance characteristics similar across both batch and streaming, whereas GX can struggle with streaming workloads from a performance perspective.
The exercise of defining the progression of value across the SDLC and Medallion Architecture now pays dividends. DLTs stand out for end-to-end data management, with automatic lineage tracking and schema evolution. Great Expectations can then be run as a separate process for more advanced data quality checks and profiling, either incorporated into a more advanced CI/CD process or managed manually.
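Below is a minimal sketch of that separate GX pass, run against a published table on a schedule or from CI. It uses the legacy SparkDFDataset wrapper for brevity; newer GX releases favor a context-and-validator workflow, so treat the exact API surface as an assumption tied to your installed version. Table and column names are placeholders.

```python
from great_expectations.dataset import SparkDFDataset  # legacy API; newer GX uses contexts/validators

# Validate a published table outside the DLT pipeline (placeholder name).
df = spark.read.table("main.sales.orders_silver")
suite = SparkDFDataset(df)

results = [
    suite.expect_column_values_to_not_be_null("order_id"),
    suite.expect_column_values_to_be_unique("order_id"),
    suite.expect_column_values_to_be_between("amount", min_value=0, max_value=1_000_000),
    suite.expect_column_values_to_match_regex("email", r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
]

failed = [r for r in results if not r.success]
if failed:
    # Surface failures to your alerting or CI process; here we simply raise.
    raise RuntimeError(f"{len(failed)} expectation(s) failed")
```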
The key is not to focus on a tool in isolation with the idea of picking a winner. I believe most developers could become cross-trained on both technologies, and neither should be outside the scope of a junior data engineer. People are not the problem. I wish DLTs were integrated with Great Expectations so I didn't need two technologies, but a little Process goes a long way toward resolving that Technology issue.
Conclusion
Integrating Delta Live Tables and Great Expectations within the Software Development Lifecycle (SDLC) and the Medallion Architecture helps teams reduce operational costs while continuously delivering business value.
- Early Validation Reduces Rework: Embedding GX expectations in development and staging environments enables early detection of schema and data issues, minimizing costly reprocessing and production downtime.
- DLTs Automate Operational Efficiency: With declarative pipelines and built-in monitoring, DLTs reduce manual orchestration and troubleshooting, saving engineering hours and compute costs.
- Incremental Value Delivery: By combining GX’s detailed validation in Bronze and Silver layers with DLT’s managed lineage and enforcement, teams can release high-quality data incrementally—delivering trusted datasets to stakeholders faster.
- FinOps-Aligned Observability: Monitoring volume, freshness, and anomalies with GX and DLT enables better cost attribution and prioritization, allowing data teams to optimize for quality and budget.
This hybrid approach supports robust data engineering practices and empowers organizations to scale with confidence, optimize their cloud spend, and maximize the return on data investments.
Contact us to learn more about how to empower your teams with the right tools, processes, and training to unlock Databricks’ full potential cost-consciously.