Exploring Lakehouse//RT and Reyden: Can Databricks Handle FHIR Data at Scale?

by Nick Passero, Director AI Data & Analytics, Databricks Practice Lead and Balu Muthiah, Sr. Solutions Architect, AI Data Platforms

At DAIS 2026, Databricks announced Lakehouse//RT, powered by Reyden, a ground-up engine rewrite and not an update to Photon. It runs against existing Delta and Iceberg tables without restructuring, requires Unity Catalog, and keeps data in open formats.

Databricks frames the value proposition around three costs of maintaining a separate serving layer: data duplication into proprietary storage, governance policies that don’t travel with the data out of Unity Catalog, and the ongoing engineering burden of owning a second pipeline.

Official performance claims include up to 16x better performance versus real-time serving layers, with response times as low as 10 milliseconds on smaller datasets and sub-100 milliseconds on larger ones.

How We’re Using It

A healthcare client came to us with a straightforward problem: their FHIR data was already in Databricks, and they didn’t want another system to manage it. Standing up a dedicated FHIR server (HAPI, Azure FHIR API, Smile CDR) means another deployment, another copy of PHI, another governance boundary, another bill, and another platform someone has to debug. The question was whether the lakehouse they already had could do the job.

The system follows a three-layer pattern. Bronze handles raw ingestion and is never modified; it’s the source of truth, and its immutability is what makes compliance and audit defensible. Silver holds cleaned, validated data optimized for queries. The API layer, built on FastAPI, serves FHIR REST endpoints to consuming systems. Keeping these concerns separated means ingestion, processing, and serving don’t interfere with each other, and each layer scales independently. In a healthcare setting, that clean boundary between raw and transformed data is useful well beyond performance.
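
To make the Bronze-to-Silver step concrete, here is a minimal sketch of projecting one nested ExplanationOfBenefit resource onto flat columns. It is an illustration under assumptions, not the project's actual transformation: the function names, the output column names, and the choice of which totals to extract are all hypothetical. The field paths follow the FHIR R4 ExplanationOfBenefit structure.

```python
from typing import Any, Optional


def _total_for(eob: dict[str, Any], category_code: str) -> Optional[float]:
    """Return the amount of the first total whose category carries the given code."""
    for total in eob.get("total", []):
        codings = total.get("category", {}).get("coding", [])
        if any(c.get("code") == category_code for c in codings):
            return total.get("amount", {}).get("value")
    return None


def flatten_eob(eob: dict[str, Any]) -> dict[str, Any]:
    """Project a nested EOB resource onto flat columns (hypothetical Silver schema)."""
    if eob.get("resourceType") != "ExplanationOfBenefit":
        raise ValueError("expected an ExplanationOfBenefit resource")
    # "Patient/p-001" -> "p-001"; the bare id is the natural lookup key.
    patient_ref = eob.get("patient", {}).get("reference", "")
    return {
        "eob_id": eob.get("id"),
        "patient_id": patient_ref.split("/")[-1] or None,
        "status": eob.get("status"),
        "created": eob.get("created"),
        "submitted_amount": _total_for(eob, "submitted"),
        "benefit_amount": _total_for(eob, "benefit"),
    }


# A hand-written sample resource standing in for a raw Bronze record.
raw = {
    "resourceType": "ExplanationOfBenefit",
    "id": "eob-1",
    "status": "active",
    "created": "2025-01-10",
    "patient": {"reference": "Patient/p-001"},
    "total": [
        {"category": {"coding": [{"code": "submitted"}]},
         "amount": {"value": 250.0, "currency": "USD"}},
        {"category": {"coding": [{"code": "benefit"}]},
         "amount": {"value": 180.0, "currency": "USD"}},
    ],
}
row = flatten_eob(raw)
print(row)
```

The raw record is read and never mutated, which mirrors the rule that Bronze stays untouched while Silver carries the query-friendly shape.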

DBIgnite feeds Bronze directly, pulling from EHRs and APIs or consuming pre-staged landing zones depending on the source system.

The request flow: a FHIR client calls an endpoint → FastAPI validates and translates the FHIR parameters → the service layer generates optimized SQL → Databricks queries Bronze or Silver → results are mapped to the FHIR schema → a JSON response is returned. Each step is independently observable and testable.
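
The flow above can be sketched end to end in a few dozen lines. This is a simplified illustration, not the production service: FastAPI is left out so the example runs on the standard library alone, an in-memory SQLite database stands in for Databricks, and the table name `silver_eob`, its columns, the supported parameter set, and the page-size cap are all assumptions made for the example.

```python
import json
import sqlite3
from typing import Any

MAX_COUNT = 100  # hypothetical cap on page size


def translate_params(params: dict[str, str]) -> tuple[str, dict[str, Any]]:
    """Validate FHIR search parameters and build a parameterized SQL statement."""
    allowed = {"patient", "status", "_count"}
    unknown = set(params) - allowed
    if unknown:
        raise ValueError(f"unsupported search parameters: {sorted(unknown)}")
    if "patient" not in params:
        raise ValueError("the 'patient' parameter is required")
    count = int(params.get("_count", "50"))
    if not 1 <= count <= MAX_COUNT:
        raise ValueError(f"_count must be between 1 and {MAX_COUNT}")

    # Values are always bound, never interpolated into the SQL text.
    clauses = ["patient_id = :patient_id"]
    binds: dict[str, Any] = {
        "patient_id": params["patient"].split("/")[-1],  # accept "Patient/x" or "x"
        "limit": count,
    }
    if "status" in params:
        clauses.append("status = :status")
        binds["status"] = params["status"]
    sql = (
        "SELECT eob_id, patient_id, status, created FROM silver_eob "
        f"WHERE {' AND '.join(clauses)} ORDER BY created DESC LIMIT :limit"
    )
    return sql, binds


def to_bundle(rows: list[sqlite3.Row]) -> dict[str, Any]:
    """Map flat result rows back onto a FHIR searchset Bundle."""
    entries = [
        {"resource": {
            "resourceType": "ExplanationOfBenefit",
            "id": r["eob_id"],
            "status": r["status"],
            "created": r["created"],
            "patient": {"reference": f"Patient/{r['patient_id']}"},
        }}
        for r in rows
    ]
    return {"resourceType": "Bundle", "type": "searchset", "entry": entries}


# SQLite stands in for the warehouse so the sketch is runnable as-is.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute(
    "CREATE TABLE silver_eob (eob_id TEXT, patient_id TEXT, status TEXT, created TEXT)"
)
conn.executemany(
    "INSERT INTO silver_eob VALUES (?, ?, ?, ?)",
    [
        ("eob-1", "p-001", "active", "2025-01-10"),
        ("eob-2", "p-001", "cancelled", "2025-02-03"),
        ("eob-3", "p-002", "active", "2025-02-11"),
    ],
)

sql, binds = translate_params({"patient": "Patient/p-001", "status": "active"})
bundle = to_bundle(conn.execute(sql, binds).fetchall())
print(json.dumps(bundle, indent=2))
```

Keeping translation and mapping as plain functions is what makes each step testable on its own: both can be exercised without a running API or a warehouse connection.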

What We Built

We explored storing raw FHIR resources in JSON, NDJSON, and Variant format directly in Delta tables. Unity Catalog handled governance. The problem was query performance: deeply nested FHIR JSON in a lakehouse doesn’t behave like a transactional FHIR server, and the gap on patient-specific lookups was real.

To put numbers to it, we built a claims query system for Explanation of Benefit (EOB) records: what a patient’s plan covered, what was billed, and how the claim was adjudicated. It runs as a Databricks App and ingests via DBIgnite into Delta tables under Unity Catalog. Baseline performance using Databricks SQL (DBSQL) was sub-300 milliseconds for search queries and sub-150 milliseconds for direct record retrieval. Workable, but not fast enough to feel invisible.

What We Observed

The test workload was patient-specific EOB lookups across a few billion records, with each query returning 50–100 records scattered across all data files and no clean partition shortcut.

On individual EOB lookups, DBSQL averaged approximately 550 milliseconds. Reyden on a small cluster averaged 86 milliseconds, roughly 84% lower latency. All seven individual EOB queries improved, and Reyden’s ceiling (111 milliseconds) didn’t reach DBSQL’s floor (488 milliseconds). Scaling to a large cluster (4x the compute) produced no meaningful gain: the large cluster averaged 93 milliseconds against 86 milliseconds on the small one. One collated EOB query pattern (multi-cluster, patient_id #2) was slower on Reyden across all cluster sizes: 0.586 seconds on DBSQL versus 0.660 to 0.995 seconds on Reyden. Query shape matters, and there are areas worth investigating for further optimization.

DBSQL vs. Reyden Execution Times

Executed Query	DBSQL (Baseline)	Reyden (Small)	Reyden (Medium)	Reyden (Large)
Collated EOB · multi-cluster · demographics #1	0.509s	0.146s	0.154s	0.138s
Collated EOB · multi-cluster · demographics #2	0.566s	0.144s	0.127s	0.136s
Collated EOB · multi-cluster · patient_id #1	0.543s	0.248s	0.213s	0.215s
Collated EOB · multi-cluster · patient_id #2	0.586s	0.995s	0.660s	0.798s
Collated EOB · patient-id-cluster #1	0.563s	0.164s	0.651s	0.133s
Collated EOB · patient-id-cluster #2	0.522s	0.127s	0.123s	0.122s
Collated EOB · patient-id-cluster #3	0.564s	0.118s	0.130s	0.145s
Indv EOB · multi-cluster · demographics #1	0.553s	0.077s	0.075s	0.081s
Indv EOB · multi-cluster · demographics #2	0.542s	0.111s	0.112s	0.109s
Indv EOB · multi-cluster · patient_id #1	0.547s	0.077s	0.078s	0.073s
Indv EOB · multi-cluster · patient_id #2	0.584s	0.103s	0.096s	0.110s
Indv EOB · patient-id-cluster #1	0.539s	0.075s	0.073s	0.075s
Indv EOB · patient-id-cluster #2	0.607s	0.094s	0.097s	0.111s
Indv EOB · patient-id-cluster #3	0.488s	0.067s	0.062s	0.093s
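
The summary figures quoted for individual EOB lookups can be reproduced from the seven Indv EOB rows of the table. The short script below is only a check of that arithmetic; the variable names are ours, and the timings are copied from the DBSQL, Reyden (Small), and Reyden (Large) columns.

```python
from statistics import mean

# Seconds per query as (DBSQL, Reyden small, Reyden large), from the Indv EOB rows.
indv_eob = {
    "multi-cluster demographics #1": (0.553, 0.077, 0.081),
    "multi-cluster demographics #2": (0.542, 0.111, 0.109),
    "multi-cluster patient_id #1": (0.547, 0.077, 0.073),
    "multi-cluster patient_id #2": (0.584, 0.103, 0.110),
    "patient-id-cluster #1": (0.539, 0.075, 0.075),
    "patient-id-cluster #2": (0.607, 0.094, 0.111),
    "patient-id-cluster #3": (0.488, 0.067, 0.093),
}

dbsql = [t[0] for t in indv_eob.values()]
small = [t[1] for t in indv_eob.values()]
large = [t[2] for t in indv_eob.values()]

dbsql_avg, small_avg, large_avg = mean(dbsql), mean(small), mean(large)
reduction = 1 - small_avg / dbsql_avg

print(f"DBSQL average:          {dbsql_avg * 1000:.0f} ms")
print(f"Reyden (Small) average: {small_avg * 1000:.0f} ms")
print(f"Reyden (Large) average: {large_avg * 1000:.0f} ms")
print(f"Latency reduction:      {reduction:.0%}")
# Slowest Reyden (Small) query versus fastest DBSQL query.
print(f"Reyden ceiling {max(small) * 1000:.0f} ms < DBSQL floor {min(dbsql) * 1000:.0f} ms")
```

This prints averages of 551 ms, 86 ms, and 93 ms, a reduction of 84%, and a Reyden ceiling of 111 ms against a DBSQL floor of 488 ms.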

Where This Leaves Us

The case for consolidating FHIR services natively in Databricks is now a serious engineering conversation, not a wish list item.

Reyden delivered patient-specific lookups averaging under 100 milliseconds across billions of records on a small cluster, without a separate serving layer, without a second copy of PHI, and without a second governance boundary to maintain. For healthcare teams already running on Databricks with Unity Catalog deployed, the path to retiring a third-party FHIR server or real-time serving layer is shorter than it has ever been. As Reyden moves toward general availability, we see a real opportunity to consolidate the full FHIR stack (ingestion, governance, and low-latency serving) inside a single platform. We will be building on it.
