Skip to main content

Databricks

Databricks Lakebase – Database Branching in Action

Developers Guide To Custom Datasource For Dropdown Fields In Context-Aware Configuration Editor

What is Databricks Lakebase?

Databricks Lakebase is a Postgres OLTP engine, integrated into Databricks Data Intelligence Platform. A database instance is a compute type that provides fully managed storage and compute resources for a postgres database. Lakebase leverages an architecture that separates compute and storage, which allows independent scaling while supporting low latency (<10ms) and high concurrency transactions.

Databricks has integrated this powerful postgres engine along with sophisticated capabilities that are benefited by Databricks recent acquisition of Neon. Lakebase is fully managed by Databricks, which means no infrastructure has to be provisioned and maintained separately. In addition to traditional OLTP engine, Lakebase comes with below features,

  • Openness: Lakebase are built on open-source standards
  • Storage and compute separation: Lakebase stores data in data lakes in open format. It enables scaling storage and compute independently.
  • Serverless: Lakebase is lightweight, meaning it can scale instantly up and down based on the load. It can scale down to zero, at which the cost of the lakebase is just for the storage  of data only. No compute cost will be applied.
  • Modern development workflow: Branching a database is as simple as branching a code repository. It is done near instantly.
  • Built for AI Agents: Lakebases are designed to support a large number of AI agents. It’s branching and checkpointing capabilities enable AI agents to experiment and rewind to any point in time.
  • Lakehouse Integration: Lakebase make it easy to combine operational, analytical and AI systems without complex ETL pipelines.

In this article, we shall discuss in detail about how database branching feature works in Lakebase.

Database Branching

Database branching is one of the unique features introduced in Lakebase, that enables to branch out a database. It resembles the exact behavior of how code branch could be branched out from an existing branch.

Branching database is beneficial for an isolated test environment or point in time recovery. Lakebase uses copy-on-write branching mechanism to create an instant zero-copy clone of the database, with dedicated compute to operate on that branch. With zero-copy clone, it enables to create a branch of parent database of any size instantly.

The child branch is managed independently of the parent branch. With child isolated database branch, one can perform testing/debugging in the production copy of data. Though both parent and child databases appear separate, physically both instances would be pointing to same data pages. Under the hood, child database will be pointing to the actual data pages which parent is pointing to. When a change occurs in any of the data in child branch, then a new data page will be created with the new changes, and it will be available only to the branch. Any changes done in branch will not reflect in parent branch.

How branching works

The below diagrams represent how database branching works under the hood,

Database Branching

Database Branching Updates

Lakebase in action

Here is the demonstration of how Lakebase instance can be created, branch out an instance and how table changes behave,

To create Lakebase instance, login Databricks and navigate to Compute -> OLTP Database tab -> Click “Create New Instance” button,

Create New Instance 01

Create New Instance Success 02

Click “New Query” to launch SQL Editor for PostgreSQL Database. In current instance, let’s create a new table and add some records.

Instance1 Create Table 03

Instance1 Query Table 04

Let’s create a database branch “pginstance2” from instance “pginstance1”. Goto Compute –> OLTP Database –> Create Database instance

Enter new instance name and expand “Advanced Settings” -> Enable “Create from parent” option -> Enter the source instance name “pginstance1”.

Under “Include data from parent up to”, select “Current point in time” option. Here, we can choose any specific point in time instance too.

Create Instance2 05

Instance2 Create Success 06

Launch SQL Editor from pginstance2 database instance and query tbl_user_profile table

Instance2 Query Table 07

Now, let’s insert new record and update an existing record in the tbl_user_profile table in pginstance2,

Instance2 Update Table 08

Now, let’s switch back to parent database instance pginsntance1 and query tbl_user_profile table. The table in pginsntance1 should still be only 3 records. All the changes done in tbl_user_profile table should be available only in pginstance2.

Instance1 Query Table 09

Conclusion

Database changes that are done in one branch will not impact/reflect in another branch, thereby provide clear isolation of database at scale. Currently Lakebase do not have a feature to merge database branch. However, Databricks is committed and working towards database merge capability in near future.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Saravanan Ponnaiah

Saravanan Ponnaiah is a solutions architect working on cloud and data projects. He has 20 years of experience with deep expertise in cloud and data platforms, working in banking and financial, insurance, healthcare, and digital forensics domains. Has architected enterprise data solutions with expertise in Databricks and implemented scalable GenAI solutions.

More from this Author

Follow Us