Databricks Lakebase – Database Branching in Action

What is Databricks Lakebase?

Databricks Lakebase is a Postgres OLTP engine integrated into the Databricks Data Intelligence Platform. A database instance is a compute type that provides fully managed storage and compute resources for a Postgres database. Lakebase uses an architecture that separates compute and storage, allowing each to scale independently while supporting low-latency (<10 ms) and high-concurrency transactions.

Databricks has paired this Postgres engine with sophisticated capabilities gained through its recent acquisition of Neon. Lakebase is fully managed by Databricks, which means no infrastructure has to be provisioned or maintained separately. In addition to being a traditional OLTP engine, Lakebase offers the following features:

  • Openness: Lakebase is built on open-source standards.
  • Storage and compute separation: Lakebase stores data in data lakes in an open format, which enables storage and compute to scale independently.
  • Serverless: Lakebase is lightweight and scales up and down instantly based on load. It can scale down to zero, at which point you pay only for data storage; no compute cost applies.
  • Modern development workflow: Branching a database is as simple as branching a code repository, and it happens near instantly.
  • Built for AI agents: Lakebase is designed to support a large number of AI agents. Its branching and checkpointing capabilities let agents experiment and rewind to any point in time.
  • Lakehouse integration: Lakebase makes it easy to combine operational, analytical, and AI systems without complex ETL pipelines.

In this article, we look in detail at how the database branching feature works in Lakebase.

Database Branching

Database branching is one of the unique features introduced in Lakebase: it lets you branch a database, mirroring exactly how a code branch is created from an existing branch in a repository.

Branching a database is useful for creating an isolated test environment or for point-in-time recovery. Lakebase uses a copy-on-write mechanism to produce an instant, zero-copy clone of the database, with dedicated compute to operate on that branch. Because the clone is zero-copy, a branch of a parent database of any size can be created instantly.

The child branch is managed independently of the parent branch. With an isolated child branch, you can test and debug against a copy of production data. Although parent and child appear to be separate databases, both instances physically point to the same data pages; under the hood, the child references the same pages as the parent. When data changes in the child branch, a new data page is created with those changes and is visible only to that branch. Changes made in the child branch never appear in the parent branch.

How branching works

The diagrams below illustrate how database branching works under the hood.

Database Branching

Database Branching Updates

Lakebase in action

Here is a demonstration of how to create a Lakebase instance, branch out a new instance from it, and see how table changes behave.

To create a Lakebase instance, log in to Databricks and navigate to Compute -> OLTP Database tab -> click the “Create New Instance” button.

Create New Instance 01

Create New Instance Success 02

Click “New Query” to launch the SQL Editor for the PostgreSQL database. In the current instance, let’s create a new table and add some records; a sketch of the corresponding SQL follows the screenshots below.

Instance1 Create Table 03

Instance1 Query Table 04
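The screenshots above correspond to SQL along the following lines. This is a minimal sketch: only the table name tbl_user_profile comes from the walkthrough, while the column names and sample values are assumptions for illustration.

  -- Hypothetical schema: column names and sample rows are illustrative only;
  -- the table name tbl_user_profile is the one used in this walkthrough.
  CREATE TABLE tbl_user_profile (
      user_id    SERIAL PRIMARY KEY,
      user_name  TEXT NOT NULL,
      email      TEXT,
      created_at TIMESTAMPTZ DEFAULT now()
  );

  -- Seed the parent instance (pginstance1) with three records.
  INSERT INTO tbl_user_profile (user_name, email) VALUES
      ('alice', 'alice@example.com'),
      ('bob',   'bob@example.com'),
      ('carol', 'carol@example.com');

  -- Verify: returns the 3 rows just inserted.
  SELECT * FROM tbl_user_profile;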

Let’s create a database branch “pginstance2” from the instance “pginstance1”. Go to Compute -> OLTP Database -> Create Database instance.

Enter the new instance name and expand “Advanced Settings” -> enable the “Create from parent” option -> enter the source instance name “pginstance1”.

Under “Include data from parent up to”, select the “Current point in time” option. You can also choose any specific point in time here.

Create Instance2 05

Instance2 Create Success 06

Launch the SQL Editor from the pginstance2 database instance and query the tbl_user_profile table.

Instance2 Query Table 07

Now, let’s insert a new record and update an existing record in the tbl_user_profile table in pginstance2; a sketch of the equivalent SQL follows the screenshot below.

Instance2 Update Table 08
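In SQL terms, the change on the branch looks roughly like the sketch below; as before, the column names and values are illustrative assumptions, and the statements are run only against pginstance2.

  -- Run against the child branch (pginstance2) only; columns follow the
  -- hypothetical schema sketched earlier.
  INSERT INTO tbl_user_profile (user_name, email)
  VALUES ('dave', 'dave@example.com');

  UPDATE tbl_user_profile
  SET email = 'alice.new@example.com'
  WHERE user_name = 'alice';

  -- Verify on the branch: 4 rows, one of them updated.
  SELECT * FROM tbl_user_profile;

Because of the copy-on-write design described earlier, only the data pages touched by these statements are duplicated for the branch; everything else remains shared with the parent.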

Now, let’s switch back to the parent database instance pginstance1 and query the tbl_user_profile table. The table in pginstance1 should still contain only 3 records; all the changes made to tbl_user_profile are visible only in pginstance2.

Instance1 Query Table 09

Conclusion

Database changes made in one branch do not impact or appear in another branch, providing clear database isolation at scale. Lakebase does not currently offer a way to merge database branches; however, Databricks has stated it is working toward a database merge capability in the near future.

7 Steps to Define a Data Governance Structure for a Mid-Sized Bank (Without Losing Your Mind)

A mid-sized bank I was consulting with for their data warehouse modernization project finally realized that data isn’t just some necessary but boring stuff the IT department hoards in their digital cave. It’s the new gold, the ticking time bomb of risk, and the bane of every regulatory report that’s ever come back with more red flags than a beach during a shark sighting.

Welcome to the wild world of data governance, where dreams of order collide with the chaos of reality. Before you start mainlining espresso and squeezing that stress ball shaped suspiciously like your last audit report, let’s break this down into 7 steps that might just keep you sane.

  1. Wrangle Some Executive Buy-In

Let’s not pretend. Without exec sponsorship, your data governance initiative is just a Trello board with high hopes. You need someone in a suit (preferably with a C in their title) to not just bless your mission but genuinely believe in it, and preferably get it added to their KPIs this year.

Pro tip to get that signature: Skip the jargon about “metadata catalogs” and go straight for the jugular with words like “penalties” and “reputational risk.” Nothing gets an exec’s attention quite like the threat of their club memberships being revoked.

  2. Tame the Scope Before It Turns Into a Stampede

Organizations have a knack for letting projects balloon faster than a tech startup’s valuation. Be ruthless. You don’t need to govern every scrap of data from the CEO’s coffee order to the janitor’s mop schedule.

Focus on the critical stuff:

  • Customer data (because knowing who owes you money is kind of important)
  • Transaction history (aka “where did all the money go?”)
  • Regulatory reporting (because nobody likes surprise visits from auditors)

Start small, prove it works, then expand. Rome wasn’t built in a day, and neither was a decent data governance structure.

  3. Pick a Framework (But Don’t Treat It Like Holy Scripture)

Sure, you could go full nerd and dive into DAMA-DMBOK, but unless you’re gunning for a PhD in bureaucracy, keep it simple. Aim for a model that’s more “I get it” and less “I need an interpreter”.

Focus on:

  • Who’s responsible for what (RACI, if you must use an acronym)
  • What data belongs where
  • Rules that sound smart but won’t make everyone quit in protest

Remember, frameworks are like diets – the best one is the one you’ll actually stick to.

  4. Recruit Your Data Stewards (and Convince Them It’s Not a Punishment)

Your data stewards are the poor souls standing between order and chaos, armed with nothing but spreadsheets and a dwindling supply of patience. Look for folks who:

  • Actually understand the data (a rare breed, cherish them)
  • Can handle details without going cross-eyed
  • Won’t melt down when stuck between the rock of compliance and the hard place of IT

Bonus: Give them a fancy title like “Data Integrity Czar.” It won’t pay more, but it might make them feel better about their life choices.

  5. Define Your Terms (Or Prepare for the “What Even Is a ‘Customer’?” Wars)

Get ready for some fun conversations about what words mean. You’d think “customer” would be straightforward, but you’d be wrong. So very, very wrong.

  • Establish a single source of truth
  • Create a glossary that doesn’t read like a legal document
  • Accept that these definitions will change more often than a teenager’s social media profile

It’s not perfect, but it’s governance, not a philosophical treatise on the nature of reality.

  6. Build Your Tech Stack (But Don’t Start with the Shiny Toys)

For the love of all that is holy and GDPR-compliant, don’t buy a fancy governance tool before you know what you’re doing. Your tech should support your process, not be a $250,000 band-aid for a broken system.

Figure out:

  • Who gets to see what (and who definitely shouldn’t)
  • How you’re classifying data (beyond “important” and “meh”)
  • Where your golden records live
  • What to do when it all inevitably goes sideways

Metadata management and data lineage tracking are great, but they’re the icing, not the cake.

  7. Make It Boring (In a Good Way)

The true test of your governance structure isn’t the PowerPoint that put the board to sleep. It’s whether it holds up when someone decides to get creative with data entry at 4:59 PM on Fridays.

So:

  • Schedule regular data quality check-ups
  • Treat data issues like actual problems, not minor inconveniences
  • Set up alerts (but not so many that everyone ignores them)
  • Reward the good, don’t just punish the bad

Bonus: Document Everything (Then Document Your Documentation)

If it’s not written down, it doesn’t exist. If it’s written down but buried in a SharePoint site that time forgot, it still doesn’t exist.

Think of governance like flossing – it’s not exciting, but it beats the alternative.

Several mid-sized banks have successfully implemented data governance structures, demonstrating the real-world benefits of these strategies. Here are a few notable examples:

Case Study of a Large American Bank

This bank’s approach to data governance offers valuable lessons for mid-sized banks. The bank implemented robust data governance practices to enhance data quality, security, and compliance. Their focus on:

  • Aligning data management with regulatory requirements
  • Ensuring accurate financial reporting
  • Improving decision-making processes

resulted in better risk management, increased regulatory compliance, and enhanced customer trust through secure and reliable financial services.

Regional Bank Case Study

A regional bank successfully tackled data quality issues impacting compliance, credit, and liquidity risk assessment. Their approach included:

  1. Establishing roles and responsibilities for data governance
  2. Creating domains with assigned data custodians and stewards
  3. Collecting and simplifying knowledge about critical data elements (CDEs)

For example, in liquidity risk assessment, they identified core CDEs such as liquidity coverage ratio and net stable funding ratio.

Mid-Sized Bank Acquisition

In another case, a major bank acquired a regional financial services company and faced the challenge of integrating disparate data systems. Their data governance implementation involved:

  • Launching a data consolidation initiative
  • Centralizing data from multiple systems into a unified data warehouse
  • Establishing a cross-functional data governance team
  • Defining clear data definitions, ownership rules, and access permissions

This approach eliminated data silos, created a single source of truth, and significantly improved data quality and reliability. It also facilitated more accurate reporting and analysis, leading to more effective risk management and smoother banking services for customers.

Parting Thought

In the end, defining a data governance structure for your bank isn’t about creating a bureaucratic nightmare. It’s about keeping your data in check, your regulators off your back, and your systems speaking the same language.

When it all comes together, and your data actually starts making sense, you’ll feel like a criminal mastermind watching their perfect plan unfold. Only, you know, legal and with fewer car chases.

Now go forth and govern. May your data be clean, your audits be boring, and your governance meetings be mercifully short.

Data Governance in Banking and Financial Services – Importance, Tools and the Future

Let’s talk about data governance in banking and financial services, an area I have loved working in across many of its facets … where data isn’t just data, numbers aren’t just numbers … They’re sacred artifacts that need to be protected, documented, and, of course, regulated within an inch of their lives. It’s not exactly the most glamorous part of financial services, but without solid data governance, banks would be floating in a sea of disorganized, chaotic, and potentially disastrous data mismanagement. And when we’re talking about billions of dollars in transactions, we’re not playing around.

As Bob Seiner, a renowned data governance expert, puts it, “Data governance is like oxygen. You don’t notice it until it’s missing, and by then, it’s probably too late.” If that doesn’t send a chill down your spine, nothing will.

Why is Data Governance Such a Big Deal?

In the banking sector, data governance is more than just a compliance checkbox. It’s essential for survival. Banks process an astronomical amount of sensitive information daily—think trillions of transactions annually—and they need to manage that data efficiently and securely. According to the World Bank, the global financial industry processes over $5 trillion in transactions every day. That’s not the kind of volume you want slipping through the cracks.

Even a small data breach can cost banks upwards of $4.35 million on average, according to a 2022 IBM report. No one wants to be the bank that has to call its shareholders after that kind of financial disaster.

Data governance helps mitigate these risks by ensuring data is accurate, consistent, and compliant with regulations like GDPR, CCPA, and Basel III. These rules are about as fun as reading tax code, but they’re crucial in ensuring customer data is protected, privacy is maintained, and banks don’t end up with regulators breathing down their necks.

Tools of the Data Governance Trade

Let’s talk about the cavalry—the tools that keep all this data governance stuff from turning into a full-blown nightmare. Thankfully, in 2024, we’re spoiled with a variety of platforms designed specifically to handle this madness.

  • Collibra and Informatica

    • Collibra and Informatica are heavyweights in the data governance world, offering comprehensive suites for data cataloging, stewardship, and governance. Financial services companies like AXA and ABN AMRO rely on these tools to handle everything from compliance workflows to data lineage mapping.
  • Alation and Talend

    • Alation is known for its AI-powered data cataloging and governance capabilities, while Talend excels in data integration and governance. Companies like American Express have adopted Alation’s tools to streamline their data governance operations.

The Future of Data Governance in Banking

Looking forward, the financial sector’s reliance on robust data governance is only going to increase. With the rise of AI, machine learning, and real-time data analytics, banks will need to be even more diligent in how they manage and govern their data. A recent study from IDC suggests that by 2026, 70% of financial institutions will have formalized data governance frameworks in place. That’s up from around 50% today, meaning that the laggards are starting to realize that flying by the seat of their pants just won’t cut it anymore.

Jamie Dimon, CEO of JPMorgan Chase, emphasized the importance of data governance in a recent shareholder letter, stating, “Data is the lifeblood of our organization. Our ability to harness, protect, and leverage it effectively will determine our success in the coming decades.”

Climate risk models are the newest elephant in the room. As banks face pressure to account for environmental factors in their risk assessments, data governance plays a critical role in ensuring the accuracy and transparency of these models. According to S&P Global, nearly 60% of global banks will be embedding climate risk into their core business models by 2025.

In a world where data is king and compliance is the watchful queen, banks are stuck playing by the rules whether they like it or not. Data governance tools are not just for keeping regulators happy; they also give financial institutions the confidence to innovate, knowing that they’ve got their data house in order.

A recent survey by Deloitte found that 67% of banking executives believe that improving data governance is critical to their digital transformation efforts. This statistic underscores the growing recognition that effective data governance is not just about compliance, but also about enabling innovation and competitive advantage.

So, yeah… data governance might not be the flashiest part of banking, but it’s the foundation that holds everything together. And if there’s one thing we can agree on, it’s that nobody wants to be the bank that ends up on the evening news because they forgot to lock the vault—whether it’s the physical one or the digital one.
