David Callaghan, Senior Solutions Architect

Databricks Champion | Center of Excellence Lead | Data Privacy & Governance Expert | Speaker & Trainer | 30+ Yrs in Enterprise Data Architecture

Blogs from this Author

Test Driven Development with Databricks

I don’t like testing Databricks notebooks and that’s a problem. I like Databricks. I like Test Driven Development. Not in an evangelical, 100%-code-coverage-or-fail kind of way. I just find that a reasonable amount of code coverage gives me a reasonable amount of confidence. Databricks has documentation for unit testing. I tried […]
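
The excerpt above is cut short, but for a flavor of what a test-driven Databricks workflow can look like, here is a minimal, hypothetical pytest sketch. The transformation `add_total_column`, its columns, and the local Spark fixture are illustrative assumptions, not taken from the post.

```python
# Sketch only: a small pytest unit test for a PySpark transformation.
# The function and schema are hypothetical.
import pytest
from pyspark.sql import SparkSession, functions as F


def add_total_column(df):
    # Hypothetical transformation under test: derive a total from two columns.
    return df.withColumn("total", F.col("price") * F.col("quantity"))


@pytest.fixture(scope="session")
def spark():
    # Small local session so the test can also run outside a Databricks cluster.
    return SparkSession.builder.master("local[1]").appName("tdd-demo").getOrCreate()


def test_add_total_column(spark):
    df = spark.createDataFrame([(10.0, 3), (2.5, 4)], ["price", "quantity"])
    totals = {row["total"] for row in add_total_column(df).collect()}
    assert totals == {30.0, 10.0}
```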

LinkedIn open sources a control plane for lake houses

LinkedIn open sources a lot of code. Kafka, of course, but also Samza and Voldemort and a bunch of Hadoop tools like DataFu and Gobblin. Open-source projects tend to be created by developers to solve engineering problems while commercial products … Anyway, LinkedIn has a new open-source data offering called OpenHouse, which is billed as […]

Databricks Lakehouse Federation Public Preview

Sometimes, it’s nice to be able to skip a step. Most data projects involve data movement before data access. Usually this is not an issue; everyone agrees that the data must be moved before it can be made available. There are use cases where the data movement part is a blocker because of time, cost, […]
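
The excerpt is truncated, but the step being skipped is the ingestion pipeline itself: with Lakehouse Federation, an external source is mounted as a foreign catalog and queried in place. The sketch below assumes an admin has already created a Unity Catalog connection; the connection, catalog, and table names are hypothetical.

```python
# Sketch only: expose an external Postgres database as a foreign catalog
# (assumes a connection named `postgres_finance` already exists), then
# query the remote table directly -- no copy pipeline required.
spark.sql("""
    CREATE FOREIGN CATALOG IF NOT EXISTS finance_pg
    USING CONNECTION postgres_finance
    OPTIONS (database 'finance')
""")

spark.table("finance_pg.public.invoices").limit(10).show()
```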

Data Lake Governance with Tagging in Databricks Unity Catalog

The goal of Databricks Unity Catalog is to provide centralized security and management for data and AI assets across the data lakehouse. Unity Catalog provides fine-grained access control for all the securable objects in the lakehouse: databases, tables, files and even models. Gone are the limitations of the Hive metastore. The Unity Catalog metastore […]
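
As a small illustration of the tagging and access-control model described above, the sketch below tags a column, tags the table, and grants read access to a group. The catalog, schema, table, and group names are made up.

```python
# Sketch only: Unity Catalog governance primitives applied from SQL.
# Object names (main.sales.customers, `data-analysts`) are hypothetical.

# Tag a sensitive column so it can be discovered and audited as PII.
spark.sql("""
    ALTER TABLE main.sales.customers
    ALTER COLUMN email SET TAGS ('pii' = 'true')
""")

# Tags can also be applied at the table (or schema/catalog) level.
spark.sql("ALTER TABLE main.sales.customers SET TAGS ('domain' = 'sales')")

# Standard SQL grants work against Unity Catalog securables.
spark.sql("GRANT SELECT ON TABLE main.sales.customers TO `data-analysts`")
```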

Feature Engineering with Databricks and Unity Catalog

Feature Engineering is the preprocessing step used to make raw data usable as input to an ML model through transformation, aggregation, enrichment, joining, normalization and other processes. Sometimes feature engineering is used against the output of another model rather than the raw data (transfer learning). At a high level, feature engineering has a lot in […]
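
For a flavor of that preprocessing, here is a minimal PySpark sketch that combines aggregation, joining, and a simple normalization; the tables, columns, and scaling factor are hypothetical, and in a Unity Catalog workflow the result would typically be registered as a feature table.

```python
# Sketch only: turn raw orders and customers into model-ready features.
from pyspark.sql import functions as F

orders = spark.table("main.sales.orders")        # hypothetical raw tables
customers = spark.table("main.sales.customers")

# Aggregation: per-customer order statistics.
order_features = (
    orders.groupBy("customer_id")
          .agg(F.count("*").alias("order_count"),
               F.avg("order_total").alias("avg_order_total"))
)

# Joining, enrichment, and a naive normalization step.
features = (
    customers.join(order_features, "customer_id", "left")
             .fillna(0, subset=["order_count", "avg_order_total"])
             .withColumn("avg_order_total_scaled",
                         F.col("avg_order_total") / F.lit(1000.0))
)
```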

Simulating Synchronous Operations with Asynchronous Code in Distributed Systems

Ensuring real-time status updates for end users in web applications can be challenging, particularly when working with Databricks, which lacks native support for synchronous updates. This means that changes made in Databricks may not be immediately reflected to end users, impacting the real-time nature of status updates. In this technical blog post, we will explore […]
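
The excerpt is cut off, but the general pattern for making an asynchronous Databricks operation look synchronous to a caller is submit-then-poll. The sketch below uses the Jobs 2.1 REST API; the workspace URL, token handling, polling interval, and job ID are placeholders, and the post itself may use a different mechanism.

```python
# Sketch only: trigger a Databricks job run and block until it reaches a
# terminal state, so the caller sees a synchronous-looking call.
import time
import requests

HOST = "https://<workspace-url>"        # placeholder workspace URL
TOKEN = "<personal-access-token>"       # placeholder PAT
HEADERS = {"Authorization": f"Bearer {TOKEN}"}


def run_job_and_wait(job_id: int, poll_seconds: int = 10) -> str:
    run = requests.post(f"{HOST}/api/2.1/jobs/run-now",
                        headers=HEADERS, json={"job_id": job_id}).json()
    run_id = run["run_id"]

    while True:
        state = requests.get(f"{HOST}/api/2.1/jobs/runs/get",
                             headers=HEADERS,
                             params={"run_id": run_id}).json()["state"]
        if state["life_cycle_state"] in ("TERMINATED", "SKIPPED", "INTERNAL_ERROR"):
            # Surface the final result (e.g. SUCCESS, FAILED) to the caller.
            return state.get("result_state", state["life_cycle_state"])
        time.sleep(poll_seconds)
```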

Elastic Cloud Enterprise for Regulated Corporate Search

Regulated industries, such as financial and healthcare companies, often need to make hard choices when it comes to balancing innovation and compliance. Most technology companies are focused on cloud-first, if not entirely cloud-native, offerings, particularly in the search and data space. I was recently working with a large financial services company that wanted to consolidate […]

Integrating SAP Datasphere and Databricks Lakehouse for Unified Analytics

Integrating SAP and Databricks has typically required a lot of glue. Set up the SAP Data Hub environment, connect to the SAP data, set up a pipeline with Pipeline Modeler, configure the Streaming Analytics Service, set up Kafka or MQTT, and receive the streaming data in Databricks with Spark Streaming. Most of these intermediate steps required […]
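
The last step in that chain, receiving the streamed SAP data in Databricks with Spark Structured Streaming, looks roughly like the sketch below; the broker address, topic, checkpoint path, and target table are hypothetical.

```python
# Sketch only: land SAP change events from a Kafka topic into a Delta table.
sap_stream = (
    spark.readStream
         .format("kafka")
         .option("kafka.bootstrap.servers", "broker1:9092")   # hypothetical broker
         .option("subscribe", "sap-orders")                   # hypothetical topic
         .option("startingOffsets", "latest")
         .load()
)

(sap_stream.selectExpr("CAST(value AS STRING) AS payload")
           .writeStream
           .format("delta")
           .option("checkpointLocation", "/tmp/checkpoints/sap_orders")
           .toTable("main.sap.orders_raw"))
```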

Real-time Data Processing: Databricks vs Flink

Real-time data processing is a critical need for modern-day businesses. It involves processing data as soon as it is generated to derive insights and take immediate actions. Databricks Streaming and Apache Flink are two popular stream processing frameworks that enable developers to build real-time data pipelines, applications and services at scale. In this article, we […]
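
To ground the comparison, the sketch below shows the kind of event-time, windowed aggregation both engines are usually evaluated on, written for Spark Structured Streaming on Databricks; the source table and columns are hypothetical, and the Flink equivalent would use its own DataStream or SQL windowing APIs.

```python
# Sketch only: count page views per one-minute event-time window,
# tolerating up to two minutes of late-arriving data.
from pyspark.sql import functions as F

events = spark.readStream.table("main.clickstream.events")   # hypothetical source

per_minute = (
    events.withWatermark("event_time", "2 minutes")
          .groupBy(F.window("event_time", "1 minute"), "page")
          .count()
)

(per_minute.writeStream
           .outputMode("append")
           .format("delta")
           .option("checkpointLocation", "/tmp/checkpoints/page_counts")
           .toTable("main.clickstream.page_counts_per_minute"))
```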

Accelerate and Scale your Event Driven Architecture with GridGain

Are you looking for a way to accelerate and scale your Event Driven Architecture in the cloud? GridGain is here to help. GridGain, built on top of Apache Ignite, is a comprehensive in-memory computing platform that provides distributed caching, messaging, and compute capabilities, with enterprise-grade support. With its performance capabilities, it can increase the overall […]

Harden Databricks with Immuta’s Policy-As-Code Framework

Databricks provides a powerful, Spark-centric, cloud-based analytics platform that enables users to rapidly process, transform and explore data. However, its preconfigured security can be insufficient for regulating or monitoring confidential information, given the flexibility it offers. This can be of particular concern to highly regulated enterprises, such as financial and healthcare companies. Policy-as-code […]

Next-Generation Data Cleanrooms with Delta Sharing

Data-driven companies are finding more and more use cases where their internal data could be supplemented with external datasets to deliver more business value. At the same time, there are legitimate data privacy concerns that need to be addressed, particularly among regulated enterprises in the financial and healthcare sector. There are opportunities here for a […]
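
For a taste of the mechanics behind that kind of controlled exchange, here is a sketch of a recipient reading a dataset shared via Delta Sharing with the open-source client; the profile file path and the share, schema, and table names are invented.

```python
# Sketch only: read a partner's shared table without copying it into our
# own lakehouse first.
import delta_sharing

# Credential/profile file issued by the data provider (hypothetical path).
profile = "/dbfs/FileStore/shares/partner.share"

# URL format: <profile>#<share>.<schema>.<table> (names are hypothetical).
table_url = f"{profile}#retail_share.demographics.households"

households = delta_sharing.load_as_pandas(table_url)
print(households.head())
```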
