Blogs from this Author
Unity Catalog, the Well-Architected Lakehouse and Performance Efficiency
I have written about the importance of migrating to Unity Catalog as an essential component of your Data Management Platform. Any migration exercise implies movement from a current to a future state. A migration from the Hive Metastore to Unity Catalog will require planning around workspaces, catalogs and user access. This is also an opportunity […]
Unity Catalog, the Well-Architected Lakehouse and Cost Optimization
I have written about the importance of migrating to Unity Catalog as an essential component of your Data Management Platform. Any migration exercise implies movement from a current to a future state. A migration from the Hive Metastore to Unity Catalog will require planning around workspaces, catalogs and user access. This is also an opportunity […]
Unity Catalog and the Well-Architected Lakehouse in Databricks
I have written about the importance of migrating to Unity Catalog as an essential component of your Data Management Platform. While Unity Catalog is a foundational component, it should be part of a broader strategic initiative to realign some of your current practices that may be less than optimal with newer, better practices. One comprehensive […]
Maximize Your Data Management with Unity Catalog
Databricks Unity Catalog is a unified and open governance solution for data and AI, built into the Databricks Data Intelligence Platform. Unity Catalog offers a comprehensive solution for enhancing data governance, operational efficiency, and technological performance. By centralizing metadata management, access controls, and data lineage tracking, it simplifies compliance, reduces complexity, and improves query performance […]
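To make the access-control piece concrete, here is a minimal sketch of Unity Catalog's three-level namespace and SQL-based grants, assuming a Databricks notebook where `spark` is predefined; the catalog, schema, and group names are hypothetical:

```python
# A minimal sketch, assuming a Unity Catalog metastore is attached to the
# workspace. The catalog, schema, and group names below are hypothetical.
spark.sql("CREATE CATALOG IF NOT EXISTS sales")
spark.sql("CREATE SCHEMA IF NOT EXISTS sales.crm")

# Unity Catalog access control is plain SQL: one statement grants a group
# read access to every table in the schema.
spark.sql("GRANT SELECT ON SCHEMA sales.crm TO `data-analysts`")
```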
The Technical Power of Unity Catalog – Beyond Governance
If you use Databricks, you probably know that Databricks Unity Catalog is the industry’s only unified and open governance solution for data and AI, built into the Databricks Data Intelligence Platform. With Unity Catalog, organizations can seamlessly govern both structured and unstructured data in any format, as well as machine learning models, notebooks, dashboards and […]
Reducing Technical Debt with Databricks System Tables
Databricks system tables are currently in Public Preview, which means they are accessible but some details may still change. This is how Databricks describes system tables: "System tables are a Databricks-hosted analytical store of your account's operational data found in the system catalog. System tables can be used for historical observability across your account." I'm going to […]
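As a quick illustration, here is a minimal sketch of querying one of the documented system tables from a notebook, assuming system tables have been enabled for the account and `spark` and `display` are provided:

```python
# A minimal sketch: summarize DBU consumption from the documented
# system.billing.usage table, assuming the system catalog is enabled.
usage_by_day = spark.sql("""
    SELECT usage_date,
           sku_name,
           SUM(usage_quantity) AS total_dbus
    FROM system.billing.usage
    GROUP BY usage_date, sku_name
    ORDER BY usage_date DESC
""")
display(usage_by_day)
```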
Databricks strengthens MosaicAI with Lilac
Databricks has acquired LilacAI as it continues to strengthen its end-to-end data intelligence platform. The 2023 acquisition of MosaicML gave Databricks significant capabilities in the Generative AI space, with the ability to train and deploy Large Language Models (LLMs) at scale. Next, Databricks purchased Arcion to provide native real-time data ingestion into their […]
GCP Container Registry to Artifact Registry Migration
I got an email from Google Cloud Platform today entitled "[Action Required] Upgrade to Artifact Registry before March 18, 2025". This is not the first time Google has discontinued a product I use. They gave a full year of lead time, but I knew I would forget about it before then. I decided to look into […]
Using Snowflake and Databricks Together
This is not another comparison between Databricks and Snowflake; they're not hard to find. This is a practical guide about using Databricks and Snowflake together in your organization. Many companies have both products implemented. Sometimes there is a discrepancy between the data stored in each, creating new data silos. The Databricks […]
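As one example of bridging the two, here is a minimal sketch of reading a Snowflake table from Databricks with the built-in Spark connector; every connection value below is a hypothetical placeholder, with the password pulled from a (hypothetical) Databricks secret scope:

```python
# A minimal sketch, assuming a Databricks notebook where `spark` and
# `dbutils` are predefined. All names and connection values are placeholders.
options = {
    "sfUrl": "myorg-myaccount.snowflakecomputing.com",   # hypothetical account
    "sfUser": "svc_databricks",                          # hypothetical user
    "sfPassword": dbutils.secrets.get("snowflake", "password"),
    "sfDatabase": "ANALYTICS",
    "sfSchema": "PUBLIC",
    "sfWarehouse": "COMPUTE_WH",
}

# Read a Snowflake table into a Spark DataFrame via the connector.
df = (spark.read
      .format("snowflake")
      .options(**options)
      .option("dbtable", "ORDERS")
      .load())
```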
Writing Testable Python Objects in Databricks
I've been writing about Test-Driven Development in Databricks and some of the interesting issues that you can run into with Python objects. It's always been my opinion that code that is not testable is detestable. Admittedly, it's been very difficult getting to where I wanted to be with Databricks and TDD. Unfortunately, it's hard to […]
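A minimal sketch of the kind of test I'm aiming for, assuming pytest and a local PySpark installation; the transformation function under test is a hypothetical illustration:

```python
# A minimal sketch: a pure DataFrame transformation tested with pytest
# against a local SparkSession. Names here are hypothetical examples.
import pytest
from pyspark.sql import SparkSession, DataFrame
from pyspark.sql import functions as F

def add_full_name(df: DataFrame) -> DataFrame:
    # Pure transformation: testable because it depends only on its input.
    return df.withColumn("full_name", F.concat_ws(" ", "first", "last"))

@pytest.fixture(scope="session")
def spark():
    return SparkSession.builder.master("local[1]").appName("tests").getOrCreate()

def test_add_full_name(spark):
    df = spark.createDataFrame([("Ada", "Lovelace")], ["first", "last"])
    result = add_full_name(df).collect()
    assert result[0]["full_name"] == "Ada Lovelace"
```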
Understanding the role of Py4J in Databricks
I mentioned that my attempt to implement TDD with Databricks was not totally successful. Setting up the local environment was not a problem, and getting a service ID for the CI/CD component was more of an administrative problem than a technical one. Using mocks to test Python objects that are serialized to Spark is actually the issue. […]
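For a feel of what Py4J actually does, here is a minimal sketch, assuming a PySpark or Databricks session named `spark`; `_jdf` and `_jvm` are internal attributes, shown purely for illustration:

```python
# A minimal sketch: PySpark objects are thin Python wrappers around JVM
# objects, bridged by Py4J. Assumes a live session named `spark`.
df = spark.range(10)

# `_jdf` (internal attribute) is the underlying Java Dataset reference;
# every method call on it travels over the Py4J socket gateway to the JVM.
print(type(df._jdf))  # <class 'py4j.java_gateway.JavaObject'>

# The gateway also exposes arbitrary JVM classes directly:
random = spark._jvm.java.util.Random()
print(random.nextInt(100))  # computed in the JVM, returned over Py4J
```

Because these proxies are backed by a live socket connection to the JVM rather than plain Python state, they are exactly the kind of object that simple mocks struggle to stand in for.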