David Callaghan, Senior Solutions Architect

Databricks Champion | Center of Excellence Lead | Data Privacy & Governance Expert | Speaker & Trainer | 30+ Yrs in Enterprise Data Architecture

Blogs from this Author

Test Driven Development with Databricks

I don’t like testing Databricks notebooks and that’s a problem. I like Databricks. I like Test Driven Development. Not in an evangelical, 100%-code-coverage-or-fail kind of way. I just find that a reasonable amount of code coverage gives me a reasonable amount of confidence. Databricks has documentation for unit testing. I tried […]
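
The excerpt above is cut short, but for a flavor of what a test-driven Databricks workflow can look like, here is a minimal, hypothetical pytest sketch. The transformation `add_total_column`, its columns, and the local Spark fixture are illustrative assumptions, not taken from the post.

```python
# Sketch only: a small pytest unit test for a PySpark transformation.
# The function and schema are hypothetical.
import pytest
from pyspark.sql import SparkSession, functions as F


def add_total_column(df):
    # Hypothetical transformation under test: derive a total from two columns.
    return df.withColumn("total", F.col("price") * F.col("quantity"))


@pytest.fixture(scope="session")
def spark():
    # Small local session so the test can also run outside a Databricks cluster.
    return SparkSession.builder.master("local[1]").appName("tdd-demo").getOrCreate()


def test_add_total_column(spark):
    df = spark.createDataFrame([(10.0, 3), (2.5, 4)], ["price", "quantity"])
    totals = {row["total"] for row in add_total_column(df).collect()}
    assert totals == {30.0, 10.0}
```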

LinkedIn open sources a control plane for lake houses

LinkedIn open sources a lot of code. Kafka, of course, but also Samza and Voldemort and a bunch of Hadoop tools like DataFu and Gobblin. Open-source projects tend to be created by developers to solve engineering problems while commercial products … Anyway, LinkedIn has a new open-source data offering called OpenHouse, which is billed as […]

Databricks Lakehouse Federation Public Preview

Sometimes, it’s nice to be able to skip a step. Most data projects involve data movement before data access. Usually this is not an issue; everyone agrees that the data must be moved before it can be made available. There are use cases where the data movement part is a blocker because of time, cost, […]
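
The excerpt is truncated, but the step being skipped is the ingestion pipeline itself: with Lakehouse Federation, an external source is mounted as a foreign catalog and queried in place. The sketch below assumes an admin has already created a Unity Catalog connection; the connection, catalog, and table names are hypothetical.

```python
# Sketch only: expose an external Postgres database as a foreign catalog
# (assumes a connection named `postgres_finance` already exists), then
# query the remote table directly -- no copy pipeline required.
spark.sql("""
    CREATE FOREIGN CATALOG IF NOT EXISTS finance_pg
    USING CONNECTION postgres_finance
    OPTIONS (database 'finance')
""")

spark.table("finance_pg.public.invoices").limit(10).show()
```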

Data Lake Governance with Tagging in Databricks Unity Catalog

The goal of Databricks Unity Catalog is to provide centralized security and management for data and AI assets across the data lakehouse. Unity Catalog provides fine-grained access control for all the securable objects in the lakehouse: databases, tables, files and even models. Gone are the limitations of the Hive metastore. The Unity Catalog metastore […]
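
As a small illustration of the tagging and access-control model described above, the sketch below tags a column, tags the table, and grants read access to a group. The catalog, schema, table, and group names are made up.

```python
# Sketch only: Unity Catalog governance primitives applied from SQL.
# Object names (main.sales.customers, `data-analysts`) are hypothetical.

# Tag a sensitive column so it can be discovered and audited as PII.
spark.sql("""
    ALTER TABLE main.sales.customers
    ALTER COLUMN email SET TAGS ('pii' = 'true')
""")

# Tags can also be applied at the table (or schema/catalog) level.
spark.sql("ALTER TABLE main.sales.customers SET TAGS ('domain' = 'sales')")

# Standard SQL grants work against Unity Catalog securables.
spark.sql("GRANT SELECT ON TABLE main.sales.customers TO `data-analysts`")
```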

Feature Engineering with Databricks and Unity Catalog

Feature Engineering is the preprocessing step used to make raw data usable as input to an ML model through transformation, aggregation, enrichment, joining, normalization and other processes. Sometimes feature engineering is used against the output of another model rather than the raw data (transfer learning). At a high level, feature engineering has a lot in […]
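
For a flavor of that preprocessing, here is a minimal PySpark sketch that combines aggregation, joining, and a simple normalization; the tables, columns, and scaling factor are hypothetical, and in a Unity Catalog workflow the result would typically be registered as a feature table.

```python
# Sketch only: turn raw orders and customers into model-ready features.
from pyspark.sql import functions as F

orders = spark.table("main.sales.orders")        # hypothetical raw tables
customers = spark.table("main.sales.customers")

# Aggregation: per-customer order statistics.
order_features = (
    orders.groupBy("customer_id")
          .agg(F.count("*").alias("order_count"),
               F.avg("order_total").alias("avg_order_total"))
)

# Joining, enrichment, and a naive normalization step.
features = (
    customers.join(order_features, "customer_id", "left")
             .fillna(0, subset=["order_count", "avg_order_total"])
             .withColumn("avg_order_total_scaled",
                         F.col("avg_order_total") / F.lit(1000.0))
)
```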

Simulating Synchronous Operations with Asynchronous Code in Distributed Systems

Ensuring real-time status updates for end users in web applications can be challenging, particularly when working with Databricks, which lacks native support for synchronous updates. This means that changes made in Databricks may not be immediately reflected to end users, impacting the real-time nature of status updates. In this technical blog post, we will explore […]
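
The excerpt is cut off, but the general pattern for making an asynchronous Databricks operation look synchronous to a caller is submit-then-poll. The sketch below uses the Jobs 2.1 REST API; the workspace URL, token handling, polling interval, and job ID are placeholders, and the post itself may use a different mechanism.

```python
# Sketch only: trigger a Databricks job run and block until it reaches a
# terminal state, so the caller sees a synchronous-looking call.
import time
import requests

HOST = "https://<workspace-url>"        # placeholder workspace URL
TOKEN = "<personal-access-token>"       # placeholder PAT
HEADERS = {"Authorization": f"Bearer {TOKEN}"}


def run_job_and_wait(job_id: int, poll_seconds: int = 10) -> str:
    run = requests.post(f"{HOST}/api/2.1/jobs/run-now",
                        headers=HEADERS, json={"job_id": job_id}).json()
    run_id = run["run_id"]

    while True:
        state = requests.get(f"{HOST}/api/2.1/jobs/runs/get",
                             headers=HEADERS,
                             params={"run_id": run_id}).json()["state"]
        if state["life_cycle_state"] in ("TERMINATED", "SKIPPED", "INTERNAL_ERROR"):
            # Surface the final result (e.g. SUCCESS, FAILED) to the caller.
            return state.get("result_state", state["life_cycle_state"])
        time.sleep(poll_seconds)
```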

Elastic Cloud Enterprise for Regulated Corporate Search

Regulated industries, such as financial and healthcare companies, often need to make hard choices when it comes to balancing innovation and compliance. Most technology companies are focused on cloud-first, if not entirely cloud-native, offerings, particularly in the search and data space. I was recently working with a large financial services company that wanted to consolidate […]

Integrating SAP Datasphere and Databricks Lakehouse for Unified Analytics

Integrating SAP and Databricks has typically required a lot of glue. Set up the SAP Data Hub environment, connect to the SAP data, set up a pipeline with Pipeline Modeler, configure the Streaming Analytics Service, set up Kafka or MQTT, and receive the streaming data in Databricks with Spark Streaming. Most of these intermediate steps required […]
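
The last step in that chain, receiving the streamed SAP data in Databricks with Spark Structured Streaming, looks roughly like the sketch below; the broker address, topic, checkpoint path, and target table are hypothetical.

```python
# Sketch only: land SAP change events from a Kafka topic into a Delta table.
sap_stream = (
    spark.readStream
         .format("kafka")
         .option("kafka.bootstrap.servers", "broker1:9092")   # hypothetical broker
         .option("subscribe", "sap-orders")                   # hypothetical topic
         .option("startingOffsets", "latest")
         .load()
)

(sap_stream.selectExpr("CAST(value AS STRING) AS payload")
           .writeStream
           .format("delta")
           .option("checkpointLocation", "/tmp/checkpoints/sap_orders")
           .toTable("main.sap.orders_raw"))
```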

Real-time Data Processing: Databricks vs Flink

Real-time data processing is a critical need for modern-day businesses. It involves processing data as soon as it is generated to derive insights and take immediate actions. Databricks Streaming and Apache Flink are two popular stream processing frameworks that enable developers to build real-time data pipelines, applications and services at scale. In this article, we […]
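
To ground the comparison, the sketch below shows the kind of event-time, windowed aggregation both engines are usually evaluated on, written for Spark Structured Streaming on Databricks; the source table and columns are hypothetical, and the Flink equivalent would use its own DataStream or SQL windowing APIs.

```python
# Sketch only: count page views per one-minute event-time window,
# tolerating up to two minutes of late-arriving data.
from pyspark.sql import functions as F

events = spark.readStream.table("main.clickstream.events")   # hypothetical source

per_minute = (
    events.withWatermark("event_time", "2 minutes")
          .groupBy(F.window("event_time", "1 minute"), "page")
          .count()
)

(per_minute.writeStream
           .outputMode("append")
           .format("delta")
           .option("checkpointLocation", "/tmp/checkpoints/page_counts")
           .toTable("main.clickstream.page_counts_per_minute"))
```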

Accelerate and Scale your Event Driven Architecture with GridGain

Are you looking for a way to accelerate and scale your Event Driven Architecture in the cloud? GridGain is here to help. GridGain, built on top of Apache Ignite, is a comprehensive in-memory computing platform that provides distributed caching, messaging, and compute capabilities, with enterprise-grade support. With its performance capabilities, it can increase the overall […]

Harden Databricks with Immuta’s Policy-As-Code Framework

Databricks provides a powerful, Spark-centric, cloud-based analytics platform that enables users to rapidly process, transform and explore data. However, its preconfigured security can be insufficient for regulating or monitoring confidential information, given the flexibility it offers. This can be of particular concern to highly regulated enterprises, such as financial and healthcare companies. Policy-as-code […]

Next-Generation Data Cleanrooms with Delta Sharing

Data-driven companies are finding more and more use cases where their internal data could be supplemented with external datasets to deliver more business value. At the same time, there are legitimate data privacy concerns that need to be addressed, particularly among regulated enterprises in the financial and healthcare sector. There are opportunities here for a […]
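
For a taste of the mechanics behind that kind of controlled exchange, here is a sketch of a recipient reading a dataset shared via Delta Sharing with the open-source client; the profile file path and the share, schema, and table names are invented.

```python
# Sketch only: read a partner's shared table without copying it into our
# own lakehouse first.
import delta_sharing

# Credential/profile file issued by the data provider (hypothetical path).
profile = "/dbfs/FileStore/shares/partner.share"

# URL format: <profile>#<share>.<schema>.<table> (names are hypothetical).
table_url = f"{profile}#retail_share.demographics.households"

households = delta_sharing.load_as_pandas(table_url)
print(households.head())
```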
