Understanding best practices for securing Snowflake and having a concrete implementation plan is a critical Day Zero deliverable. Snowflake is a secure, cloud-based data warehouse: there are no hardware or software components to select, install, configure, or maintain. Snowflake takes care of ongoing maintenance and […]
David Callaghan – Solutions Architect
As a solutions architect with Perficient, I bring twenty years of development experience and I'm currently hands-on with Hadoop/Spark, blockchain, and cloud, coding in Java, Scala, and Go. I'm certified in and work extensively with Hadoop, Cassandra, Spark, AWS, MongoDB and Pentaho. Most recently, I've been bringing integrated blockchain (particularly Hyperledger and Ethereum) and big data solutions to the cloud, with an emphasis on integrating modern data products such as HBase, Cassandra and Neo4J as the off-blockchain repository.
Connect with David
Blogs from this Author
HIPAA compliance with Redshift
At Perficient, our Data Solutions team has worked closely with our Healthcare division to implement Redshift for HIPAA and HITECH compliance. Redshift offers healthcare organizations a secure data warehouse environment with many HIPAA compliance features. Perficient’s implementation team includes Redshift and health industry subject matter experts. We’ll take a look at Redshift’s benefits for healthcare providers […]
HIPAA compliance with Snowflake
At Perficient, our Data Solutions team has worked closely with our Healthcare division to implement Snowflake for HIPAA and HITECH compliance. Snowflake offers healthcare organizations a secure data warehouse environment with many HIPAA compliance features. Perficient’s implementation team includes Snowflake and health industry subject matter experts. We’ll take a look at Snowflake’s benefits for healthcare providers […]
It’s good that Spark Security is turned off by default
Security in Spark is OFF by default, which means you are fully responsible for security from Day One. Spark supports a variety of deployment types, each with its own set of security levels. Not all deployment types are safe in every scenario, and none is secure by default. Take the time to analyze your situation, […]
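Because nothing is secured out of the box, hardening a cluster starts with explicitly turning these features on. The sketch below shows the kinds of `spark-defaults.conf` properties involved; the property names come from Spark's security configuration, but the values are placeholders, not recommendations for any particular deployment.

```properties
# Illustrative hardening entries for spark-defaults.conf.
# Values are placeholders; review Spark's security docs for your deployment type.
spark.authenticate              true
spark.authenticate.secret       <shared-secret>
spark.network.crypto.enabled    true
spark.io.encryption.enabled     true
spark.ssl.enabled               true
```

Note that each deployment type (standalone, YARN, Kubernetes) distributes secrets differently, which is part of why no single default configuration can be secure everywhere.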
Four tips to solve harder data science problems with Jupyter Notebooks
Jupyter notebooks are versatile tools that data scientists can use for a variety of purposes. In this article, we will explore four ways that Jupyter notebooks can be used to improve your data science workflow. We will discuss how Jupyter notebooks can be used to learn new programming languages, document your code, debug code, and […]
8 Ways Data Scientists Can Optimize Their Parquet Queries
Some data formats are columnar, meaning they store information in columns rather than rows. They are popular because certain types of queries run more efficiently against them than against row-based formats. Parquet supports parallel query processing: it can split your data across several files so it can be read by multiple processors at […]
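The columnar advantage can be illustrated with a toy sketch in plain Python (the data and names here are hypothetical, just to show the access pattern; Parquet's actual on-disk layout adds row groups, encoding, and compression on top of this idea):

```python
# Toy illustration of row-based vs. columnar layout.
# A columnar layout lets a query touch only the columns it needs.

rows = [  # row-based: each record stored together
    {"user": "a", "bytes": 10, "region": "us"},
    {"user": "b", "bytes": 30, "region": "eu"},
    {"user": "c", "bytes": 20, "region": "us"},
]

columns = {  # columnar: each field stored contiguously
    "user": ["a", "b", "c"],
    "bytes": [10, 30, 20],
    "region": ["us", "eu", "us"],
}

# Summing one column in a row store scans every full record...
total_row_store = sum(r["bytes"] for r in rows)

# ...while a column store reads just the one list it needs.
total_col_store = sum(columns["bytes"])

print(total_row_store, total_col_store)  # 60 60
```

Both layouts give the same answer, but the columnar scan never deserializes the `user` or `region` fields, which is why analytic aggregations over a few columns of a wide table are so much cheaper in Parquet.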
Finding the right balance of nOps
There is a proliferation of acronyms with the Ops suffix for the software architect to choose from. It’s reasonable to question whether so many are needed. All of these are, at their core, targeted expressions of foundational business management methodology. The end goal is continuous improvement in some business-critical metric. […]
Adopting a Risk-Based Strategy for Data
Ransomware attacks have been in the news lately, possibly because of the 225% increase in total losses from ransomware in the United States alone in 2020. An increase in sophistication by attackers is a major factor, and many of these ransomware attacks were enabled at least in part by insider negligence. As the level of […]
Deep Dive into Databricks Tempo for Time Series Analytics
Time-series data has typically been fit imperfectly into whatever database we were using at the time for other tasks. Now time-series databases (TSDBs) are coming to market. TSDBs are optimized to store and retrieve associated pairs of times and values. A TSDB’s architecture focuses on time-stamped data storage and the compression, summarization and life-cycle management […]
Koalas are better than Pandas (on Spark)
I help companies build out, manage and hopefully get value from large data stores. Or at least, I try. In order to get value from these petabyte-scale datastores, I need the data scientists to be able to easily apply their statistical and domain knowledge. There’s one fundamental problem: large datasets are always distributed and data […]
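The appeal of Koalas (folded into Spark as `pyspark.pandas` since Spark 3.2) is that data scientists keep the pandas API while execution moves to the cluster. A minimal sketch, runnable here with plain pandas; on a Spark cluster the same code distributes after swapping only the import:

```python
import pandas as pd
# On Spark, the equivalent distributed version differs only in the import:
#   import pyspark.pandas as pd

# Tiny stand-in dataset; in practice this would be a petabyte-scale table.
df = pd.DataFrame({"store": ["a", "a", "b"], "sales": [100, 200, 50]})

# Familiar pandas groupby/aggregate; under pyspark.pandas this becomes
# a distributed Spark job instead of a single-machine computation.
totals = df.groupby("store")["sales"].sum().to_dict()
print(totals)  # {'a': 300, 'b': 50}
```

The one-line swap is the whole point: statistical and domain logic written against the pandas API doesn't have to be rewritten in Spark's DataFrame dialect to run on distributed data.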
DataOps with IBM
DataOps seeks to deliver high quality data fast in the same way that DevOps delivers high quality code fast. The names are similar; the goals are similar; the implementation is very different. Code quality can be measured using similar tools across multiple projects. Data quality is a mission-critical, enterprise-wide effort. The effort has consistently proven […]
Trust models in distributed ledgers
Consensus, getting distributed processes to agree on a single value, is a fundamental problem in computer science. Distributed processing is difficult. In fact, there are logical proofs that show pretty conclusively that there won’t be a single perfect algorithm for handling consensus in an asynchronous system made of imperfect nodes. As long as there is […]
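To make "agreeing on a single value" concrete, here is a toy single-round majority vote in plain Python. This is illustrative only: real consensus protocols such as Paxos or Raft must also survive message loss, delays, and node failures, which is exactly where the impossibility results for asynchronous systems bite.

```python
from collections import Counter

def majority_value(votes):
    """Return the value a strict majority of nodes proposed, or None.

    A None result models the case where the group cannot decide,
    which real protocols must resolve with further rounds.
    """
    value, count = Counter(votes).most_common(1)[0]
    return value if count > len(votes) / 2 else None

print(majority_value(["commit", "commit", "abort"]))  # commit
print(majority_value(["commit", "abort"]))            # None
```

Even this trivial version shows the core difficulty: with an even split (or with votes that never arrive), no decision is reached, and handling that gracefully is what separates a toy from a trust model.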