Posts Tagged ‘PySpark’

He Puts The Pro In Programmer

Base Is Loaded: Bridging OLTP and OLAP with Lakebase and PySpark

For years, the Lakehouse paradigm has successfully collapsed the wall between Data Warehouses and Data Lakes. We have unified streaming and batch, structured and unstructured data, all under one roof. Yet we often find ourselves hitting a familiar, frustrating wall: the gap between the analytical plane (OLAP) and the transactional plane (OLTP). In my latest […]

Writing Testable Python Objects in Databricks

I’ve been writing about Test-Driven Development in Databricks and some of the interesting issues you can run into with Python objects. It’s always been my opinion that code that is not testable is detestable. Admittedly, it’s been very difficult getting to where I wanted to be with Databricks and TDD. Unfortunately, it’s hard to […]
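One common way to keep Databricks code testable is to keep the business logic in plain Python, free of any SparkSession, so it can be unit-tested locally. The names below (`parse_amount`) are hypothetical illustrations, not from the post; a minimal sketch:

```python
def parse_amount(raw: str) -> float:
    """Pure function with no Spark dependency: a pytest can call it
    directly, no cluster or SparkSession required."""
    return round(float(raw.strip().lstrip("$").replace(",", "")), 2)


# On Databricks, the same tested function could then be wrapped as a UDF:
#   from pyspark.sql.functions import udf
#   from pyspark.sql.types import DoubleType
#   parse_amount_udf = udf(parse_amount, DoubleType())

if __name__ == "__main__":
    print(parse_amount("$1,234.50"))  # → 1234.5
```

The design choice is that only the thin UDF wrapper ever touches Spark, so the test suite never needs a cluster.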

Understanding the role of Py4J in Databricks

I mentioned that my attempt to implement TDD with Databricks was not totally successful. Setting up the local environment was not a problem, and getting a service ID for the CI/CD component was more of an administrative than a technical problem. Using mocks to test Python objects that are serialized to Spark is actually the issue. […]
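A sketch of why mocks and Spark serialization collide (my assumption about the issue the post describes): before shipping an object to executors, Spark must serialize it, but a `MagicMock` is backed by a dynamically created class that pickle cannot resolve by reference, so the mock never survives the trip. The `survives_pickle` helper and `Parser` class are hypothetical illustrations:

```python
import pickle
from unittest.mock import MagicMock


def survives_pickle(obj) -> bool:
    """Return True if obj round-trips through pickle, which is roughly
    what Spark requires before sending an object to the workers."""
    try:
        pickle.loads(pickle.dumps(obj))
        return True
    except Exception:
        return False


class Parser:
    """A plain, module-level class: picklable, safe to ship to executors."""
    def parse(self, s: str) -> int:
        return int(s)


print(survives_pickle(Parser()))     # → True
print(survives_pickle(MagicMock()))  # → False: pickle can't resolve the
                                     #   mock's dynamically created class
```

So a mock patched in on the driver either fails to serialize outright or, even with cloudpickle, never reaches the executor processes where the real object runs.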