Posts Tagged ‘Data Engineering’

Convert a Text File from UTF-8 Encoding to ANSI using Python in AWS Glue

To convert a text file from UTF-8 encoded data to ANSI using AWS Glue, you will typically work with Python or PySpark. However, it’s important to understand that ANSI is not a specific encoding but often refers to Windows-1252 (or similar 8-bit encodings) in a Windows context. AWS Glue, running on Apache Spark, uses UTF-8 […]
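As a rough illustration of the point in this excerpt, here is a minimal sketch in plain Python (not Glue-specific), assuming "ANSI" means Windows-1252, which Python calls `cp1252`. The function name `utf8_to_cp1252` and the choice of `errors="replace"` are illustrative assumptions, not taken from the post.

```python
def utf8_to_cp1252(data: bytes, errors: str = "replace") -> bytes:
    """Decode UTF-8 bytes and re-encode them as Windows-1252 (cp1252).

    Hypothetical helper for illustration. Characters with no cp1252
    equivalent are replaced with "?" when errors="replace".
    """
    text = data.decode("utf-8")                  # bytes -> str
    return text.encode("cp1252", errors=errors)  # str -> cp1252 bytes


# "é" and "€" both exist in Windows-1252, so they survive the round trip.
sample = "café €5".encode("utf-8")
converted = utf8_to_cp1252(sample)

# A character outside Windows-1252 is replaced rather than raising an error.
lossy = utf8_to_cp1252("日".encode("utf-8"))
```

Passing `errors="strict"` instead would raise `UnicodeEncodeError` on unmappable characters, which may be preferable when silent data loss is unacceptable.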

Top 5 Mistakes That Make Your Databricks Queries Slow (and How to Fix Them)

I wanted to discuss the top 5 mistakes that make your Databricks queries slow as a prequel to some of my FinOps blogs. Premature optimization may or may not be the root of all evil, but we can all agree optimization without a solid foundation is not an effective use of time and resources. Predictive optimization […]

Delta Live Tables and Great Expectations: Better Together

Modern data platforms like Databricks enable organizations to process massive volumes of batch and streaming data—but scaling reliably requires more than just compute power. It demands data observability: the ability to monitor, validate, and trace data through its lifecycle. This blog compares two powerful tools—Delta Live Tables and Great Expectations—that bring observability to life in […]

How Automatic Liquid Clustering Supports Databricks FinOps at Scale

Perficient has a FinOps mindset with Databricks, so the Automatic Liquid Clustering announcement grabbed my attention. I’ve mentioned Liquid Clustering before when discussing the advantages of Unity Catalog beyond governance use cases. Unity Catalog: come for the data governance, stay for the predictive optimization. I am usually a fan of being able to tune the dials […]

Python Optimization: Improve Code Performance

🚀 Python Optimization: Improve Code Performance 🎯 Introduction Python is an incredibly powerful and easy-to-use programming language. However, it can be slow if not optimized properly! 😱 This guide will teach you how to turbocharge your code, making it faster, leaner, and more efficient. Buckle up, and let’s dive into some epic optimization hacks! 💡🔥 […]

Databricks on Azure versus AWS

As a Databricks Champion working for Perficient’s Data Solutions team, I spend most of my time installing and managing Databricks on Azure and AWS. The decision on which cloud provider to use is typically outside my scope since the organization has already made it. However, there are occasions when the client uses both hyperscalers or […]