Lakebase is Databricks' OLTP database and the latest member of its ML/AI offering. Databricks has incorporated various components to support its AI platform, including data components. The Feature Store has been available for some time as a governed, centralized repository that manages machine learning features throughout their lifecycle. Mosaic AI Vector Search is a vector […]
Posts Tagged ‘data integration’
Lakeflow: Revolutionizing SCD2 Pipelines with Change Data Capture (CDC)
Several breakthrough announcements emerged at DAIS 2025, but the Lakeflow updates around building robust pipelines had the most immediate impact on my current code. Specifically, I can now see a clear path to persisting SCD2 (Slowly Changing Dimension Type 2) tables in the silver layer from mutable data sources. If this sentence resonates with you, […]
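The excerpt above is truncated, but the SCD2 mechanics it refers to can be sketched in plain Python, independent of any Lakeflow API: when change data capture reports a changed row, the current dimension record is closed out and a new version is opened. The row shape used here (`business_key`, `attrs`, `valid_from`, `valid_to`, `is_current`) is an illustrative assumption, not Lakeflow's schema.

```python
from datetime import date

def apply_scd2(dim_rows, change, as_of):
    """Apply one CDC change to an SCD2 dimension held as a list of dicts.

    Each row carries: business_key, attrs, valid_from, valid_to, is_current.
    Column names are illustrative, not a Lakeflow/Databricks API.
    """
    for row in dim_rows:
        if row["is_current"] and row["business_key"] == change["business_key"]:
            if row["attrs"] == change["attrs"]:
                return dim_rows  # no real change; nothing to do
            # Close out the current version of this key.
            row["valid_to"] = as_of
            row["is_current"] = False
            break
    # Open a new current version (this branch also handles brand-new keys).
    dim_rows.append({
        "business_key": change["business_key"],
        "attrs": change["attrs"],
        "valid_from": as_of,
        "valid_to": None,
        "is_current": True,
    })
    return dim_rows
```

In a real silver-layer pipeline the same close-then-insert step would run as a `MERGE` against a Delta table rather than over an in-memory list.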
Mastering Databricks Jobs API: Build and Orchestrate Complex Data Pipelines
In this post, we’ll dive into orchestrating data pipelines with the Databricks Jobs API, empowering you to automate, monitor, and scale workflows seamlessly within the Databricks platform. Why Orchestrate with the Databricks Jobs API? When data pipelines grow complex—involving multiple steps like running notebooks, updating Delta tables, or training machine learning models—you need a reliable way […]
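As a minimal illustration of the kind of call the post describes, the sketch below builds (but does not send) a request against the Jobs API 2.1 `run-now` endpoint, which triggers an existing job by ID. The workspace host, token, and job ID are placeholder assumptions; sending the request requires a real workspace.

```python
import json
import urllib.request

def build_run_now_request(host, token, job_id, notebook_params=None):
    """Build an HTTP request that triggers an existing Databricks job
    via the Jobs API 2.1 `run-now` endpoint.

    host/token/job_id are placeholders, not real credentials.
    """
    payload = {"job_id": job_id}
    if notebook_params:
        # Parameters passed through to the job's notebook tasks.
        payload["notebook_params"] = notebook_params
    return urllib.request.Request(
        url=f"{host}/api/2.1/jobs/run-now",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Against a real workspace, urllib.request.urlopen(req) would submit the run.
req = build_run_now_request(
    "https://example.cloud.databricks.com",  # placeholder host
    "dapiXXXX",                              # placeholder token
    123,                                     # placeholder job ID
    notebook_params={"run_date": "2025-01-01"},
)
```

In practice the `databricks-sdk` package wraps these endpoints, but seeing the raw request makes the API surface easier to reason about.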
Convert a Text File from UTF-8 Encoding to ANSI using Python in AWS Glue
To convert a text file from UTF-8 encoded data to ANSI using AWS Glue, you will typically work with Python or PySpark. However, it’s important to understand that ANSI is not a specific encoding but often refers to Windows-1252 (or similar 8-bit encodings) in a Windows context. AWS Glue, running on Apache Spark, uses UTF-8 […]
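A minimal sketch of the conversion in plain Python, as it might run in a Glue Python shell job (the file paths are placeholders): read the file as UTF-8 and rewrite it as Windows-1252, replacing any character that has no cp1252 equivalent rather than failing the whole job.

```python
def utf8_to_ansi(src_path, dst_path):
    """Re-encode a text file from UTF-8 to Windows-1252 ("ANSI").

    Characters with no cp1252 equivalent are replaced with '?'
    instead of raising UnicodeEncodeError.
    """
    with open(src_path, "r", encoding="utf-8") as src:
        text = src.read()
    with open(dst_path, "w", encoding="cp1252", errors="replace") as dst:
        dst.write(text)
```

For very large files you would stream line by line instead of reading the whole file, and in a PySpark Glue job the equivalent is writing the DataFrame out with the target charset.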
Revamping Data Integration for CRA Compliance: A Necessity in the New Normal
The Community Reinvestment Act (CRA) is a federal law in the US that encourages financial service firms to serve their communities’ credit needs, including those of low- and moderate-income neighborhoods. Federal banking agencies use a bank’s contribution metrics as a parameter when the bank applies for mergers, acquisitions, and new branch openings. CRA remains essential […]
Let’s Meet at Informatica World 2023 #InformaticaWorld
Informatica World takes place May 8-11 at the Venetian Resort Las Vegas and we can’t wait to meet you there! Perficient is a proud sponsor of Informatica’s largest event, which brings together customers and partners from across the globe. Perficient is a global digital consultancy, an Informatica Platinum Enterprise Partner, and the 2022 Cloud Modernization […]
SQL Tuning
In D & A projects, building efficient SQL queries is critical for extraction and load batch cycles to complete faster and meet the desired SLAs. The observations below cover approaches that ensure SQL queries follow best practices and facilitate performance improvements. Tuning Approach Pre-Requisite Checks Before […]
Performance Tuning Guidelines – Informatica PowerCenter
Quite often, performance is a critical factor while building a data integration pipeline. The guidelines below are vital while working on ETL processing with Informatica PowerCenter. The following items are to be considered during ETL development: Pre-Requisite Checks and Analysis, Basic Tuning Guidelines, Additional Tuning Practices. Tuning Approach Pre-Requisite Checks/Analysis Before […]
Perficient Named Talend US Partner of the Year
We’re pleased to announce that we’ve been presented the Talend US Partner of the Year award. Talend, a global leader in cloud data integration and data integrity, revealed its Partner of the Year Awards, 2021, during the annual Partner Summit held virtually this year. The award recognizes Perficient’s “demonstrated exceptional innovation and leadership in advancing […]
Configure Flat File Data Integrations in OneStream
On my most recent client engagement, I was tasked with importing data into OneStream. Data integrations are a strength of OneStream, and there are several methods to load data into the system. The four types of data integrations are delimited file, fixed file, data management, and data connector. My client required flat file and direct […]
Implementation of Twitter Connector in Informatica Cloud
Introduction In the not-so-distant past, shopping meant planning which weekend to go out, and people would spend at least three to four hours on it, often including a trip to a restaurant. But now, Amazon and Flipkart are what come to mind when we think about shopping, […]
Data Architecture: 2.5 Types of Modern Data Integration Tools
As we move into the modern cloud data architecture era, enterprises are deploying two primary classes of data integration tools to handle the traditional ETL and ELT use cases. The first type is GUI-based data integration solutions; Talend, InfoSphere DataStage, Informatica, and Matillion are good examples. These tools leverage a UI […]