Performance Tuning Guidelines – Informatica PowerCenter

Performance is often a critical factor when building Data Integration Pipelines. The guidelines below are worth following while working on ETL processing with Informatica PowerCenter.

The following items are to be considered during ETL DEV:

  • Pre-Requisite Checks and Analysis
  • Basic Tuning Guidelines
  • Additional Tuning Practices

Tuning Approach

Pre-Requisite Checks/Analysis 

Before subjecting an ETL Mapping to Performance Improvements, the steps below should be adopted:

  • Deep Dive into the Mapping to gather Basic Info.
    • Complexity of the Mapping (# of SRCs/TGTs, Transformations, Technical Logic)
    • Design of the Mapping (End-End Flow, Single/Multiple Pipelines)
    • Whether Best Practices followed
  • Verify the As-Is Metrics of the Mapping
    • Data Volume (SRC/TGT)
    • Duration of the Job Completion
    • Throughput
    • Busy Percentage of the Threads (Reader/Writer/Transformation)
    • Collect Performance Statistics (a repository query sketch follows this list)
  • Ensure the ETL Server/System is not the reason for processing slowness
    • Are there frequent Network Connectivity issues?
    • Does the ETL System/Server have the required H/W Capabilities?
    • Does the ETL Metadata DB have Enough Space?
    • Has the System accumulated Log/Cache files that block Server space?
    • Are the DBs slow with READ/WRITE operations?
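
As a quick way to capture the as-is metrics above, the run history can be pulled from the PowerCenter repository instead of opening individual session logs. The sketch below is a minimal example assuming the MX view REP_SESS_LOG, an Oracle-hosted repository, and a hypothetical workflow name; view and column names vary by PowerCenter version, so verify them against your repository guide before use.

    -- Baseline run history: duration and rows processed per run (Oracle repository assumed).
    SELECT workflow_name,
           session_name,
           actual_start,
           session_timestamp                                             AS actual_end,
           ROUND((session_timestamp - actual_start) * 86400)             AS duration_sec,
           successful_rows,
           ROUND(successful_rows /
                 NULLIF((session_timestamp - actual_start) * 86400, 0), 1) AS rows_per_sec
    FROM   rep_sess_log
    WHERE  workflow_name = 'wf_LOAD_SALES_DAILY'   -- hypothetical workflow name
    ORDER  BY actual_start DESC;

Comparing duration and rows-per-second across runs shows whether throughput degrades as volume grows. The thread Busy Percentages themselves come from the run-time statistics printed in the session log, which indicate whether the Reader, Transformation, or Writer thread is the bottleneck.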

Once the above prerequisites are taken care of and the bottlenecks are identified, and if the ETL design is recognized as the root cause of the slowness, Tuning practices can be applied to the Mappings, provided a significant improvement is expected toward meeting SLAs and other Business benefits.

Basic Tuning Guidelines

Basic Guidelines are listed below:

  • Design Perspective
    • Carry only relevant and required fields into subsequent transformations
    • Perform Incremental Extracts to limit processing (see the SQL override sketch after this list)
    • Use Informatica CDC Drivers to process only Changed Data
    • Filter Data as early as possible in the Pipeline
    • Limit the Data via Equi-Joins (JNR) upfront before Left-Joins (JNR) on Large Tables
  • DB Perspective
    • Build Indexes (High Volume Tables on Frequently used Joins/Predicates)
    • Create DB Partitions (for Large Fact Tables)
    • Collect STATISTICS (see the DB-side sketch after this list)
    • The DB often processes Complex Transformation Logic faster than the ETL engine
  • Delegation Perspective
    • Use Pushdown Optimization (PDO) if the DB Server has appreciable Computing Abilities
    • If DB Server has High Workload, push Functions Logic to Informatica Transformations
    • If DB has difficulty with Aggregations/Sorting, use Informatica Transformations
  • Space Perspective
    • Have a Retention period for Log/Cache files
    • Increase SRT/AGG/JNR Cache Size and DTM Buffer Size
  • Transformations/Load Perspective
    • Sort Input data before LKP/AGG/JNR Transformations (and enable the Sorted Input option where available)
    • In a JNR, use the source with fewer records and distinct key values as the Master
    • Consider BULK Load and External Loaders for Data Dump (after removing Index)
    • Use LKP Persistent Cache for re-use requirements
    • Datatype consistency across ports helps the ETL engine with SRT, AGG, and JNR processing
    • Optimize LKPs by looking up only relevant data (Override Filters) instead of the entire table (a filtered LKP override appears in the SQL sketch after this list)
    • Avoid LKP Override sort for small tables
    • Use the UPD Strategy Transformation only if necessary; otherwise rely on session-level update settings
    • If LKP is on a high volume table and causes performance issues, consider JNR Transformation
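
To make the incremental-extract, early-filter, and LKP-override points concrete, here is a minimal SQL sketch. The table names (SRC_ORDERS, DIM_CUSTOMER), columns, and the mapping parameter $$LAST_EXTRACT_TS are hypothetical placeholders; the statements would sit in the Source Qualifier's SQL Query property and the Lookup transformation's Lookup SQL Override property respectively.

    -- 1) Source Qualifier override: incremental extract plus early filtering,
    --    selecting only the columns the downstream transformations need.
    SELECT ord_id,
           cust_id,
           ord_amt,
           ord_status,
           last_upd_ts
    FROM   src_orders
    WHERE  last_upd_ts > TO_TIMESTAMP('$$LAST_EXTRACT_TS', 'YYYY-MM-DD HH24:MI:SS')
    AND    ord_status <> 'CANCELLED'   -- filter early, before any JNR/AGG work

    -- 2) Lookup override: cache only the relevant slice, not the entire table.
    SELECT cust_id,
           cust_segment
    FROM   dim_customer
    WHERE  current_flag = 'Y'          -- active rows only keeps the LKP cache small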
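
On the DB side, the index, partition, and statistics bullets map to routine DDL and stats work. The sketch below uses Oracle syntax with hypothetical object names (FACT_SALES, CUST_ID, SALE_DT, schema DW); most other databases have direct equivalents.

    -- Index a frequently used join/predicate column on a high-volume table.
    CREATE INDEX ix_fact_sales_cust ON fact_sales (cust_id);

    -- Range-partition a large fact table by its date key (interval partitioning).
    CREATE TABLE fact_sales_part
      PARTITION BY RANGE (sale_dt)
      INTERVAL (NUMTOYMINTERVAL(1, 'MONTH'))
      ( PARTITION p_initial VALUES LESS THAN (DATE '2024-01-01') )
    AS SELECT * FROM fact_sales;

    -- Refresh optimizer statistics after large loads so the DB chooses good plans.
    BEGIN
      DBMS_STATS.GATHER_TABLE_STATS(ownname => 'DW', tabname => 'FACT_SALES');
    END;
    /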

Additional Tuning Practices

Additional Tuning Practices are listed below:

  • Use Informatica Partitions (Pass Through/Key Range/Hash Key etc.) if the data volume is high
  • Do not use the same DB Table as both SRC and TGT; load the SRC to a File TGT first, then load the FILE to the DB TGT
  • Do not perform all ETL Operations in one Mapping; divide the work across a Series of Mappings
  • Use the Concurrent Workflow Execution setting to enable parallel loads with different Params
  • Process ETL in multiple batches (e.g., twice a day) to release the Table post load
  • If Complex ETL logic causes slowness, write to a FILE TGT first, then do a 1:1 Load from the FILE to the TGT DB (see the external table sketch after this list)
  • Monitor Storage Space (Logs), and use ETL Automation to clear files by Frequency (Monthly and Quarterly)
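
For the FILE-then-DB pattern mentioned above, one way to do the final 1:1 load on Oracle is an external table over the ETL file target followed by a direct-path insert. The directory object (etl_dir), file name, and column layout below are hypothetical; in practice the same effect can also be achieved with PowerCenter's bulk/external loader session settings.

    -- Hypothetical external table over the flat-file target produced by the mapping.
    CREATE TABLE stg_sales_ext (
      ord_id   NUMBER,
      cust_id  NUMBER,
      ord_amt  NUMBER
    )
    ORGANIZATION EXTERNAL (
      TYPE ORACLE_LOADER
      DEFAULT DIRECTORY etl_dir
      ACCESS PARAMETERS (
        RECORDS DELIMITED BY NEWLINE
        FIELDS TERMINATED BY ','
      )
      LOCATION ('sales_tgt.csv')
    );

    -- Direct-path 1:1 load from the file into the target table.
    INSERT /*+ APPEND */ INTO fact_sales (ord_id, cust_id, ord_amt)
    SELECT ord_id, cust_id, ord_amt FROM stg_sales_ext;
    COMMIT;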

Conclusion

At a high level, the key inferences are:

  • Tuning need not be performed on every ETL mapping. Only those ETL jobs that are pain points in meeting Data Extraction and Load SLAs should be considered potential candidates for further investigation and tuning.
  • DB Query optimization also plays a crucial role with SQL Overrides when used.
  • Delegate the load between the DB and ETL Servers.
  • Optimize ETL Design by following the Best Practices.
  • Monitor Storage Space and Computing Abilities.
  • Consider deploying Informatica Nodes on a GRID for High Availability and Load Balancing.

Rajesh Ranga Rao

Rajesh has over a decade and a half of experience managing data warehousing and data analytics business units, with ETL development and support experience in the banking and healthcare domains, delivering data analysis, transformation, and loading to support clients' BI requirements.
