Data integration processes are very time and resource consuming. The amount of data and the size of the datasets are constantly growing but data and information are still expected to be delivered on-time. Performance is therefore a key element in the success of a Business Intelligence & Data Warehousing project. Performance tuning should be the primary focus while building an effective data warehouse.
There are many ETL tools available in the market and this blog focuses on how to do performance tuning in Datastage while handling CLOB data.
Following are the common job design factors that impact the performance of a job in Data stage
• Repartitioning
• More transformation logic and look up operations
• Performing lot of DML operations (Update , Insert, Delete) in a job
• Frequent updates to CLOB and BLOB data types
Performance Tuning on the CLOB datatype
In this section, we will work on performance tuning by taking up one of the issues discussed in the previous section.
• A CLOB (character large object) value can be up to 2,147,483,647 characters long. The performance of the job will be affected when there are frequent updates to the CLOB data in database through data stage. Let us see how we can easily resolve this issue in data stage in this blog.
Following are the steps to be followed for development and testing:
• Considering the CDC scenario as our example, the source and target information is listed below.
Source :- Desktop_details (Oracle DB)
Transformation_Stage :- CDC Stage
Target :- Desktop_tgt (Oracle DB)
Scenario A :
Prepare a parallel job with CLOB data type column and run it in Infosphere Data Stage using parallel job option.
Note :- While running a parallel job with CLOB datatype, Array size property in the ODBC connector should be set as ‘1’
Result:
Time taken to run the job : 25 seconds.
Drawback: Performance Issue
It is taking more time to run the job,which is in turn leading to performance issue.
Solution:-
Split the above job into two as below,
1) Job without having CLOB data type column
2) Job having only CLOB data type column
Now run the above split jobs in Infosphere Data Stage using Sequence job option
Time taken to run the job : 13 seconds.
- Advantages
By this approach, Performance tuning can be improved by almost reducing the run time of the job up to 50% .