Skip to main content

Development

Performance Tuning on CLOB Data in Datastage

 

Data integration processes are very time and resource consuming.  The amount of data and the size of the datasets are constantly growing but data and information are still expected to be delivered on-time.  Performance is therefore a key element in the success of a Business Intelligence & Data Warehousing project. Performance tuning should be the primary focus while building an effective data warehouse.

There are many ETL tools available in the market and this blog focuses on how to do performance tuning in Datastage while handling CLOB data.

Following are the common job design factors that impact the performance of a job in Data stage

• Repartitioning
• More transformation logic and look up operations
• Performing lot of DML operations (Update , Insert, Delete) in a job
• Frequent updates to CLOB and BLOB data types

Performance Tuning on the CLOB datatype

In this section, we will work on performance tuning by taking up one of the issues discussed in the previous section.
• A CLOB (character large object) value can be up to 2,147,483,647 characters long.  The performance of the job will be affected when there are frequent updates to the CLOB data in database through data stage. Let us see how we can easily resolve this issue in data stage in this blog.

Following are the steps to be followed for development and testing:

• Considering the CDC  scenario as our example, the source and target information is listed below.

Source                              :- Desktop_details (Oracle DB)
Transformation_Stage  :- CDC Stage
Target                              :- Desktop_tgt (Oracle DB)

Scenario A :
Prepare a parallel job with CLOB data type column and run it in Infosphere Data Stage using parallel job option.
Note :- While running a parallel job with CLOB datatype, Array size property in the ODBC connector should be set as ‘1’

Capture1

 

Capture2

Result:
Time taken to run the job : 25 seconds.

Drawback: Performance Issue
It is taking more time to run the job,which is in turn leading to performance issue.

Solution:-
Split the above job into two as below,
1) Job without having CLOB data type column
2) Job having only CLOB data type column

Capture3

Now run the above split jobs in Infosphere Data Stage using Sequence job option

 

Capture4

 

Time taken to run the job : 13 seconds.

  • Advantages
    By this approach, Performance tuning can be improved by almost reducing the run time of the job  up to 50% .

 

 

 

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Chinthala RaghavendraNivas

More from this Author

Categories
Follow Us
TwitterLinkedinFacebookYoutubeInstagram