An Error Handling Approach in Datastage Project / Blogs / Perficient

In year of 2012 I was involved in a datastage ETL project to load data from external flat files, as well as Oracle CRM data source into Teradata data warehouse. Comparing to other ETL appliances, IBM Datastage offers much flexible stages, components to accommodate each type of data feed business requirement. I would like to introduce an error handler solution utilized in the project for your reference.

Requirement

The administrator or power users will be monitoring ETL data flows and determine which stages or file run into error in the periodic jobs. Each individual data jobs will generate necessary error information file with specified format at project level. Now those separate piece of information need to be collected and put into centralized database. Basically the admin user need to know when & where the issue happens, what’s reason etc.

Approach

I believe there can be many solutions to be deployed to achieve the goal. We just wanted to make things simple and easy to implement in Datastage. The overall steps are depicted in following diagram. The assumption is that we have different sequence job with different frequency, in each of which there are couple of parallel jobs. The reason that we are to use Sequential file instead of dataset file is that it is difficult to manipulate dataset file in Unix shell script. In this process,

Choosing a Global Software Development Partner to Accelerate Your Digital Strategy

To be successful and outpace the competition, you need a software development partner that excels in exactly the type of digital projects you are now faced with accelerating, and in the most cost effective and optimized way possible.

Get the Guide

1) Each parallel job will write error information into its individual sequential file.

2) In sequence job it will call each parallel job first and then go to a CommandExecute activity calling Unix shell to cat all files into single one and then copy to a pre-defined folder.

3) A designed error handling job need to be scheduled on the regular basis and watching on that specified folder. Once the sequential file onboard, it read error information and load to Teradata tables.

4) As long as the sequential files are processed without error, it should be move to archive folder. Otherwise, admin will have to look at the error and process it again.

Pros

We don’t have to do manipulation in Datastage dataset file and call error handler data load job in many main sequence jobs so as to reduce degree of code coupling.

Cons

This is asynchronous way and it’s a bit hard to track whether the parallel job error information has been loaded into Teradata or not; And need to consider how to rollback if error out in one of the steps.

An Error Handling Approach in Datastage Project

by Kent Jiang on April 6th, 2013 | ~ minute read

Choosing a Global Software Development Partner to Accelerate Your Digital Strategy

Tags

Leave a Reply

Kent Jiang

Categories

Follow Us