Skip to main content

Data & Intelligence

How to Establish an Amazon Redshift Connection in DataStage

Amazon Redshift is a data warehouse, which allows us to connect through standard SQL based clients and business intelligence tools effectively. It delivers fast query performance by using row-wise data storage by executing the queries parallel in a cluster on multiple nodes.

The primary focus of this blog is to establish the connection between AWS Redshift Database and IBM Datastage

Pre-requisites:

  • IBM InfoSphere DataStage Quality Stage Designer – v 9.1.0
  • Create an account in AWS and configure Redshift DB, refer to this link to configure
  • Download AWS Redshift DB Driver here

Step-by-Step process:

Step 1: To connect AWS Redshift Database in Datastage, use the JDBC Connector which is available under the Database section in the palette.

  1. Create a new file and name it as config file under $DSHOME (/opt/IBM/InformationServer/Server/DSEngine) path.

      Note: If we had already connected any database using JDBC connector, then the file would be existing already. We need to edit the existing file in this case.How to Establish an Amazon Redshift Connection in DataStage

 

 

  1. Add the downloaded AWS Redshift DB Driver (RedshiftJDBC4-1.2.1.1001.jar) file in any path and mention that file’s path in CLASSPATH parameter. In CLASS_NAMES parameter, we need to mention the class name, which is available in the jar file.

       Note: If we want to add more jar files or more class names, the jar file paths or class names should be separated by a semicolon (;)

How to Establish an Amazon Redshift Connection in DataStage

  1. The below screenshot will show that the downloaded jar file is placed in the path mentioned in isjdbc.config

How to Establish an Amazon Redshift Connection in DataStage

Step 2: Develop a Datastage job by having a JDBC connector (available under Database section in palette) as the source or Target

  1. Create a new parallel job with the JDBC Connector as source.
  2. Open the JDBC connector and fill the JDBC URL in the URL section and fill the User name/password & Table name like below,
Data Intelligence - The Future of Big Data
The Future of Big Data

With some guidance, you can craft a data platform that is right for your organization’s needs and gets the most return from your data capital.

Get the Guide

How to Establish an Amazon Redshift Connection in DataStage

3. The JDBC URL will be available in the Cluster Database Properties in the AWS console. The JDBC URL will be like below,

  • jdbc:redshift://<ServerName>:<Port>/<Database Name>
  • Server Name: It will differ for every individual connection
  • Port number: 5439 (Common for all Redshift DB)
  • Database Name: redshiftdemo (I have created during the configuration)

How to Establish an Amazon Redshift Connection in DataStage

  1. Once the parameters are filled in the JDBC Connector, test the connection like below

     Note: Check whether the AWS Redshift is running or not, before testing the connection

How to Establish an Amazon Redshift Connection in DataStage

  1. We can view the data in datastage, once we establish our connection successfully

How to Establish an Amazon Redshift Connection in DataStage

Step 3: Now the connection is established successfully and we can develop our job with the stages needed. Below, I’ve created a simple mapping with copy stage and sequential file as our target. The below job is successfully completed and 8 records from the source are exported and the same number of records have been loaded into the sequential file

How to Establish an Amazon Redshift Connection in DataStage

Step 4: Check the target to see if the records are loadeded successfuly or not like below,

How to Establish an Amazon Redshift Connection in DataStage

Troubleshooting:

How to Establish an Amazon Redshift Connection in DataStage

Cause: The version which is used to compile the jar file is different from jre which is available in your Unix machine

Solution: Use lower version of the jar file or compile with same java runtime environment

How to Establish an Amazon Redshift Connection in DataStage

Cause: This will happen if we don’t place the isjdbc.config file in $DSHOME path

Solution: We have to place the isjdbc.config file in $DSHOME path (/opt/IBM/InformationServer/Server/DSEngine)

I hope, this blog would help you to configure Redshift in DataStage and I will come soon with another interesting blog…!

*************************************************************************************************************

Thoughts on “How to Establish an Amazon Redshift Connection in DataStage”

  1. Good explanation of AWS connector in Datastage and can u explain on S3 and Ec2 usage in datastage.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Vigneshpandi Marimuthu, Technical Consultant

More from this Author

Follow Us
TwitterLinkedinFacebookYoutubeInstagram