How to Establish an Amazon Redshift Connection in DataStage

Amazon Redshift is a data warehouse, which allows us to connect through standard SQL based clients and business intelligence tools effectively. It delivers fast query performance by using row-wise data storage by executing the queries parallel in a cluster on multiple nodes.

The primary focus of this blog is to establish the connection between AWS Redshift Database and IBM Datastage

Pre-requisites:

IBM InfoSphere DataStage Quality Stage Designer – v 9.1.0
Create an account in AWS and configure Redshift DB, refer to this link to configure
Download AWS Redshift DB Driver here

Step-by-Step process:

Step 1: To connect AWS Redshift Database in Datastage, use the JDBC Connector which is available under the Database section in the palette.

Create a new file and name it as config file under $DSHOME (/opt/IBM/InformationServer/Server/DSEngine) path.

Note: If we had already connected any database using JDBC connector, then the file would be existing already. We need to edit the existing file in this case.

Add the downloaded AWS Redshift DB Driver (RedshiftJDBC4-1.2.1.1001.jar) file in any path and mention that file’s path in CLASSPATH parameter. In CLASS_NAMES parameter, we need to mention the class name, which is available in the jar file.

Note: If we want to add more jar files or more class names, the jar file paths or class names should be separated by a semicolon (;)

The below screenshot will show that the downloaded jar file is placed in the path mentioned in isjdbc.config

Step 2: Develop a Datastage job by having a JDBC connector (available under Database section in palette) as the source or Target

Create a new parallel job with the JDBC Connector as source.
Open the JDBC connector and fill the JDBC URL in the URL section and fill the User name/password & Table name like below,

Build an AI-First Enterprise

From early pilots to enterprise-wide deployment, our award-winning AI consulting and technical services help you build the right foundation, scale responsibly, and deliver meaningful business outcomes.

Learn More

3. The JDBC URL will be available in the Cluster Database Properties in the AWS console. The JDBC URL will be like below,

jdbc:redshift://<ServerName>:<Port>/<Database Name>
Server Name: It will differ for every individual connection
Port number: 5439 (Common for all Redshift DB)
Database Name: redshiftdemo (I have created during the configuration)

Once the parameters are filled in the JDBC Connector, test the connection like below

Note: Check whether the AWS Redshift is running or not, before testing the connection

We can view the data in datastage, once we establish our connection successfully

Step 3: Now the connection is established successfully and we can develop our job with the stages needed. Below, I’ve created a simple mapping with copy stage and sequential file as our target. The below job is successfully completed and 8 records from the source are exported and the same number of records have been loaded into the sequential file

Step 4: Check the target to see if the records are loadeded successfuly or not like below,

Troubleshooting:

java.lang.UnsupportedClassVersionError: bad major version at offset=6 –

Cause: The version which is used to compile the jar file is different from jre which is available in your Unix machine

Solution: Use lower version of the jar file or compile with same java runtime environment

The driver configuration file isjdbc.config could not be found –

Cause: This will happen if we don’t place the isjdbc.config file in $DSHOME path

Solution: We have to place the isjdbc.config file in $DSHOME path (/opt/IBM/InformationServer/Server/DSEngine)

I hope, this blog would help you to configure Redshift in DataStage and I will come soon with another interesting blog…!

*************************************************************************************************************

Thoughts on “How to Establish an Amazon Redshift Connection in DataStage”

Vishal June 24, 2018 at 2:27 pm

Good explanation of AWS connector in Datastage and can u explain on S3 and Ec2 usage in datastage.

This site uses Akismet to reduce spam. Learn how your comment data is processed.

How to Establish an Amazon Redshift Connection in DataStage

by Vigneshpandi Marimuthu on June 9th, 2017 | ~ minute read

Build an AI-First Enterprise

Thoughts on “How to Establish an Amazon Redshift Connection in DataStage”

Leave a Reply

Vigneshpandi Marimuthu, Technical Consultant

Categories

Follow Us