Prior to running your project on a grid system, you must ensure that your grid environment is configured.
Why Do You Need Grid Configuration in DataStage?
- Grid computing enhances server performance by making maximum use of compute nodes across one or more projects simultaneously.
- Enables both grid distribution methods simultaneously
- Allows you to assign jobs to specific servers in the grid
- Allows you to assign a parallel job to run across multiple servers
Platforms you can use:
Red Hat / SUSE
AIX/Power
Why Is the Data Integration Grid Driving Rapid Customer Adoption?
You can make better decisions when you have better data.
- Grid-based integration makes it possible for companies to process and analyze larger data volumes, create a consolidated view of data, and put the right data into the enterprise data warehouse and other critical enterprise applications
- More sources of data, more data from each source, better matching, real-time versus batch
- Better business decisions
- Enhanced customer relationships
- More cross selling and upselling
- New services delivered to customers
Reduced Data Integration Costs:
- Reduced administration and operating costs – centralization of staff.
- Reduced data integration project costs – lower cost per project delivered by a data integration center of excellence versus siloed projects.
- Reduced hardware costs.
What are the Benefits of Grid Computing?
- Low cost hardware
- High-throughput processing
- Resource manager monitors availability of hardware at startup / job deployment time
- SLA (Service Level Agreement) – provides consistent run times and isolates concurrent job execution.
Comparison of Before and After Grid Configuration
Before Grid:
Architecture & proliferation of SMP servers:
• Higher capital costs through limited pooling of IT assets across silos
• Higher operational costs
• Limited responsiveness due to more manual scheduling and provisioning
• Inherently more vulnerable to failure
• No ability to exploit available capacity when other teams are idle
After Grid:
“Virtualized” infrastructure:
• Creates a virtual data integration collaboration environment
• Virtualizes application services execution
• Dynamically fulfills requests over a virtual pool of system resources (nodes)
• Offers an adaptive, self-managed operating environment that guarantees high availability
• Delivers maximum available capacity to anyone participating in the grid
Grid Environment Variables:
APT_GRID_ENABLE
• YES: The current osh run is intercepted and a new configuration file is created dynamically
• NO: The existing configuration file is used
APT_GRID_QUEUE
• Name of the Resource Manager queue the job will be submitted to
APT_GRID_COMPUTE_NODES
• The number of compute nodes required for the job
• Used to request the number of compute nodes in the dynamically created configuration file
• A compute node is a server that can be used for processing
• That is, not a node dedicated to I/O or DB2
• Default value is 1
APT_GRID_PARTITIONS
• Used to create multiple partitions for each compute node
• Default value is 1
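As an illustration only, the four variables above might be set as shell exports before a job run; in practice they are more commonly defined as job or project environment variables in the DataStage Administrator. The queue name and counts below are hypothetical:

    export APT_GRID_ENABLE=YES        # intercept the osh run and build a dynamic configuration file
    export APT_GRID_QUEUE=ds_grid     # hypothetical resource manager queue name
    export APT_GRID_COMPUTE_NODES=4   # request four compute nodes for this job
    export APT_GRID_PARTITIONS=2      # two partitions per compute node (4 x 2 = 8 partitions in total)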
Resource Management
• Tracks resources (nodes) based on which jobs are already running and which servers are down
• Queues jobs when no resources are available
• Provides a list of nodes that are assigned for a job
• Extensive advanced features
• We leverage a subset of the features
• Manager node where tasks are scheduled and resources allocated
• Usually happens on the head node
• Compute nodes have agent processes that communicate back to the manager
• Jobs (scripts or executables) are started on the compute nodes, not the head node
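For context, the sketch below shows what a submission to a resource manager queue looks like, assuming IBM Platform LSF as the resource manager; the queue name, slot count, and script are hypothetical, and with the grid toolkit in place this submission is made for you rather than by hand:

    # Submit the (hypothetical) run_job.sh to the ds_grid queue, requesting 4 slots
    bsub -q ds_grid -n 4 -o run_job.out ./run_job.sh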
Grid Enablement Toolkit:
What does it do?
• Prebuilt integration with resource managers
• Coordinates activities between the parallel framework and the resource manager
• Creates the parallel configuration file to drive the dynamic assignment of compute resources
• Logging (interaction with the resource manager, usage details)
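To make the configuration file point concrete, the sketch below shows the general shape of a dynamically generated parallel configuration file for a job that was granted two compute nodes with one partition each; the node names, host names, and paths are hypothetical:

    {
      node "node1"
      {
        fastname "compute01"
        pools ""
        resource disk "/grid/data" {pools ""}
        resource scratchdisk "/grid/scratch" {pools ""}
      }
      node "node2"
      {
        fastname "compute02"
        pools ""
        resource disk "/grid/data" {pools ""}
        resource scratchdisk "/grid/scratch" {pools ""}
      }
    }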
Workflow of GRID: