Everything You Need to Know About How Load Leveler Works

What is Load Leveler?

In DataStage, LoadLeveler manages the resources available on the server and runs the scheduled DataStage (DS) jobs according to that availability.

How Is Load Leveler Workload Management Done?

LoadLeveler workload management provides the following:

  • Job management
  • Workload balancing (to maximize resource use)
  • Control (centralized, by the system administrator)
  • Usability (command-line interface)
  • Support for the NFS, DFS, AFS, and GPFS file systems

Job Management includes the following:

    • Build, Submit, Schedule, Monitor
    • Change Priority
    • Terminate

A LoadLeveler cluster consists of the machines (compute nodes) defined in it; submit-only machines send jobs to the cluster from outside.

What is Included in a Load Leveler Cluster? 

Job Manager/Scheduler Node (public or local)

  • Manages jobs from submission through completion
  • Receives submission from user, sends to Central Manager, schedules jobs

Central Manager

  • Central resource manager and workload balancer
  • Examines job requirements and finds the resources to meet them

Execute Node

  • Runs work (serial job steps or parallel job tasks) dispatched by the Central Manager

Resource Manager

  • Collects status from the execute nodes and the job manager

Region Manager

  • Monitors node and adaptor status of executing machines

Submit-only Node

  • Submits jobs to LoadLeveler from outside the cluster.

 

Examples of Load Leveler Commands:

To submit a batch job:

# llsubmit <filename>

Example:

# llsubmit script.ll
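
The article does not show what script.ll contains. As a rough sketch only, a job command file for a serial job might look like the one below. The job name, output file names, and wall clock limit are hypothetical, and the class is borrowed from the llclass example in this article; your site's classes and limits will differ.

```shell
#!/bin/sh
# Hypothetical LoadLeveler job command file (script.ll).
# Lines that start with "# @" are LoadLeveler keywords; the shell treats them as comments.
# @ job_name         = hello_ll
# @ job_type         = serial
# @ class            = workq
# @ output           = $(job_name).$(jobid).out
# @ error            = $(job_name).$(jobid).err
# @ wall_clock_limit = 00:10:00
# @ queue

# With no "executable" keyword, this file itself is run as the job script.
echo "Job running on $(uname -n)"
```

Because the keywords are plain shell comments, the same file can be run directly with sh to check the script part before it is handed to llsubmit.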

 

Queue Information:

Provides information about each queue.

# llclass

Example output:

Name         MaxJobCPU   MaxProcCPU  Free   Max    Description
             d+hh:mm:ss  d+hh:mm:ss  Slots  Slots
-----------  ----------  ----------  -----  -----  ---------------------
interactive  undefined   undefined       4      8  Interactive Parallel jobs running on interactive node
workq        unlimited   unlimited       0     56  Default queue, up to 56 processors
preempt      unlimited   unlimited      16     48  queue reserved for on-demand jobs, up to 48 processors
checkpt      unlimited   unlimited      16    104  queue for checkpointing jobs, up to 104 processors, Job
                                                   running on this queue can be preempted for on-demand job
--------------------------------------------------------------------------------
"Free Slots" values of the classes "workq", "preempt", "checkpt" are constrained by the MAX_STARTERS limit(s)

 

View Job Status:

To list all jobs in queue:

# llq

 

To list the jobs of a specific user:

# llq -u username

 

To determine why a job has not started:

# llq -s <job-id>

Example output:

The class of this job step is “checkpt”.

Total number of available initiators of this class on all machines in the cluster: 8

Minimum number of initiators of this class required by job step: 32

The number of available initiators of this class is not sufficient for this job step.

Not enough resources to start now.

This step is top-dog.

Considered at: Fri Jul 1 12:12:04 2016

Will start by: Fri Jul 1 18:10:32 2016
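
The reason the job is waiting comes down to simple arithmetic: the step needs 32 initiators and only 8 are available, so it is 24 short. A small sketch of pulling that number out of saved llq -s text is shown here. The helper name initiator_shortfall is hypothetical, and the matching relies only on the message wording in the sample above, which may differ between LoadLeveler versions.

```shell
#!/bin/sh
# initiator_shortfall is a hypothetical helper, not a LoadLeveler command.
# It reads "llq -s" text on stdin and prints how many more initiators
# the job step needs than the cluster currently has available.
initiator_shortfall() {
    awk -F': *' '
        /available initiators of this class on all machines/ { avail = $2 }
        /initiators of this class required by job step/      { need  = $2 }
        END { print need - avail }
    '
}

# Two lines copied from the sample output above.
initiator_shortfall <<'EOF'
Total number of available initiators of this class on all machines in the cluster: 8
Minimum number of initiators of this class required by job step: 32
EOF
# prints: 24
```

On a live system the same function could be fed from the command itself, for example by piping the output of llq -s for a job into initiator_shortfall.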

To generate a long listing rather than the standard one:

# llq -l <job-id>

Job Status States

Each entry gives the state name, its abbreviation in llq output, and its meaning:

  • Canceled (CA): The job has been canceled, for example by the llcancel command.
  • Completed (C): The job has completed.
  • Complete Pending (CP): The job is completing. Some tasks are finished.
  • Deferred (D): The job will not be assigned until a specified date. The start date may have been specified by the user in the Job Command file, or it may have been set by LoadLeveler because a parallel job could not obtain enough machines to run the job.
  • Idle (I): The job is being considered to run on a machine, though no machine has been selected yet.
  • NotQueued (NQ): The job is not being considered to run. A job may enter this state due to an error in the command file or because LoadLeveler cannot obtain information that it needs to act on the request.
  • Not Run (NR): The job will never run because a stated dependency in the Job Command file evaluated to false.
  • Pending (P): The job is in the process of starting on one or more machines. The request to start the job has been sent but has not yet been acknowledged.
  • Rejected (X): The job did not start because there was a mismatch between the requirements of your job and the resources on the target machine, or because the user does not have a valid ID on the target machine.
  • Reject Pending (XP): The job is in the process of being rejected.
  • Removed (RM): The job was canceled by either LoadLeveler or the owner of the job.
  • Remove Pending (RP): The job is in the process of being removed.
  • Running (R): The job is running.
  • Starting (ST): The job is starting.
  • Submission Error (SX): The job cannot start due to a submission error. Please notify the Bluedawg administration team if you encounter this error.
  • System Hold (S): The job has been put on hold by a system administrator.
  • System User Hold (HS): Both the user and a system administrator have put the job on hold.
  • Terminated (TX): The job was terminated, presumably by means beyond LoadLeveler's control. Please notify the Bluedawg administration team if you encounter this error.
  • User Hold (H): The job has been put on hold by the owner.
  • Vacated (V): The started job did not complete. The job will be scheduled again, provided that it can be rescheduled.
  • Vacate Pending (VP): The job is in the process of vacating.
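
Since llq shows only the abbreviation, it can help to keep the list above at hand in a script. The following is a minimal sketch of such a lookup; ll_state_name is a hypothetical helper name, not part of LoadLeveler, and the mapping is taken directly from the states listed above.

```shell
#!/bin/sh
# ll_state_name is a hypothetical helper, not a LoadLeveler command.
# It prints the state name for an llq status abbreviation and
# returns 1 for an abbreviation that is not in the list above.
ll_state_name() {
    case "$1" in
        CA) echo "Canceled" ;;
        C)  echo "Completed" ;;
        CP) echo "Complete Pending" ;;
        D)  echo "Deferred" ;;
        I)  echo "Idle" ;;
        NQ) echo "NotQueued" ;;
        NR) echo "Not Run" ;;
        P)  echo "Pending" ;;
        X)  echo "Rejected" ;;
        XP) echo "Reject Pending" ;;
        RM) echo "Removed" ;;
        RP) echo "Remove Pending" ;;
        R)  echo "Running" ;;
        ST) echo "Starting" ;;
        SX) echo "Submission Error" ;;
        S)  echo "System Hold" ;;
        HS) echo "System User Hold" ;;
        TX) echo "Terminated" ;;
        H)  echo "User Hold" ;;
        V)  echo "Vacated" ;;
        VP) echo "Vacate Pending" ;;
        *)  echo "Unknown"; return 1 ;;
    esac
}

ll_state_name ST   # prints: Starting
```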

To cancel a job:

# llcancel <job-id>

To cancel all jobs of a specific user:

# llcancel -u username

Job History and Usage Summaries

# llsummary -u estrabd /var/loadl/archive/history.archive

Check status of each node

# llstatus

Example output:

Name       Schedd  InQ  Act  Startd  Run  LdAvg  Idle  Arch    OpSys
tstdevn01  Avail     4    2  Idle      0   1.01     0  Power5  AIX53
tstdevn02  Down      0    0  Busy      8   8.31  9999  Power5  AIX53
tstdevn03  Down      0    0  Idle      0   0.00  9999  Power5  AIX53
tstdevn04  Down      0    0  Idle      0   0.01  9999  Power5  AIX53
tstdevn05  Down      0    0  Busy      8   7.73  9999  Power5  AIX53
tstdevn06  Down      0    0  Busy      8   9.03  9999  Power5  AIX53
tstdevn07  Down      0    0  Busy      8   7.98  9999  Power5  AIX53
tstdevn08  Down      0    0  Busy      8   9.01  9999  Power5  AIX53
tstdevn09  Down      0    0  Busy      8   8.73  9999  Power5  AIX53
tstdevn10  Down      0    0  Busy      8   8.00  9999  Power5  AIX53
tstdevn11  Down      0    0  Idle      0   1.04  9999  Power5  AIX53
tstdevn12  Down      0    0  Idle      0   0.00  9999  Power5  AIX53
tstdevn13  Down      0    0  Idle      0   0.00  9999  Power5  AIX53
tstdevn14  Down      0    0  Busy      8   8.07  9999  Power5  AIX53

Power5/AIX53     14 machines   4 jobs   64 running
Total Machines   14 machines   4 jobs   64 running

The Central Manager is defined on tstdevn01

The BACKFILL scheduler is in use

All machines on the machine_list are present.
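
The "64 running" total in the summary is the sum of the Run column over the machine rows (eight busy nodes with 8 running tasks each). A small sketch of computing that sum from llstatus text follows. The helper name count_running is hypothetical, and the row detection (ten fields with a numeric sixth field) is an assumption based only on the sample layout shown above.

```shell
#!/bin/sh
# count_running is a hypothetical helper, not a LoadLeveler command.
# It sums the Run column (6th field) of the machine rows in llstatus
# text read from stdin. A machine row is assumed to have exactly ten
# fields with a numeric 6th field, which skips the header and summary lines.
count_running() {
    awk 'NF == 10 && $6 ~ /^[0-9]+$/ { total += $6 } END { print total + 0 }'
}

# Header and three machine rows copied from the sample output above.
count_running <<'EOF'
Name       Schedd  InQ  Act  Startd  Run  LdAvg  Idle  Arch    OpSys
tstdevn01  Avail     4    2  Idle      0   1.01     0  Power5  AIX53
tstdevn02  Down      0    0  Busy      8   8.31  9999  Power5  AIX53
tstdevn05  Down      0    0  Busy      8   7.73  9999  Power5  AIX53
EOF
# prints: 16
```

On a cluster, piping the output of llstatus into count_running should reproduce the running total from the summary lines, as long as the column layout matches the sample.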


Jayanth Kaliappan
