PBS job submission and execution


Home > CentOS > CentOS 7.x > CentOS 7.x Rocks cluster 7.0 > PBS job submission and execution


Sample PBS stress job for testing

We can test a PBS installation by checking that jobs can be submitted and that they run correctly on the compute nodes:

  1. Install the stress package on all the nodes:
    yum install -y stress
    This assumes that EPEL and similar repositories were enabled as part of the compute/execution node setup.
  2. Create the PBS script 'load.sh' to run the workload on 8 specific nodes:
    #!/bin/bash
    
    #PBS -N jobname
    #PBS -q queuename
    #PBS -V
    #PBS -l nodes=8
    
    /opt/openmpi/bin/mpirun -host compute-0-0,compute-0-1,compute-0-2,compute-0-3,compute-0-4,compute-0-5,compute-0-6,compute-0-7  -n 8 /usr/bin/stress --cpu 20 --vm 5 --vm-bytes 10G --timeout 60s
    
    Note: On each of the 8 nodes, this script starts 20 CPU workers and 5 memory workers of 10G each (around 50G of memory in total) and runs for one minute.
  3. A better option is to use the following load.sh script instead:
    #!/bin/bash
    
    #PBS -N job
    #PBS -q testq1
    #PBS -V
    #PBS -l nodes=2:ppn=2
    
    /opt/openmpi/bin/mpirun -machinefile $PBS_NODEFILE -np 4 /usr/bin/stress --cpu 1 --vm-bytes 10G --timeout 60s
    
    This lets PBS pick 2 nodes (nodes=2) and run 2 processes per node (ppn=2), for a total of 4 processes (-np 4). This way we do not need to list hostnames in the script file. Note that --vm-bytes has no effect here because no --vm workers are requested.
  4. If job submission as root is not enabled (it is disabled by default, and leaving it disabled is recommended), common users must be created across all nodes for job submission to work.
    For example:
    useradd user1
    passwd user1
    rocks sync users
  5. The users created must also have key-based, password-less SSH access from each machine to the others as the same user. On Rocks this does not work by default unless we run the following (required only once, on the master):
    ls -l /usr/libexec/openssh/ssh-keysign
    chmod u+s /usr/libexec/openssh/ssh-keysign
    ls -l /usr/libexec/openssh/ssh-keysign
  6. Submit the job as a normal user (e.g. user1):
    qsub load.sh
    This assumes /opt/pbs/bin appears in $PATH before /opt/gridengine/bin.
  7. Check the CPU/RAM utilization on the compute nodes:
    top
  8. Or, for a more readable display, use:
    yum -y install htop
    htop
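
The script in step 3 hard-codes -np 4 to match nodes=2:ppn=2. As a minimal sketch, the process count could instead be derived from $PBS_NODEFILE, assuming the node file contains one line per process slot (4 lines for nodes=2:ppn=2). The helper name count_slots is hypothetical and not part of PBS.

```shell
#!/bin/bash
#PBS -N job
#PBS -q testq1
#PBS -V
#PBS -l nodes=2:ppn=2

# count_slots (hypothetical helper): print the number of non-empty
# lines in a PBS-style node file.
count_slots() {
    grep -c . "$1"
}

# PBS sets PBS_NODEFILE inside a running job; outside a job this part is skipped.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    NP=$(count_slots "$PBS_NODEFILE")
    /opt/openmpi/bin/mpirun -machinefile "$PBS_NODEFILE" -np "$NP" \
        /usr/bin/stress --cpu 1 --timeout 60s
fi
```

With this approach only the "#PBS -l nodes=...:ppn=..." line has to change when the job size changes.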


Email status of PBS job execution

To send mail before or after job execution, Postfix or another SMTP server should be configured on the master server.

Normal emails

The following two parameters should be added to the PBS job script (e.g. load.sh) along with the other parameters.

#PBS -m abe
#PBS -M pavan@gbb.co.in

Explanation of "abe":

a
Mail is sent when the job is aborted by the batch system.
b
Mail is sent when the job begins execution.
e
Mail is sent when the job terminates.


Exception emails

If mail is not required from a particular job, use the following parameter in the PBS job script instead.

#PBS -m n

Where:

n
No normal mail is sent. Mail for job cancels and other events outside of normal job processing are still sent.
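
Since "n" is used on its own while a, b and e can be combined, a small check before submission can catch a mistyped value. The following is only a sketch: valid_mail_opts is a hypothetical helper, and it covers only the letters described on this page (a given PBS release may accept others).

```shell
#!/bin/bash
# valid_mail_opts (hypothetical helper): succeed if the argument is "n"
# on its own, or a non-empty string made up only of the letters a, b and e.
valid_mail_opts() {
    case "$1" in
        n)           return 0 ;;  # no normal mail
        ""|*[!abe]*) return 1 ;;  # empty, or contains some other letter
        *)           return 0 ;;  # any combination of a, b, e
    esac
}

# Example: report whether a value would be acceptable for "#PBS -m".
for opts in abe n an; do
    if valid_mail_opts "$opts"; then
        echo "$opts: ok"
    else
        echo "$opts: rejected"
    fi
done
```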


Configure from email address

Change the sender ("from") address of the mails by modifying the following server attribute:

qmgr -c "set server mail_from = <user>@<domain>"


More ways to submit jobs

Basic Job submission

qsub <script-name>

For Example

qsub /home/test5/myscript.sh


Specify job name with -N option while submitting the job

qsub -N <job-name> <script>

For Example:

qsub -N firstJob /home/test5/myscript.sh


Select resources while submitting jobs

qsub -l ncpus=<cpu-count>:mem=<mem-count-in-gb>gb <script>

For example:

qsub -l ncpus=20:mem=40gb /home/test5/myscript.sh

This example job requests 20 CPUs and 40 GB of memory.

Select single node while submitting jobs

qsub -l nodes=<nodename1>:ncpus=20 <script>

For Example:

qsub -l nodes=rockscompute1:ncpus=20 /home/test5/myscript.sh

This job will run on the single node specified by its hostname.

Select multiple nodes while submitting jobs

qsub -l nodes=<nodename1>+<nodename2>:ncpus=20 <script>

For Example:

qsub -l nodes=rockscompute1+rockscompute2:ncpus=20 /home/test5/myscript.sh
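
When the list of nodes comes from a variable or another command, the "+"-separated node list can be built rather than typed by hand. This is a minimal sketch; join_nodes is a hypothetical helper, and the resulting string is used in the same form as the example above.

```shell
#!/bin/bash
# join_nodes (hypothetical helper): join all arguments with "+",
# e.g. "rockscompute1 rockscompute2" becomes "rockscompute1+rockscompute2".
join_nodes() {
    local IFS='+'
    echo "$*"
}

# Show the command that would be run; drop the leading "echo" to submit.
echo qsub -l "nodes=$(join_nodes rockscompute1 rockscompute2):ncpus=20" /home/test5/myscript.sh
```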


Submit multiple jobs with the same script (job array)

qsub -J 1-20 /home/test5/myscript.sh
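
Each of the 20 subjobs runs the same script, so the script usually needs a way to tell the subjobs apart. A sketch of one approach follows, assuming the PBS installation exports PBS_ARRAY_INDEX to each subjob (other batch systems use a different variable name). The function input_for_index and the file naming scheme input_<n>.dat are hypothetical.

```shell
#!/bin/bash
#PBS -N arrayjob
#PBS -J 1-20

# input_for_index (hypothetical helper): map a subjob index to an
# input file name following an assumed input_<n>.dat naming scheme.
input_for_index() {
    printf 'input_%s.dat\n' "$1"
}

# Fall back to index 1 when the script is run by hand outside PBS.
idx="${PBS_ARRAY_INDEX:-1}"
echo "Subjob $idx would process $(input_for_index "$idx")"
```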



Check Job status

  • To list all queued and running jobs:
    qstat -a
  • To also include finished jobs (requires job history to be enabled):
    qstat -x
  • To see job attributes:
    qstat -f <job ID>
  • To see the attributes of a finished job when job history is enabled:
    qstat -xf <job-id>
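
The full output of qstat -f is long. When only the job state is of interest, it can be filtered out of that output. The sketch below assumes the output contains a line of the form "job_state = R"; job_state is a hypothetical helper name.

```shell
#!/bin/bash
# job_state (hypothetical helper): read "qstat -f" style output on stdin
# and print the value of the first "job_state = ..." line.
job_state() {
    awk -F' = ' '$1 ~ /^[ \t]*job_state$/ { print $2; exit }'
}

# Usage on a system with PBS installed:
#   qstat -f <job ID> | job_state
#   qstat -xf <job-id> | job_state
```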


