PBS job queues resource management

From Notes_Wiki
Revision as of 09:48, 22 July 2022 by Saurabh (talk | contribs)

Home > CentOS > CentOS 7.x > CentOS 7.x Rocks cluster 7.0 > PBS job queues resource management

To restrict the resources available to users, configure resource limits on queues and assign users, groups, or nodes to those queues.

Resource management for the overall Queue

  • Set Maximum CPU for the queue
    qmgr -c "set queue testq1 resources_max.ncpus=20"
  • Set Maximum Memory for queue
    qmgr -c "set queue testq1 resources_max.mem=5gb"
  • Existing limits can also be removed via:
    qmgr -c "unset queue testq1 resources_max.ncpus"
    qmgr -c "unset queue testq1 resources_max.mem"
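
When several limits have to be set together, it can help to review the commands before running them. The following is a minimal sketch, not part of the original notes: run_qmgr, set_queue_limits and the APPLY variable are hypothetical names introduced here. By default the commands are only printed; they are passed to qmgr only when APPLY=1 is set.

```shell
#!/bin/bash
# Hypothetical dry-run wrapper: print the qmgr command unless APPLY=1 is set.
run_qmgr() {
    if [ "${APPLY:-0}" = "1" ]; then
        qmgr -c "$1"
    else
        printf 'qmgr -c "%s"\n' "$1"
    fi
}

# set_queue_limits QUEUE NCPUS MEM
# Sets the queue-wide CPU and memory maximums shown in the list above.
set_queue_limits() {
    local queue="$1" ncpus="$2" mem="$3"
    run_qmgr "set queue ${queue} resources_max.ncpus=${ncpus}"
    run_qmgr "set queue ${queue} resources_max.mem=${mem}"
}

# Dry run: prints the two commands for review.
set_queue_limits testq1 20 5gb
```

After the limits are applied, qmgr -c "list queue testq1" prints the queue attributes, which can be used to check the result.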


Adding nodes to queues

Do this only if nodes must be restricted to a specific queue. It is normally not required when queues are restricted at the user or group level, as explained later.

  • Add node to a specific queue
    qmgr -c "set node node-name queue=queue-name"
  • Remove nodes from a specific queue
    qmgr -c "unset node node-name queue"


User resource management for the Queue

  • Enable user restrictions for the queue
    qmgr -c "set queue testq1 acl_user_enable = True"
  • Add users to the queue
    qmgr -c 'set queue testq1 acl_users = "user1, user2"'
  • Add more users to the queue
    qmgr -c 'set queue testq1 acl_users += "user3, user7"'
  • Remove single user from the queue
    qmgr -c "set queue testq1 acl_users -= user3"
  • Remove all users from the queue
    qmgr -c "unset queue testq1 acl_users"
  • Disable user restrictions for the queue
    qmgr -c "unset queue testq1 acl_user_enable"
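
If the user list is maintained elsewhere (for example in a script variable), the acl_users command can be generated instead of typed by hand. This is a sketch under assumptions: join_users and acl_users_command are hypothetical helper names, and the list is joined with commas and no spaces. The command is only printed, not executed.

```shell
#!/bin/bash
# join_users user1 user2 ... -> "user1,user2,..."
join_users() {
    local IFS=','
    printf '%s\n' "$*"
}

# acl_users_command QUEUE USER...
# Prints (does not run) the qmgr command that sets the queue's user list.
acl_users_command() {
    local queue="$1"
    shift
    printf '%s\n' "qmgr -c 'set queue ${queue} acl_users = \"$(join_users "$@")\"'"
}

acl_users_command testq1 user1 user2
```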


Set / Unset Maximum CPU per user

  • Set Maximum CPU for single user
    qmgr -c "set queue testq1 max_run_res.ncpus = [u:user3=10]"
  • Set Maximum CPU for multiple users
    qmgr -c 'set queue testq1 max_run_res.ncpus = "[u:user2=10],[u:user3=12]"'
  • If a Maximum CPU rule is already set for some users, add more users to the same rule
    qmgr -c "set queue testq1 max_run_res.ncpus += [u:user9=10]"
  • Remove Max CPU for the user
    qmgr -c "set queue testq1 max_run_res.ncpus -= [u:user9=10]"
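
The limit values above follow the pattern [u:name=value], joined by commas when there are several entries. The sketch below builds such a string from name=value pairs. limit_list is a hypothetical helper introduced here; the same pattern is used on this page for memory limits and for group entries with the g: prefix.

```shell
#!/bin/bash
# limit_list TYPE name=value ...
# TYPE is u (user) or g (group). Prints "[TYPE:name=value],[TYPE:name=value]".
limit_list() {
    local type="$1" out="" pair
    shift
    for pair in "$@"; do
        # Prefix a comma only when something is already in the list.
        out="${out:+${out},}[${type}:${pair}]"
    done
    printf '%s\n' "$out"
}

# Print (not run) the command for two per-user CPU limits.
printf '%s\n' "qmgr -c 'set queue testq1 max_run_res.ncpus = \"$(limit_list u user2=10 user3=12)\"'"
```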


Set / Unset Maximum Memory per user

  • Set Maximum Memory for single user
    qmgr -c "set queue testq1 max_run_res.mem = [u:user1=4g]"
  • Set Maximum Memory for multiple users
    qmgr -c 'set queue testq1 max_run_res.mem = "[u:user1=4g],[u:user2=3g]"'
  • If a Maximum Memory rule is already set for some users, add more users to the same rule
    qmgr -c "set queue testq1 max_run_res.mem += [u:user9=4g]"
  • Remove Max memory for the user
    qmgr -c "set queue testq1 max_run_res.mem -= [u:user9=4g]"


User level resource restriction example

Create a test queue using:

qmgr -c "create queue testq1 queue_type=execution"
qmgr -c "set queue testq1 started=true"
qmgr -c "set queue testq1 enabled=true"
qmgr -c "set queue testq1 resources_default.nodes=1"
qmgr -c "set queue testq1 resources_default.walltime=3600"

qmgr -c "set queue testq1 resources_max.ncpus=3"
qmgr -c "set queue testq1 resources_max.mem=2gb"

qmgr -c "set queue testq1 acl_user_enable = True"
qmgr -c 'set queue testq1 acl_users = "user1"'

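Submitting the following job as user1 then fails with an insufficient-resources error for the queue, presumably because it requests 2 nodes with 2 processors each (4 CPUs) while the queue allows at most 3: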

#!/bin/bash

#PBS -N job
#PBS -q testq1
#PBS -V
#PBS -l nodes=2:ppn=2

/opt/openmpi/bin/mpirun -machinefile $PBS_NODEFILE -np 4 /usr/bin/stress --cpu 1 --vm-bytes 10G --timeout 60s
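
The job above requests nodes=2:ppn=2, while the queue was created with resources_max.ncpus=3. The following sketch works through that arithmetic, assuming the request is counted as nodes × ppn CPUs (an assumption about how the request is mapped, not something these notes state). requested_cpus is a hypothetical helper name.

```shell
#!/bin/bash
# requested_cpus "nodes=N:ppn=P" -> N * P
# Assumes the spec has exactly this form, as in the job script above.
requested_cpus() {
    local spec="$1" nodes ppn
    nodes="${spec#nodes=}"   # strip the leading "nodes="
    nodes="${nodes%%:*}"     # keep the part before the first ":"
    ppn="${spec##*ppn=}"     # keep the part after "ppn="
    echo $(( nodes * ppn ))
}

queue_max_ncpus=3            # resources_max.ncpus of testq1 in this example
cpus="$(requested_cpus "nodes=2:ppn=2")"

if [ "$cpus" -gt "$queue_max_ncpus" ]; then
    echo "request of ${cpus} CPUs exceeds queue limit of ${queue_max_ncpus}"
fi
```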


Group resource management for the queue

Do not apply both user-level and group-level limits to the same queue. Use only one of the two.

  • Enable Group restriction for the queue
    qmgr -c "set queue testq1 acl_group_enable = True"
  • Add group to the queue
    qmgr -c "set queue testq1 acl_groups = group1"
  • Add more groups to the queue
    qmgr -c "set queue testq1 acl_groups += group2"
  • Remove group from the queue
    qmgr -c "set queue testq1 acl_groups -= group2"
  • Disable Group restriction for the queue
    qmgr -c "unset queue testq1 acl_group_enable"


Set / Unset Maximum CPU at group level

  • Set Maximum CPU for single group
    qmgr -c "set queue testq1 max_run_res.ncpus = [g:group1=10]"
  • Set Maximum CPU for multiple groups
    qmgr -c 'set queue testq1 max_run_res.ncpus = "[g:group1=10],[g:group2=12]"'
  • Add more groups to Maximum CPU rule
    qmgr -c "set queue testq1 max_run_res.ncpus += [g:group3=5]"
  • Remove Max CPU for the group
    qmgr -c "set queue testq1 max_run_res.ncpus -= [g:group3=5]"


Set / Unset Maximum Memory at group level

  • Set Maximum Memory for single group
    qmgr -c "set queue testq1 max_run_res.mem = [g:group1=4g]"
  • Set Maximum Memory for multiple groups
    qmgr -c 'set queue testq1 max_run_res.mem = "[g:group1=4g],[g:group2=3g]"'
  • Add more groups to Maximum Memory rule
    qmgr -c "set queue testq1 max_run_res.mem += [g:group9=4g]"
  • Remove Max memory for the group
    qmgr -c "set queue testq1 max_run_res.mem -= [g:group9=4g]"



Setting maximum execution time (walltime) limits for user or queue

Walltime limits can be configured at the queue level and at the user level.

Walltime limit for the queue:

qmgr -c "set queue testq1 resources_max.walltime = 00:01:00"

Walltime limit for a user:

qmgr -c "set queue testq1 max_run_res.walltime = [u:user1=60]"
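
The queue-level example writes the limit as HH:MM:SS (00:01:00), while the user-level example uses a plain number (60). Assuming the plain number is a count of seconds, both describe one minute. The sketch below converts between the two forms. hms_to_seconds is a hypothetical helper, and it assumes two-digit fields.

```shell
#!/bin/bash
# hms_to_seconds HH:MM:SS -> total seconds
hms_to_seconds() {
    local h m s rest
    h="${1%%:*}"
    rest="${1#*:}"
    m="${rest%%:*}"
    s="${rest#*:}"
    # Drop one leading zero so values such as 08 are not read as octal.
    h="${h#0}"; m="${m#0}"; s="${s#0}"
    echo $(( ${h:-0} * 3600 + ${m:-0} * 60 + ${s:-0} ))
}

hms_to_seconds 00:01:00
hms_to_seconds 01:00:00
```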

The equivalent walltime limit at group level did not work, for reasons not yet determined:

qmgr -c "set queue testq2 max_run_res.walltime = [g:group2=3600]"

