PBS pbs_sched Startup Fix After Reboot

From Notes_Wiki
Revision as of 07:38, 29 December 2025 by Akshay

Home > CentOS > CentOS 7.x > CentOS 7.x Rocks cluster 7.0 > PBS pbs_sched Startup Fix After Reboot


CentOS 7.x Rocks Cluster OpenPBS pbs_sched Startup Fix After Reboot

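Background: In the Rocks cluster architecture, the master (frontend) node has two hostnames:

  • public hostname (on the external network)
  • private hostname (on the private cluster network that connects to the compute nodes)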

In OpenPBS, all scheduler and node communication happens strictly through hostnames. OpenPBS must be configured to use the private hostname for proper communication with compute nodes.
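
A quick way to see which hostname PBS is configured with is to compare the PBS_SERVER entry in /etc/pbs.conf against the private hostname. The helper below is a minimal sketch. It assumes pbs.conf uses its usual KEY=value lines, and the function name check_pbs_server is hypothetical. It only reads the file and does not change anything.

```shell
#!/bin/bash
# Hypothetical helper (not part of OpenPBS): compare PBS_SERVER in a
# pbs.conf-style file with the expected private hostname of the master node.
# Usage: check_pbs_server <expected-hostname> [path to pbs.conf]
# Returns 0 on a match, 1 on a mismatch, 2 if PBS_SERVER is not set.
check_pbs_server() {
    local expected="$1"
    local conf="${2:-/etc/pbs.conf}"
    local server

    # Value after "PBS_SERVER=" on uncommented lines; the last one wins.
    # tr deletes whitespace, including any Windows carriage return.
    server=$(sed -n 's/^[[:space:]]*PBS_SERVER=//p' "$conf" 2>/dev/null |
        tail -n 1 | tr -d '[:space:]')

    if [ -z "$server" ]; then
        echo "PBS_SERVER not set in $conf" >&2
        return 2
    fi
    if [ "$server" = "$expected" ]; then
        echo "OK: PBS_SERVER=$server"
        return 0
    fi
    echo "MISMATCH: PBS_SERVER=$server (expected $expected)" >&2
    return 1
}
```

For example, check_pbs_server <private-master-hostname> prints OK and returns 0 on a match. It returns 1 on a mismatch and 2 if PBS_SERVER is not set.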

Issue Description: Whenever the Rocks master node is restarted:

  • The OpenPBS service starts automatically
  • The pbs_sched daemon starts using the public hostname
  • As a result, PBS jobs remain in the Q (Queued) state and do not start executing

To resolve this, the pbs_sched service must be started explicitly using the private hostname.
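
You can check for the symptom from the master node by counting jobs in the Q state. The sketch below is a hypothetical helper, count_queued_jobs, that reads qstat output on standard input. It assumes the default qstat layout, where the fifth column is the job state. It will miscount if a different output format (for example, qstat -a) is piped in.

```shell
#!/bin/bash
# Hypothetical helper (not part of OpenPBS): print how many jobs are in the
# Q (queued) state, reading default `qstat` output on standard input.
# Assumes the default layout, where the fifth column is the job state (S).
count_queued_jobs() {
    awk '
        /^Job id/ || /^-/ { next }    # skip the two header lines
        NF >= 6 && $5 == "Q" { n++ }  # count rows whose state column is Q
        END { print n + 0 }
    '
}
```

Usage: qstat | count_queued_jobs. A count that stays above zero while compute nodes are idle is consistent with the scheduler running under the wrong hostname, but it can also have other causes.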

Step 1: Configure Scheduler Clientfile on Master

Since the master node is typically used as a PBS client for job submission, configure the scheduler clientfile.

Edit the clientfile:

vim /var/spool/pbs/sched_priv/clientfile

Add the private hostname of the master node:

$clienthost <private-master-hostname>
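
If a script edits the clientfile instead of a person, it can add the entry idempotently so that repeated runs do not duplicate it. This is a minimal sketch. add_clienthost is a hypothetical helper name, and the default path is the one used in this step.

```shell
#!/bin/bash
# Hypothetical helper (not part of OpenPBS): ensure the scheduler clientfile
# contains the line "$clienthost <host>", appending it only if it is missing.
# Usage: add_clienthost <private-master-hostname> [clientfile path]
add_clienthost() {
    local host="$1"
    local file="${2:-/var/spool/pbs/sched_priv/clientfile}"
    local line="\$clienthost $host"

    touch "$file" || return 1
    # -x: match the whole line; -F: fixed string, so "$" is taken literally
    if grep -qxF -- "$line" "$file"; then
        echo "already present: $line"
        return 0
    fi
    # Add a newline first if the file does not already end with one
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
        printf '\n' >> "$file"
    fi
    printf '%s\n' "$line" >> "$file"
    echo "added: $line"
}
```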

Step 2: Stop the Incorrectly Running pbs_sched

Log in to the Rocks master node and identify the running scheduler process:

ps aux | grep pbs_sched

Kill the scheduler process, using the PID shown in the second column of the ps output:

kill -9 <process-id>

Alternatively, kill all scheduler processes by name:

killall -9 pbs_sched
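
kill -9 only sends the signal. It can therefore help to confirm that no pbs_sched process is left before starting a new one in Step 3. The sketch below uses two hypothetical helpers, pids_by_name and wait_until_gone. It reads /proc directly, so it works only on Linux, and it treats zombie processes as already gone.

```shell
#!/bin/bash
# Hypothetical helpers (not part of OpenPBS). Linux-only: they read /proc.

# Print the PID of every live process whose name (/proc/<pid>/comm) is
# exactly <name>. Zombies (State: Z) are skipped, since they are already dead.
pids_by_name() {
    local name="$1" dir comm state
    for dir in /proc/[0-9]*; do
        comm=$(cat "$dir/comm" 2>/dev/null) || continue
        [ "$comm" = "$name" ] || continue
        state=$(sed -n 's/^State:[[:space:]]*\([A-Z]\).*/\1/p' "$dir/status" 2>/dev/null)
        [ "$state" = "Z" ] && continue
        echo "${dir#/proc/}"
    done
}

# Wait up to <limit> seconds (default 10) for all processes named <name>
# to disappear. Returns 0 once none are left, 1 on timeout.
wait_until_gone() {
    local name="$1" limit="${2:-10}" i
    for ((i = 0; i < limit; i++)); do
        [ -z "$(pids_by_name "$name")" ] && return 0
        sleep 1
    done
    [ -z "$(pids_by_name "$name")" ]
}
```

For example, run wait_until_gone pbs_sched 10 after killall -9 pbs_sched. A non-zero return means a scheduler process was still alive after about 10 seconds.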

Step 3: Start pbs_sched Using Private Hostname

Start the PBS scheduler explicitly with the configured clientfile:

/opt/pbs/sbin/pbs_sched -c /var/spool/pbs/sched_priv/clientfile

Result: The scheduler now binds to the private hostname, and PBS jobs move from the Q (Queued) state to the R (Running) state.
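
To confirm that the running scheduler is the one started with the clientfile, you can inspect its command line. The sketch below is a hypothetical helper, has_process_arg. It searches /proc (Linux only) for a process with the given name whose arguments include the given string.

```shell
#!/bin/bash
# Hypothetical helper (not part of OpenPBS). Linux-only: it reads /proc.
# Succeeds if a process named <name> (per /proc/<pid>/comm) has <arg> as one
# of its command-line arguments; fails otherwise.
# Usage: has_process_arg <name> <arg>
has_process_arg() {
    local name="$1" want="$2" dir arg
    for dir in /proc/[0-9]*; do
        [ "$(cat "$dir/comm" 2>/dev/null)" = "$name" ] || continue
        # /proc/<pid>/cmdline holds the arguments separated by NUL bytes;
        # read -d '' splits on NUL so each argument is compared exactly.
        while IFS= read -r -d '' arg; do
            [ "$arg" = "$want" ] && return 0
        done 2>/dev/null < "$dir/cmdline"
    done
    return 1
}
```

For example, has_process_arg pbs_sched /var/spool/pbs/sched_priv/clientfile returns 0 if a matching scheduler is running and 1 otherwise. It checks only how the process was started, not which hostname it actually uses.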

Step 4: Create Persistent Restart Script (Recommended)

To ensure correct behavior after every reboot, create a restart script.

Create the script:

vim /root/restart-pbs.sh

Add the following content:

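#!/bin/bash

# Restart PBS; the service brings pbs_sched up with the public hostname
systemctl restart pbs
# Stop that scheduler; -w waits until the killed processes have exited
killall -9 -w pbs_sched
# Start the scheduler again with the clientfile (private hostname)
/opt/pbs/sbin/pbs_sched -c /var/spool/pbs/sched_priv/clientfile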

Make the script executable:

chmod +x /root/restart-pbs.sh

Step 5: Run Script Automatically After Reboot

Edit the rc.local file:

vim /etc/rc.d/rc.local

Add the following line:

/root/restart-pbs.sh

Ensure rc.local is executable:

chmod +x /etc/rc.d/rc.local
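
You can also script the two rc.local changes so that running them twice does not add the line twice. This is a minimal sketch with a hypothetical helper name, enable_at_boot. On CentOS 7, rc-local.service runs /etc/rc.d/rc.local only when the file is executable, which is why the helper also sets the executable bit.

```shell
#!/bin/bash
# Hypothetical helper (not part of OpenPBS or CentOS): append a command to an
# rc.local-style file only if it is not already there, then make the file
# executable so rc-local.service will run it at boot.
# Usage: enable_at_boot <command line> [rc.local path]
enable_at_boot() {
    local cmd="$1"
    local rc="${2:-/etc/rc.d/rc.local}"

    touch "$rc" || return 1
    # -x: match the whole line; -F: fixed string
    if ! grep -qxF -- "$cmd" "$rc"; then
        # Add a newline first if the file does not already end with one
        if [ -s "$rc" ] && [ -n "$(tail -c 1 "$rc")" ]; then
            printf '\n' >> "$rc"
        fi
        printf '%s\n' "$cmd" >> "$rc"
    fi
    chmod +x "$rc"
}
```

For example: enable_at_boot /root/restart-pbs.sh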

Usage Note

Always use the following command instead of restarting PBS directly with systemctl restart pbs:

/root/restart-pbs.sh

This ensures that the PBS scheduler always starts with the correct (private) hostname.

Reference

OpenPBS community forum thread on configuring PBS on a multi-NIC system:

https://community.openpbs.org/t/proper-way-to-configure-pbs-on-multiple-nic-system/1508/2
