CentOS 7.x Rocks Cluster OpenPBS pbs_sched Startup Fix After Reboot
Background: As per the Rocks cluster architecture, the master node has two hostnames:
- public hostname
- private hostname
In OpenPBS, all scheduler and node communication happens strictly through hostnames. OpenPBS must be configured to use the private hostname for proper communication with compute nodes.
Issue Description: Whenever the Rocks master node is restarted:
- The OpenPBS service starts automatically
- The pbs_sched daemon starts using the public hostname
- As a result, PBS jobs remain in the Q (Queued) state and do not start executing
To resolve this, the pbs_sched service must be started explicitly using the private hostname.
Step 1: Configure Scheduler Clientfile on Master
Since the master node is typically used as a PBS client for job submission, configure the scheduler clientfile.
Edit the clientfile:
vim /var/spool/pbs/sched_priv/clientfile
Add the private hostname of the master node:
$clienthost <private-master-hostname>
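If this step is scripted rather than done by hand in an editor, the line should be added only once, even when the step is repeated. A minimal sketch, assuming bash; `add_clienthost` is a hypothetical helper name, not part of OpenPBS:

```shell
#!/bin/bash
# Sketch: append a "$clienthost <name>" line to a clientfile only if it is missing.
# add_clienthost is a hypothetical helper, not an OpenPBS command.
add_clienthost() {
    local file="$1" host="$2"
    local line="\$clienthost ${host}"
    # -x matches the whole line, -F treats the pattern as a fixed string.
    # If the file does not exist yet, grep fails and the line is appended,
    # which also creates the file.
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}
```

Example call, with the placeholder replaced by the real private hostname: add_clienthost /var/spool/pbs/sched_priv/clientfile <private-master-hostname>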
Step 2: Stop the Incorrectly Running pbs_sched
Log in to the Rocks master node and identify the running scheduler process:
ps aux | grep pbs_sched
Kill the scheduler process:
kill -9 <process-id>
Alternatively, kill all scheduler processes at once:
killall -9 pbs_sched
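A gentler variant of this step is to send SIGTERM first and fall back to SIGKILL only if the process is still alive. The sketch below assumes that pbs_sched exits cleanly on SIGTERM and that `pgrep` and `pkill` (procps) are installed; `stop_by_name` and the 5-second default are illustrative choices, not taken from the original procedure:

```shell
#!/bin/bash
# Sketch: stop processes by exact name, trying SIGTERM before SIGKILL.
# stop_by_name is a hypothetical helper; the default timeout is arbitrary.
stop_by_name() {
    local name="$1" timeout="${2:-5}" i
    # Nothing to do if no process with this exact name is running.
    pgrep -x "$name" > /dev/null || return 0
    pkill -TERM -x "$name"
    # Poll once per second until the process is gone or the timeout expires.
    for ((i = 0; i < timeout; i++)); do
        pgrep -x "$name" > /dev/null || return 0
        sleep 1
    done
    # Still running after the grace period: force it.
    pkill -KILL -x "$name"
}
```

Example call: stop_by_name pbs_sched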
Step 3: Start pbs_sched Using Private Hostname
Start the PBS scheduler explicitly with the configured clientfile:
/opt/pbs/sbin/pbs_sched -c /var/spool/pbs/sched_priv/clientfile
Result: The scheduler now binds to the private hostname, and PBS jobs move from Q (Queued) to R (Running) state.
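To confirm the result, the number of jobs in each state can be counted from `qstat` output. This is a rough sketch that assumes the default `qstat` layout, with two header lines and the job state in the fifth column; the layout can differ between PBS versions and output options, and `count_state` is a hypothetical helper:

```shell
#!/bin/bash
# Sketch: count jobs in a given state from default `qstat` output on stdin.
# Assumes two header lines and the state letter in column 5.
count_state() {
    awk -v s="$1" 'NR > 2 && $5 == s { n++ } END { print n + 0 }'
}
```

Example calls: qstat | count_state Q and qstat | count_state R. After the scheduler restart, the Q count should fall as jobs start running.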
Step 4: Create Persistent Restart Script (Recommended)
To ensure correct behavior after every reboot, create a restart script.
Create the script:
vim /root/restart-pbs.sh
Add the following content:
#!/bin/bash
systemctl restart pbs
killall -9 pbs_sched
/opt/pbs/sbin/pbs_sched -c /var/spool/pbs/sched_priv/clientfile
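A slightly more defensive version of the same three steps could check that the clientfile exists and offer a dry-run mode for testing. This is only a sketch: `PBS_SCHED`, `CLIENTFILE`, `DRY_RUN`, `run`, and `restart_pbs` are illustrative names introduced here, not OpenPBS settings.

```shell
#!/bin/bash
# Sketch of a more defensive restart script; same three steps as the original.
# All variable and function names below are illustrative assumptions.
PBS_SCHED="${PBS_SCHED:-/opt/pbs/sbin/pbs_sched}"
CLIENTFILE="${CLIENTFILE:-/var/spool/pbs/sched_priv/clientfile}"

run() {
    # With DRY_RUN=1, print the command instead of executing it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

restart_pbs() {
    # Refuse to continue without a readable clientfile.
    if [ ! -r "$CLIENTFILE" ]; then
        echo "clientfile not readable: $CLIENTFILE" >&2
        return 1
    fi
    run systemctl restart pbs || return 1
    # killall exits non-zero when no process matched; that is not an error here.
    run killall -9 pbs_sched || true
    run "$PBS_SCHED" -c "$CLIENTFILE"
}
```

To use it as /root/restart-pbs.sh, add a final line that calls restart_pbs. Running it once with DRY_RUN=1 prints the commands without touching the services.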
Make the script executable:
chmod +x /root/restart-pbs.sh
Step 5: Run Script Automatically After Reboot
Edit the rc.local file:
vim /etc/rc.d/rc.local
Add the following line:
/root/restart-pbs.sh
Ensure rc.local is executable:
chmod +x /etc/rc.d/rc.local
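Before rebooting, it can be worth checking that both conditions from this step hold: the file is executable and the script is listed in it. A minimal sketch, with `check_boot_hook` as a hypothetical helper name:

```shell
#!/bin/bash
# Sketch: succeed only if FILE is executable and contains SCRIPT on a line of its own.
# check_boot_hook is a hypothetical helper, not a system command.
check_boot_hook() {
    local file="$1" script="$2"
    if [ ! -x "$file" ]; then
        echo "$file is not executable" >&2
        return 1
    fi
    if ! grep -qxF -- "$script" "$file"; then
        echo "$script is not listed in $file" >&2
        return 1
    fi
}
```

Example call: check_boot_hook /etc/rc.d/rc.local /root/restart-pbs.sh. On CentOS 7, rc.local is run by the rc-local systemd unit, so `systemctl status rc-local` after a reboot shows whether it actually ran.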
Usage Note
Always use the following command instead of restarting PBS directly:
/root/restart-pbs.sh
This ensures that the PBS scheduler always starts with the correct (private) hostname.
Reference
OpenPBS community forum thread:
https://community.openpbs.org/t/proper-way-to-configure-pbs-on-multiple-nic-system/1508/2