Difference between revisions of "CentOS 7.x Rocks cluster 7.0 Reinstall OS on compute node"

From Notes_Wiki


=Reinstall OS on one specific compute node=
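To reinstall the OS on a compute node, use:
<pre>
rocks list host boot
rocks set host boot <hostname> action=install
ssh <hostname> "shutdown -r now"
</pre>

This assumes that the boot order on the compute node is set to boot from the network.

'''By default, a /state/partition1 partition is created on compute nodes. This partition is not affected by the reinstall, so any data on it is preserved.'''
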
Refer:
* http://central-7-0-x86-64.rocksclusters.org/roll-documentation/base/7.0/x2105.html
=Reinstall OS on all compute nodes=
To reinstall the OS on all compute nodes, follow these steps:
# A non-root user must exist. If there is none, create one with <tt>useradd</tt>.
#: Note that SGE jobs cannot be run as the root user.
# The non-root user must have SGE manager privileges. If it does not, grant them with:
#:<pre>
#:: qconf -am <username>
#:</pre>
#: This is required because only managers and operators can submit jobs with a positive priority.
# Edit <tt>/opt/gridengine/examples/jobs/sge-reinstall.sh</tt> and replace the <tt>qsub</tt> line (which may be split across two lines) with:
#:<pre>
#:: runuser -l <non-root-username> -c "qsub -p 1024 -pe mpi $numprocs -q all.q@$TARGETHOST /opt/gridengine/examples/jobs/reboot.qsub"
#:</pre>
# Run the script to submit the jobs that set each node's boot action to install:
#:<pre>
#:: /opt/gridengine/examples/jobs/sge-reinstall.sh
#:</pre>
# Verify that the boot action has been updated for each host:
#:<pre>
#:: rocks list host boot
#:</pre>
# Restart the nodes using:
#:<pre>
#:: for A in $(rocks list host | cut -f 1 -d ' ' | grep -v HOST | sed 's/.$//' | grep -v <master-hostname>); do ssh $A "shutdown -r now"; done
#:</pre>
#: Be sure to replace &lt;master-hostname&gt; with the actual hostname of the master so that the master itself is not rebooted.
# If some nodes should not be reinstalled, change their boot action back using:
#:<pre>
#:: rocks set host boot <hostname> action=os
#:: rocks list host boot
#:</pre>
Refer:
* http://central-7-0-x86-64.rocksclusters.org/roll-documentation/base/7.0/sge-cluster-reinstall.html
* https://docs.oracle.com/cd/E19957-01/820-0698/6ncdvjclp/index.html
* https://stackoverflow.com/questions/37733095/unable-to-run-jobs-on-cfncluster
* https://stackoverflow.com/questions/30645020/what-does-sge-mean-by-positive-submission-priority-requires-operator-privileges





Latest revision as of 07:52, 11 May 2022


Reinstall OS on one specific compute node

To reinstall the OS on a compute node, use:

rocks list host boot
rocks set host boot <hostname> action=install
ssh <hostname> "shutdown -r now" 

This assumes that the boot order on the compute node is set to boot from the network.

By default, a /state/partition1 partition is created on compute nodes. This partition is not affected by the reinstall, so any data on it is preserved.
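
The commands above can be wrapped in a small helper. This is a minimal sketch and not part of Rocks: the function names reinstall_node and run, and the DRY_RUN variable, are hypothetical. With DRY_RUN=1 it only prints the commands it would run, which helps to check the hostname before a node is actually reinstalled.

```shell
#!/bin/bash
# Hypothetical helper (not shipped with Rocks): flag one compute node for
# reinstallation and reboot it. Set DRY_RUN=1 to print the commands only.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

reinstall_node() {
    local host="$1"
    if [ -z "$host" ]; then
        echo "usage: reinstall_node <hostname>" >&2
        return 1
    fi
    # Set the boot action so that the next network boot runs the installer.
    run rocks set host boot "$host" action=install
    # Show the resulting boot action for this host.
    run rocks list host boot "$host"
    # Reboot the node; it must be configured to boot from the network.
    run ssh "$host" "shutdown -r now"
}
```

For example, after sourcing the file, running reinstall_node compute-0-0 with DRY_RUN=1 set prints the three commands without executing them (compute-0-0 is only a placeholder hostname).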

Refer:


Reinstall OS on all compute nodes

To reinstall the OS on all compute nodes, follow these steps:

  1. A non-root user must exist. If there is none, create one with useradd.
    Note that SGE jobs cannot be run as the root user.
  2. The non-root user must have SGE manager privileges. If it does not, grant them with:
    qconf -am <username>
    This is required because only managers and operators can submit jobs with a positive priority.
  3. Edit /opt/gridengine/examples/jobs/sge-reinstall.sh and replace the qsub line (which may be split across two lines) with:
    runuser -l <non-root-username> -c "qsub -p 1024 -pe mpi $numprocs -q all.q@$TARGETHOST /opt/gridengine/examples/jobs/reboot.qsub"
  4. Run the script to submit the jobs that set each node's boot action to install:
    /opt/gridengine/examples/jobs/sge-reinstall.sh
  5. Verify that the boot action has been updated for each host:
    rocks list host boot
  6. Restart the nodes using:
    for A in $(rocks list host | cut -f 1 -d ' ' | grep -v HOST | sed 's/.$//' | grep -v <master-hostname>); do ssh $A "shutdown -r now"; done
    Be sure to replace <master-hostname> with the actual hostname of the master so that the master itself is not rebooted.
  7. If some nodes should not be reinstalled, change their boot action back using:
    rocks set host boot <hostname> action=os
    rocks list host boot
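
Steps 5 and 6 both depend on reading Rocks command output. Note that the grep -v <master-hostname> filter in step 6 matches substrings, so it would also skip any node whose name contains the master's name. The sketch below does an exact match instead and lists hosts that are not yet flagged for install. The function names are hypothetical, and the output layout (a HOST header line, then one line per host starting with the hostname followed by a colon) is an assumption inferred from the pipeline in step 6.

```shell
#!/bin/bash
# Hypothetical helpers; they read Rocks command output on stdin, so they can
# be tried on saved output without touching the cluster.

# Print hostnames from `rocks list host` output, skipping the HOST header
# line and the master (exact match, trailing ':' removed).
compute_hosts() {
    local master="$1"
    awk -v m="$master" '$1 == "HOST" || NF == 0 { next }
    {
        host = $1
        sub(/:$/, "", host)
        if (host != m) print host
    }'
}

# Print hosts from `rocks list host boot` output whose action (second
# column) is not "install", ignoring the master.
not_flagged_for_install() {
    local master="$1"
    awk -v m="$master" '$1 == "HOST" || NF == 0 { next }
    {
        host = $1
        sub(/:$/, "", host)
        if (host != m && $2 != "install") print host
    }'
}

# Possible usage on the master (replace <master-hostname> first):
#   rocks list host boot | not_flagged_for_install <master-hostname>
#   for A in $(rocks list host | compute_hosts <master-hostname>); do
#       ssh "$A" "shutdown -r now"
#   done
```

If not_flagged_for_install prints nothing, every compute node in the list is set to install on its next boot.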

Refer:


