Difference between revisions of "Common setup for all HPC nodes"

From Notes_Wiki

Revision as of 11:28, 6 June 2025

Install Ubuntu 22.04

Install Ubuntu 22.04 Server on all nodes, accepting the defaults for most options.
Create a non-root user (e.g., admin) during setup.

Login Using the Admin User

Log in to each node using the admin user.

Configure History Retention

Enable storing date and time along with each command in history, as explained in the guide:

Storing date / time along with commands in history
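
The linked guide is not reproduced here. As a rough sketch only, a common way to get timestamps in bash history is to set HISTTIMEFORMAT, for example in ~/.bashrc. This assumes bash is the login shell; the HISTSIZE and HISTFILESIZE values are illustrative and do not come from the guide.

```shell
# Assumed approach: prefix each history entry with an ISO date and time.
HISTTIMEFORMAT='%F %T '    # strftime format: YYYY-MM-DD HH:MM:SS
HISTSIZE=10000             # hypothetical value: entries kept in memory
HISTFILESIZE=20000         # hypothetical value: entries kept in ~/.bash_history
export HISTTIMEFORMAT HISTSIZE HISTFILESIZE
```

After a new login, the history command should then show a date and time in front of each entry.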

Install Essential Packages

 sudo su -
 apt update
 apt -y install openssh-server vim htop stress munge

Set Root Password

 passwd 

Enable Root SSH Access

Edit the SSH configuration file:
 vim /etc/ssh/sshd_config 
Locate the PermitRootLogin line (uncomment it if necessary) and set it to:
 PermitRootLogin yes 
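
When the same change has to be repeated on many nodes, editing by hand becomes tedious. The following is a minimal sketch of a non-interactive alternative, not part of the original procedure. The function name set_permit_root_login is hypothetical, GNU sed is assumed, and the demonstration runs on a scratch file rather than the live /etc/ssh/sshd_config.

```shell
#!/usr/bin/env bash
# Hypothetical helper: force "PermitRootLogin yes" in an sshd_config-style file.
set_permit_root_login() {
    local conf="$1"
    if grep -Eq '^[#[:space:]]*PermitRootLogin[[:space:]]' "$conf"; then
        # Rewrite every existing (possibly commented-out) directive in place.
        sed -i -E 's/^[#[:space:]]*PermitRootLogin[[:space:]].*/PermitRootLogin yes/' "$conf"
    else
        # No directive present at all: append one.
        printf 'PermitRootLogin yes\n' >> "$conf"
    fi
}

# Demonstration on a scratch file that mimics a default configuration.
demo_conf="$(mktemp)"
printf '%s\n' 'Port 22' '#PermitRootLogin prohibit-password' > "$demo_conf"
set_permit_root_login "$demo_conf"
cat "$demo_conf"
```

On a real node the function would be called as root with /etc/ssh/sshd_config as its argument. Running sshd -t before restarting the service checks the file for syntax errors.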

Restart SSH Service

 systemctl restart sshd 

Stop and Disable OS Firewall Services

systemctl stop ufw
systemctl disable ufw

Add IP Address and Hostname Mapping

On all nodes, including containers and VM/bare-metal systems, add the IP address and hostname mapping entries to the /etc/hosts file.

Also comment out the default 127.0.1.1 hostname entry, like this:

#127.0.1.1    <hostname>

Example

#127.0.1.1    infra
192.168.2.5   infra.local       infra
192.168.2.3   node2.local       node2
192.168.2.4   node1.local       node1
192.168.2.6   slurm-login.local      slurm-login
192.168.2.7   slurm-db.local         slurm-dbsrv
192.168.2.8   slurm-master.local     slurm-master
192.168.2.9   slurm-ldap.local       slurm-ldapsrv
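
The two edits described above (commenting out the 127.0.1.1 entry and adding the mappings) can also be scripted so that they are safe to re-run. This is a sketch under assumptions: the helper names and the HOSTS_FILE variable are hypothetical, GNU sed is assumed, and HOSTS_FILE defaults to a scratch file so the demonstration does not touch /etc/hosts.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; HOSTS_FILE defaults to a scratch file so the demo is harmless.
HOSTS_FILE="${HOSTS_FILE:-$(mktemp)}"

# Comment out any active 127.0.1.1 entry.
comment_out_loopback_alias() {
    sed -i -E 's/^127\.0\.1\.1([[:space:]])/#127.0.1.1\1/' "$HOSTS_FILE"
}

# Append "IP FQDN SHORTNAME" unless a line for that IP already exists.
add_host_entry() {
    local ip="$1" fqdn="$2" short="$3"
    if ! awk -v ip="$ip" '$1 == ip { found = 1 } END { exit !found }' "$HOSTS_FILE"; then
        printf '%s\t%s\t%s\n' "$ip" "$fqdn" "$short" >> "$HOSTS_FILE"
    fi
}

# Demonstration with two of the entries from the example above.
printf '%s\n' '127.0.0.1 localhost' '127.0.1.1    infra' > "$HOSTS_FILE"
comment_out_loopback_alias
add_host_entry 192.168.2.5 infra.local infra
add_host_entry 192.168.2.4 node1.local node1
add_host_entry 192.168.2.5 infra.local infra   # repeated call adds nothing
cat "$HOSTS_FILE"
```

On a real node, HOSTS_FILE=/etc/hosts would be set before running the script as root. Name resolution can then be checked with getent hosts infra.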


Install Environment Modules

 apt install -y environment-modules 

Re-login for Modules to Work

After installation, log out and log back in to each node so that the module command becomes available.

Configure Module Path

Edit the module path configuration file:
 vim /etc/environment-modules/modulespath 
Add the following line:
 /export/modules 
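
The same change can be made without an editor. The sketch below appends the path only if it is not already present, so it can be run repeatedly. The MODULESPATH_FILE variable is a hypothetical name and defaults to a scratch file; on a real node it would point at /etc/environment-modules/modulespath.

```shell
#!/usr/bin/env bash
# Hypothetical variable; defaults to a scratch file so the demo is harmless.
MODULESPATH_FILE="${MODULESPATH_FILE:-$(mktemp)}"
MODULE_DIR=/export/modules

add_module_dir() {
    # -x matches the whole line, -F treats the path as a literal string.
    grep -qxF "$MODULE_DIR" "$MODULESPATH_FILE" \
        || printf '%s\n' "$MODULE_DIR" >> "$MODULESPATH_FILE"
}

add_module_dir
add_module_dir   # second call leaves the file unchanged
cat "$MODULESPATH_FILE"
```

After the next login, module avail should list any modulefiles placed under /export/modules.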

