CentOS 8.x Glusterfs basic setup of distributed volume


Terminology and concepts

Glusterfs can be used to combine local storage of various individual servers into a shared multi-write distributed filesystem. This way multiple clients can access this distributed storage for simultaneous read/write. These distributed volumes are useful for any kind of virtualization or big data applications.

If you have central storage, you can attach smaller disks (e.g. 1TB) to each of the participating servers. Each server can then provide access to its 1TB disk to the entire cluster. This way, whether you are using FC or iSCSI, storage access is spread across multiple nodes, which provides faster access.

The Gluster term for an individual disk used for storage is 'brick'. Each node in a gluster setup can contribute one or more disks (partitions / filesystems) to the overall storage. These bricks contain a local (non-distributed) filesystem such as ext3 or xfs. Gluster combines the storage of these bricks to create a volume. Volumes are then mounted on clients to access the distributed storage.

Gluster does not require any OS-level clustering of the kind needed for OCFS2 / GFS2. It also does not require a metadata server the way Moosefs does. Instead, the data is distributed among nodes based on hashing.
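The idea of hash-based placement can be illustrated with a small shell sketch. This is only a toy model: it hashes the file name with cksum and picks a brick by modulo, whereas Gluster's actual distribute translator assigns hash ranges to bricks per directory. The function name pick_brick and the brick names are made up for illustration.

```shell
#!/bin/sh
# Toy illustration only (not Gluster's real algorithm): choose a brick for a
# file by hashing its name, so no metadata server has to be consulted.
pick_brick() {
    name=$1
    shift                       # remaining arguments are the brick names
    # cksum prints "<crc> <byte-count>"; keep only the CRC value
    crc=$(printf '%s' "$name" | cksum | cut -d' ' -f1)
    shift $(( crc % $# ))       # skip ahead to the selected brick
    printf '%s\n' "$1"
}

# Every client computes the same answer for the same name and brick list.
pick_brick file1 brick1 brick2
pick_brick file2 brick1 brick2
```

Because the result depends only on the name and the brick list, any client can locate a file without asking a central server, which is the property the paragraph above describes.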

While creating volume there are different options:

Distributed volume
Data is distributed among bricks. If two files, file1 and file2, are created, it is possible that file1 will be stored on brick1 and file2 on brick2.
Replication
The same file is stored on at least two different bricks when replication is set to 2.
Striping
A single file can be split into smaller chunks that are distributed among bricks, for faster access to a single large file.

There are also options to allow only certain nodes to mount the volume, based on IP or password. There are options to set quotas on directory space usage. There are also options to limit how much storage gluster can use from each brick. For example, for a 1TB brick we can set the limit to 900GB so that gluster does not use more than 900GB from that brick, leaving the remaining 100GB free.
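A minimal sketch of how such options might be set from the CLI is shown below. The wrapper function restrict_and_limit and all of its arguments are hypothetical. It assumes that cluster.min-free-disk is the option that matches the "keep space free on each brick" behaviour described above; check `gluster volume set help` on the deployed version before relying on it.

```shell
#!/bin/sh
# GLUSTER can be overridden (for example GLUSTER=echo) to print the commands
# instead of running them; by default the real CLI is used.
GLUSTER=${GLUSTER:-gluster}

# All arguments are placeholders: volume name, allowed client addresses,
# a directory inside the volume, its quota, and the free-space threshold.
restrict_and_limit() {
    vol=$1 allowed=$2 dir=$3 dir_limit=$4 min_free=$5
    # Only allow clients whose address matches the given list or pattern
    "$GLUSTER" volume set "$vol" auth.allow "$allowed" || return 1
    # Directory quota: enable the feature, then cap one directory
    "$GLUSTER" volume quota "$vol" enable || return 1
    "$GLUSTER" volume quota "$vol" limit-usage "$dir" "$dir_limit" || return 1
    # Assumed option for keeping this much space free on every brick
    "$GLUSTER" volume set "$vol" cluster.min-free-disk "$min_free"
}
```

For example, `restrict_and_limit distributed_volume '192.168.1.*' /projects 10GB 10%` would restrict access to one subnet, cap the /projects directory at 10GB, and ask gluster to keep 10% of each brick free.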

For more information, refer to the documentation pages starting at https://staged-gluster-docs.readthedocs.io/en/release3.7.0beta1/Quick-Start-Guide/Architecture/


Server and distributed volume setup steps

To set up glusterfs among multiple machines, use the following steps:

  1. Disable firewalld and selinux
    setenforce 0
    #vim /etc/sysconfig/selinux    (edit SELINUX= here to make the change persist across reboots)
    systemctl stop firewalld
    systemctl disable firewalld
    Alternatively, look up the ports used by the gluster version being deployed and open only those ports in the firewall. Refer https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.5/html/installation_guide/port_information or https://www.jamescoyle.net/how-to/457-gluste4frfs-firewall-rules
    However, port-based allow/deny may not be good enough, as any malicious node can try to peer and become a member of the trusted pool. Hence the required communication should be allowed only from trusted IPs instead of being restricted only by port. Refer CentOS 8.x firewalld rich rules
    If you restrict based on IPs, at least allow ports 24007 and 49152 onwards from clients that may access the gluster volumes. Do not give clients access to ports 49150 and 49151.
  2. Configure NTP. Refer CentOS 8.x chronyc ntp client configuration
  3. If multiple NICs are available, it makes sense to use bonding to get higher throughput / availability. Refer CentOS 7.x network bonding
  4. Install glusterfs server side packages using:
    1. Install glusterfs repo
      dnf install -y centos-release-gluster
    2. Enable power tools repo
      dnf config-manager --set-enabled powertools
    3. Install glusterfs
      dnf -y install glusterfs-server
    4. Enable glusterfs service
      systemctl enable glusterd
      systemctl start glusterd
      systemctl status glusterd
  5. Create the desired filesystem on the drives / partitions that will be used to store glusterfs data. If three servers are participating (providing disk space) in the glusterfs setup, follow the above steps on all three servers, including creation of the appropriate drive / partition. Each of these partitions should be mounted locally on an appropriate path. Ideally the final brick location should be a sub-folder inside the mounted parent folder. This way, if the drive is not mounted, the brick folder is missing and gluster will fail to start rather than writing into the empty mount point.
    Refer https://staged-gluster-docs.readthedocs.io/en/release3.7.0beta1/Administrator%20Guide/Brick%20Naming%20Conventions/
  6. If you want to use glusterfs with FQDNs/names instead of IPs, these names must resolve correctly from all servers and all glusterfs clients for the gluster volume to function properly.
  7. Add glusterfs nodes to trusted storage pool
    1. From the first host, add the second host to the trusted pool. Syntax is
      gluster peer probe <peer-ip-or-hostname>
    2. Check the connected peers on individual cluster nodes
      gluster peer status
    3. List the hosts in the Gluster Cluster
      gluster pool list
  8. Setup and start Glusterfs Distributed volume
    1. Create glusterfs distributed volume
      gluster volume create <vol-name> transport tcp <host1-ip-or-fqdn>:<host1-local-brick-mount-point> <host2-ip-or-fqdn>:<host2-local-brick-mount-point>
      Example
      gluster volume create distributed_volume transport tcp glusterfs1:/mnt/brick1/dist_vol glusterfs2:/mnt/brick2/dist_vol
    2. Start the created volume
      gluster volume start <vol-name>
    3. More info on the volume
      gluster volume info <vol-name>
    4. Check the status of the glusterfs distributed volume
      gluster volume status <vol-name>
  9. Ideally restrict volume to be accessible only by appropriate clients
    gluster volume set <vol-name> auth.allow <ips-to-be-allowed-to-access-cluster>
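If firewalld is kept enabled instead of being disabled in step 1, the IP-based restriction described there could be sketched with rich rules as follows. The helper name gluster_rich_rules, the example address and the upper end of the brick port range are assumptions; the source only says that ports 24007 and 49152 onwards are needed, so size the range to the number of bricks per node.

```shell
#!/bin/sh
# Print firewall-cmd commands that open the Gluster ports to one trusted IP.
# The default upper bound 49251 is an arbitrary assumption, not a Gluster limit.
gluster_rich_rules() {
    ip=$1
    last_brick_port=${2:-49251}
    for ports in 24007 "49152-$last_brick_port"; do
        printf '%s\n' "firewall-cmd --permanent --add-rich-rule='rule family=\"ipv4\" source address=\"$ip\" port port=\"$ports\" protocol=\"tcp\" accept'"
    done
    printf '%s\n' "firewall-cmd --reload"
}

# Review the printed commands, then run them on each server if they look right.
gluster_rich_rules 192.168.1.50
```

Printing the commands rather than executing them keeps the sketch safe to try on any machine; the output can be piped to sh once it has been checked.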

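The brick preparation in step 5 has no commands of its own, so a possible sequence is sketched here. The device /dev/sdb1 is an assumption and must be replaced with the real disk; the mount point and brick folder follow the example volume used in step 8. The helper brick_prep_plan is hypothetical and only prints the commands, because mkfs destroys any data on the device.

```shell
#!/bin/sh
# Print the commands to prepare one brick. Device, mount point and brick
# folder name are placeholders supplied by the caller.
brick_prep_plan() {
    dev=$1 mnt=$2 brick=$3
    # The brick folder is created only after the mount, so it lives on the
    # brick disk and is absent whenever that disk is not mounted.
    cat <<EOF
mkfs.xfs -i size=512 $dev
mkdir -p $mnt
echo '$dev $mnt xfs defaults 0 0' >> /etc/fstab
mount $mnt
mkdir $mnt/$brick
EOF
}

brick_prep_plan /dev/sdb1 /mnt/brick1 dist_vol
```

The resulting path /mnt/brick1/dist_vol is what would then be passed to gluster volume create as the brick location for that host.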

Access glusterfs volume from client machine

To access the created glusterfs volume from a client machine use:

  1. Install glusterfs fuse
    dnf -y install glusterfs-fuse
  2. Mount the GlusterFS Distributed Volume
    mount -t glusterfs <any-gluster-host-or-ip>:/<vol-name> <mount-point>
    If you are mounting on a host that also participates in the trusted storage pool, the host IP can be the local IP. This way the storage is accessed via local machine networking for higher performance, and no other machine becomes a potential point of failure for the mount.
  3. To mount via /etc/fstab example syntax is:
    server1:/test-volume /mnt/glusterfs glusterfs defaults,_netdev 0 0
    Refer https://staged-gluster-docs.readthedocs.io/en/release3.7.0beta1/Administrator%20Guide/Setting%20Up%20Clients/
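When the client is not itself part of the pool, the server named in the mount is still needed at mount time. One way to reduce that dependency is the backup-volfile-servers mount option of the glusterfs FUSE client, which lists other pool members to try. The sketch below builds a matching /etc/fstab line; the function name gluster_fstab_line and the host names are made up for illustration.

```shell
#!/bin/sh
# Build an /etc/fstab line for a Gluster FUSE mount.
# Arguments: primary server, volume, mount point, then optional backup servers.
gluster_fstab_line() {
    server=$1 vol=$2 mnt=$3
    shift 3
    opts=defaults,_netdev
    if [ $# -gt 0 ]; then
        # Join the remaining server names with ':' as the option expects
        backups=$(printf '%s:' "$@")
        opts=$opts,backup-volfile-servers=${backups%:}
    fi
    printf '%s:/%s %s glusterfs %s 0 0\n' "$server" "$vol" "$mnt" "$opts"
}

gluster_fstab_line glusterfs1 distributed_volume /mnt/glusterfs glusterfs2
```

With no backup servers given, the function produces the same form as the fstab example in step 3.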

