Build VxRail 7.0




Pre-requisites for Building VxRail

  • VLANs as explained below in #Required VLANs
  • AD / DNS - Forward and reverse lookup. Note that reverse lookup is mandatory.
  • NTP - It can be local or public (public only if ESXi, vCenter and VxRail manager will have Internet access).
  • MTU - Ideally 9000+ MTU (e.g. 9216) at the physical network switch level
  • Hostnames - We need to plan hostnames for the ESXi hosts, vCenter and VxRail manager. We also need to plan the corresponding IPs and add DNS entries (both forward and reverse) before starting the build process.
  • Passwords - We need to plan for passwords with a pattern similar to ABC@1234@ABCD@abcde12.
    • Note that if the password of the vCenter user administrator@vsphere.local contains any of the special characters below, the Linux shell cannot handle them properly and the script will fail. Refer to KB 540052 for further details.
      Special characters list: (space) " # % * ' / = ? [ \ | & ; < > ( ) $ ^ ` ~
      Note: The ‘!’ character works correctly as the last character, but in any other position, it can cause scripts to fail.
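
The character rules above can be checked before a password is typed into the wizard. The following is a minimal sketch in POSIX shell and is not part of VxRail or the KB. The function name check_vcenter_password is made up for illustration, and it only tests the special characters list and the '!' rule quoted above, not the full Dell password policy.

```shell
#!/bin/sh
# check_vcenter_password PASSWORD
# Hypothetical helper: succeeds only if PASSWORD avoids the special
# characters listed above and uses '!' (if at all) only as the last character.
check_vcenter_password() {
    pw=$1

    # Delete every character from the special characters list (including
    # space, backslash and single quote). If anything was deleted, the
    # password contained one of them.
    stripped=$(printf '%s' "$pw" | tr -d ' "#%*/=?|&;<>()$^`~[\\'"'")
    if [ "$stripped" != "$pw" ]; then
        echo "contains a character from the special characters list"
        return 1
    fi

    # '!' is tolerated only in the last position, so drop one trailing '!'
    # and reject any '!' that remains.
    body=${pw%'!'}
    case $body in
        *'!'*)
            echo "'!' is only safe as the last character"
            return 1
            ;;
    esac

    echo "ok"
}
```

For example, check_vcenter_password 'ABC@1234@ABCD@abcde12' prints ok, while a password containing $ or a space is rejected.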

Required VLANs

For building a VxRail cluster we need the following VLANs:

VLAN 3939
This is used by VxRail nodes to discover and communicate with each other. It is often referred to as the internal management VLAN. We should enable IGMP and IPv6 at least for this VLAN.
VLAN for vSAN (MTU 9000+)
For vSAN connectivity
VLAN for vMotion (MTU 9000+)
For vMotion connectivity
VLAN for management
For giving IPs to vCenter, ESXi hosts, VxRail manager, etc. This is often referred to as the external management VLAN.
VLAN for hardware management
Optionally you can decide to have iDRAC IPs in a different VLAN
One or more VLANs for VMs
These are the VLANs for VM traffic. These can be created and specified later, after the build.


Network setup required

We should try to ensure MTU 9000+ (9198 or 9216) on all 10G trunk ports and for the VLANs related to vSAN / vMotion.

To build the cluster we should trunk all the above VLANs to the ESXi hosts / VxRail nodes which will be used to build the cluster. We use only 10G or 25G ports in modern VxRail nodes. There is no point in connecting 1G ports, even if present, except the iDRAC ports.

While trunking, the management VLAN should be forwarded untagged (native). All other VLANs can be tagged. Trunking the hardware management VLAN is optional.

After the network configuration we should boot (or reboot) the VxRail nodes. The nodes automatically discover each other and the VxRail manager VM (present on all nodes by default) gets booted on one of the nodes. The default IP of this VM is 192.168.10.200. This is reachable by default in the management VLAN (untagged / native VLAN) being forwarded to the ESXi hosts.

After booting the nodes we should configure another machine in the same VLAN (management VLAN) with an additional (secondary) IP of 192.168.10.X/24, where X is anything other than 200. Then we should be able to ping 192.168.10.200 from this machine. We should also be able to open the VxRail wizard at http://192.168.10.200 from this machine.
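
If the other machine is a Linux workstation, adding the secondary IP might look like the sketch below. It assumes the iproute2 ip tool is available and that the commands are later run as root. The function name vxrail_bootstrap_cmds, the interface name eth0 and the octet 50 are illustrative only. The function just prints the commands so they can be reviewed before being run.

```shell
#!/bin/sh
# vxrail_bootstrap_cmds IFACE LAST_OCTET
# Hypothetical helper: prints (does not run) the commands that add a
# secondary 192.168.10.X/24 address and test reachability of the
# VxRail manager default IP 192.168.10.200.
vxrail_bootstrap_cmds() {
    iface=$1
    octet=$2

    # Accept digits only.
    case $octet in
        ''|*[!0-9]*)
            echo "last octet must be a number" >&2
            return 1
            ;;
    esac

    # X must be a usable host address and must not collide with .200.
    if [ "$octet" -lt 1 ] || [ "$octet" -gt 254 ] || [ "$octet" -eq 200 ]; then
        echo "last octet must be between 1 and 254 and not 200" >&2
        return 1
    fi

    echo "ip addr add 192.168.10.$octet/24 dev $iface"
    echo "ping -c 3 192.168.10.200"
}
```

Calling vxrail_bootstrap_cmds eth0 50 prints the ip addr add and ping commands for 192.168.10.50/24. The secondary address can be removed afterwards with the matching ip addr del command.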


Build Wizard

During the build the following inputs are required:

  • Language :: English, Get started (Click)
  • License agreement :: Accept check box, Next (Click)
  • Cluster type :: VxRail cluster type: Standard cluster (3 or more nodes), Storage type: Standard vSAN
  • Resources :: Select appropriate ESXi hosts (3 to 6) based on service tag or iDRAC IP
  • Network confirmation :: Select both check boxes
  • Configuration method :: Step by step user input
  • Global settings :: Use
    • TLD :: <AD-or-org-domain-name>
    • vCenter :: VxRail provided
    • DNS :: External - IPs: <DNS IPs separated by commas>
    • NTP :: Local, or if Internet access is available a public source such as pool.ntp.org, time.google.com, etc.
    • Syslog server :: <leave-empty>
  • VDS Settings :: Use
    • VDS configuration :: predefined
    • NIC configuration :: 2x10 GbE
      Use the following to check the NICs on each machine:
      esxcli network nic list
      Unfortunately the automation assumes the NICs to be vmnic0, vmnic1, vmnic2, vmnic3. If there are 1G ports in between, such that vmnic0-1 and vmnic4-5 are 10G while vmnic2-3 are 1G, then we cannot build with 4x10 GbE. We should build with 2x10 GbE and add the additional uplinks later.
    • All these NICs should have all VLANs in trunk except the management network, which should be untagged / native. Ideally enable jumbo frames (MTU 9216) on all switches.
  • vCenter Settings :: Use
    • Automatically accept certificate :: Yes
    • vCenter hostname :: Appropriate hostname, e.g. vcenter. This should resolve via DNS to the IP given next before the build is started.
    • IP :: <vcenter-IP>
    • Join an existing SSO domain :: No (choose Yes for secondary / DR vCenter servers as appropriate)
      • When joining an existing domain we need to trust the existing vCenter certificate and provide its FQDN and credentials
    • Same password for all accounts :: Yes
    • vCenter Server Management Username :: admin
    • Passwords :: ABC@1234@ABCD@abcde12
      Be careful about the rules in https://www.dell.com/support/kbdoc/en-us/000158231/vxrail-account-and-password-rules-in-vxrail
      Especially note the special characters to avoid. Ideally use 16-character passwords to avoid re-work.
      You can use the password template specified above and replace its letters and numbers with other letters and numbers.
  • Host settings :: Enter ESXi hostname, management username, management and root passwords, rack name, rack location, IP address, etc. for all ESXi hosts
    All these hosts must have an identical hard-disk configuration (SSD, cache, capacity) for the build to work. The hard-disks should also be in the correct slots. That is, if the first server has its cache disk in slot-1, then the second server should have a disk of identical capacity, make and model in slot-1. Even with the same number and type of disks, the build (validation) fails if the slots differ.
    The hosts should also have matching versions of iDRAC, BIOS, NIC firmware, etc. Even a small difference in versions will cause validation, and hence the build / host addition, to fail.
    • Example rack names :: dcrack01, dcrack02 and drrack01
    • Example position :: Use 1 for the lowest server, 2 for the server above it and so on
    • ESXi management username :: admin
    • ESXi root password :: ABC@1234@ABCD@abcde34
    • ESXi admin password :: ABC@1234@ABCD@abcde56
  • VxRail manager settings :: Use
    • Hostname :: vxrail (or other appropriate hostname). This should resolve to the IP specified next via DNS before starting the build.
    • IP :: <VxRail-manager-IP>
    • Root password :: ABC@1234@ABCD@abcde78
    • mystic password :: ABC@1234@ABCD@abcde90
      This cannot be the same as the manager root password
  • Virtual network settings :: Use
    • Management subnet mask :: 255.255.255.0 (or other appropriate netmask)
    • GW :: Gateway for management VLAN.
    • VLAN ID :: 0 (We are sending management VLAN untagged / native)
    • port binding :: Ephemeral Binding
    • vSAN :: Use autofill with the values below. (With autofill the first node does not necessarily get the first IP and so on. If you want the lowest IP assigned to the first node and incremental IPs to consecutive nodes, use the manual fill option.)
    • vSAN starting IP :: Starting IP to use in the vSAN VLAN
    • vSAN ending IP :: Automatically taken based on the number of hosts and the starting IP
    • vSAN subnet mask :: 255.255.255.0 (or other appropriate mask)
    • vSAN VLAN :: vSAN VLAN ID. This should be trunked to the 10G or 25G ports of all ESXi hosts.
    • vMotion :: Use autofill with the values below. (With autofill the first node does not necessarily get the first IP and so on. If you want the lowest IP assigned to the first node and incremental IPs to consecutive nodes, use the manual fill option.)
    • vMotion starting IP :: Starting IP to use in the vMotion VLAN
    • vMotion ending IP :: Automatically taken based on the number of hosts and the starting IP
    • vMotion subnet mask :: 255.255.255.0 (or other appropriate mask)
    • vMotion VLAN :: vMotion VLAN ID. This should be trunked to the 10G or 25G ports of all ESXi hosts.
    • Guest networks :: We can create later
    • vCenter port binding :: Ephemeral Binding
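
When planning the vSAN and vMotion ranges it helps to know in advance which ending IP the autofill option will show, so that the whole range can be kept free. The sketch below assumes that autofill takes consecutive addresses from the starting IP and that the subnet is a /24. The function name autofill_end_ip and the addresses used are illustrative only.

```shell
#!/bin/sh
# autofill_end_ip START_IP HOST_COUNT
# Hypothetical helper: prints the last address of a consecutive range of
# HOST_COUNT addresses beginning at START_IP (assumes a /24 subnet).
autofill_end_ip() {
    start=$1
    count=$2

    prefix=${start%.*}    # first three octets, e.g. 10.10.20
    first=${start##*.}    # last octet of the starting IP

    last=$((first + count - 1))
    if [ "$last" -gt 254 ]; then
        echo "range does not fit in the /24" >&2
        return 1
    fi

    echo "$prefix.$last"
}
```

With 4 hosts and a starting IP of 10.10.20.11, autofill_end_ip 10.10.20.11 4 prints 10.10.20.14.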

After this validate and build the cluster. After the cluster is built we can log in at https://<vcenter-fqdn>. Note that there is no separate VxRail UI. VxRail related options are visible in vCenter at cluster level under Configure -> VxRail.


Adding additional nodes to cluster

To add additional nodes use:

  1. Log in to vCenter
  2. Go to cluster -> Configure -> VxRail -> hosts
  3. Click Add
  4. The node should appear automatically. Select the node and proceed.
  5. Enter vCenter authentication details and proceed
  6. Select a NIC configuration similar to the existing hosts
  7. Enter hostname, IP address, ESXi management username, management and root passwords
  8. Enter rack name and position (host location)
  9. Enter vSAN and vMotion IP addresses in same subnet as other hosts
  10. Validate configuration
  11. Add node


Configuration changes after initial build

Change default vSAN policy to FTT=1, RAID-5 erasure coding
Change the policy from RAID-1 (mirroring) to RAID-5 (erasure coding). This assumes at least 4 hosts are present and that the cluster is an all-flash cluster.
Add more NICs to hosts, MTU 9000
If, due to non-contiguous NIC numbering such as vmnic0, vmnic1, vmnic4, vmnic5, the VxRail cluster has been built with the 2x10G option, we should add the other two NICs later as follows:
  • Go to vCenter -> Networking page. Right click on the distributed switch and go to Settings -> Edit settings. On the Uplinks tab, add two more uplinks.
  • On the Advanced tab increase the MTU of the switch to 9000. This assumes the physical network has already been configured to support MTU 9000+.
  • Right click on the distributed switch. Choose "Add or manage hosts". Select "Manage host networking". Click on "+Attached hosts" and select all hosts. On the Manage physical adapter page, for the correct vmnic (one which is up and connected to the switch), click on Assign uplink and check "Apply this uplink configuration on rest of the hosts". Click Next through the remaining pages and then Finish to complete the wizard.
  • Right click on each distributed port-group and ensure that under teaming and failover:
    • Network failure detection is set to beacon probing
    • All uplinks are active (assumes load balancing is set to "Route based on originating virtual port" or, preferably, "Route based on NIC load")
  • Also increase the VMkernel (vSAN, vMotion) MTU to 9000. This assumes that the distributed switch and underlying network MTU have already been increased to 9000+. This has to be done on each ESXi host manually. After this the vSAN large MTU ping test should pass.
    • Go to Cluster -> Configure -> vSAN -> Skyline health -> Ensure that network ping with large MTU test passes
    • Go to Cluster -> Configure -> vSAN -> Proactive tests. Network performance test should pass.
Enable performance service
Go to Configure -> vSAN -> Services and enable the performance service.
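
The per-host VMkernel MTU change and a manual large-MTU ping described above can be sketched as follows. This is an illustration rather than the procedure used by the health check. The VMkernel name vmk3 and the peer address are placeholders, and the function names are made up. The payload size is the MTU minus 28 bytes (20 bytes of IPv4 header plus 8 bytes of ICMP header), which is why 8972 is used for MTU 9000.

```shell
#!/bin/sh
# icmp_payload_for_mtu MTU
# Largest ICMP payload that fits in one unfragmented IPv4 packet:
# MTU - 20 (IPv4 header) - 8 (ICMP header).
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# vmk_mtu_cmds VMK MTU PEER_IP
# Hypothetical helper: prints (does not run) the ESXi shell commands that
# set the MTU on one VMkernel port and ping a peer with the
# don't-fragment bit set.
vmk_mtu_cmds() {
    vmk=$1
    mtu=$2
    peer=$3
    echo "esxcli network ip interface set -i $vmk -m $mtu"
    echo "vmkping -I $vmk -d -s $(icmp_payload_for_mtu "$mtu") $peer"
}
```

For example, vmk_mtu_cmds vmk3 9000 <peer-vSAN-IP> prints one esxcli command and one vmkping command with -s 8972. If the vmkping fails while a normal ping works, some device in the path is most likely still at a smaller MTU.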



Troubleshooting VxRail build or add node issues

Password complexity

This is absolutely critical: getting it wrong wastes considerable time, because all nodes need a factory reset again, and the failure is not obvious or easy to troubleshoot.

If the password complexity is not correct, VxRail accepts the passwords in the wizard and then fails during the build process with errors such as:

An internal error occurred.  Failed to add exception accounts for hosts

Failed to create vCenter management account vcentermgmt.  Please pick a password that is in compliance with vCenter password policy and try again.

For the proper password complexity rules refer to https://www.dell.com/support/kbdoc/en-us/000158231/vxrail-account-and-password-rules-in-vxrail

To save time use the password template given in the pre-requisites above, after replacing ABCD, 1234, etc. with other letters or numbers. Do not introduce any new special characters. Do not reduce the length by too much.


Default ESXi root credentials for new VxRail nodes

The default ESXi root account on new VxRail nodes is:

root:Passw0rd!


Nodes not getting detected

Node detection depends upon VLAN 3939 and the untagged management VLAN being forwarded properly to all ESXi hosts, and also on IPv6. To diagnose further use:

On VxRail manager the command below should list the appliance IDs of all nodes discovered so far:

/usr/lib/vmware-loudmouth/bin/loudmouthc query | grep -o 'applianceID=[A-Z0-9]*'

Log in to VxRail manager using the mystic user. If deployment is not done yet, we can log in with root:Passw0rd! using the default IP 192.168.10.200.
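
To compare the number of discovered nodes with the number of nodes that were cabled, the output of the query can be reduced to a count of unique appliance IDs. This is a small sketch. The function name count_appliance_ids is made up, and it relies only on the applianceID=... token shown in the command above, not on any other detail of the loudmouthc output format.

```shell
#!/bin/sh
# count_appliance_ids
# Hypothetical helper: reads loudmouthc query output on stdin and prints
# the number of distinct applianceID values found in it.
count_appliance_ids() {
    grep -o 'applianceID=[A-Z0-9]*' | sort -u | wc -l | tr -d ' '
}
```

On VxRail manager it could be used as /usr/lib/vmware-loudmouth/bin/loudmouthc query | count_appliance_ids. A result lower than the number of nodes means that at least one node has not been discovered yet.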

On ESXi host:

  • Check the VMkernel port (e.g. vmk2) for the port group Private Management Network using:
    esxcli network ip interface list | less
  • Check the IPv6 address of the management network VMkernel port
    esxcli network ip interface ipv6 address list
  • Ping from ESXi host to VxRail manager
    ping6 -I vmk0 <VxRail manager VLAN 3939 MAC ipv6-ip>

You can also ping from VxRail manager to the ESXi host IPv6 address of the private management network port-group (VLAN 3939).


  • If ping is working at the network level, use the following on the ESXi host which is not getting discovered in manager:
    /etc/init.d/loudmouth restart
    /etc/init.d/loudmouth status

    /etc/init.d/vxrail-pservice restart
    /etc/init.d/vxrail-pservice status
  • On network switches we can check whether multicast snooping is enabled on VLAN 3939 using the command below. Note that it shows MLD snooping, the IPv6 counterpart of IGMP snooping:
    show ipv6 mld snooping interface vlan 3939




Disk related and disk position related errors

If you get error such as:

   Cache slots 21 has no disk but with capacity disks followed on host GYZTSK3.

then removing and reinserting the disk in slot 21 might solve the problem. Note that, as specified earlier, for the build to work all nodes must have disks of identical make, model and capacity. They should also be in the same order / same server slot. If there are additional disks in any server, the build won't work until we remove the extra disks.

If disks are removed and changed, there might be an iDRAC level warning / amber light on the server. To solve that, in iDRAC go to Maintenance -> Diagnostics and choose "Restart iDRAC" or "Reset iDRAC" as appropriate.


Change default internal management VLAN

The default internal management VLAN on VxRail is preset to 3939. However, if we need to change it for some reason before building the cluster we can use:

  1. See the VLANs of the Private Management Network and Private VM Network port groups using:
    esxcli network vswitch standard portgroup list
  2. Create the port groups if they don't already exist using:
    esxcli network vswitch standard portgroup add --portgroup-name="Private Management Network" --vswitch-name=vSwitch0
    esxcli network vswitch standard portgroup add --portgroup-name="Private VM Network" --vswitch-name=vSwitch0
  3. If they exist, change the VLAN ID to 0 (untagged) or another appropriate VLAN such as 939 using:
    esxcli network vswitch standard portgroup set -p "Private Management Network" -v 0
    esxcli network vswitch standard portgroup set -p "Private VM Network" -v 0
  4. The VxRail manager VM should be powered on on at least one of the nodes. Check the existing power state using the commands below (1 is the Vmid reported by getallvms; replace it if yours differs):
    vim-cmd vmsvc/getallvms
    vim-cmd vmsvc/power.getstate 1 | grep -i power
  5. If required power on the VM using:
    vim-cmd vmsvc/power.on 1
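
Since the VLAN change in step 3 is presumably needed on every node, the two commands can be generated once for the chosen VLAN ID and then reviewed and run on each host. The sketch below only prints the commands. The function name private_vlan_cmds is made up, and the accepted range of 0 to 4095 is an assumption about what esxcli allows for a standard port group.

```shell
#!/bin/sh
# private_vlan_cmds VLAN_ID
# Hypothetical helper: prints (does not run) the esxcli commands that set
# VLAN_ID on both private port groups used for node discovery.
private_vlan_cmds() {
    vlan=$1

    # Accept digits only.
    case $vlan in
        ''|*[!0-9]*)
            echo "VLAN ID must be a number" >&2
            return 1
            ;;
    esac
    if [ "$vlan" -gt 4095 ]; then
        echo "VLAN ID must be between 0 and 4095" >&2
        return 1
    fi

    for pg in "Private Management Network" "Private VM Network"; do
        printf 'esxcli network vswitch standard portgroup set -p "%s" -v %s\n' "$pg" "$vlan"
    done
}
```

For example, private_vlan_cmds 939 prints the two portgroup set commands with -v 939. The same VLAN must of course also be trunked to the nodes on the physical switches.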

