Build VxRail 7.0
For building a VxRail cluster we need the following VLANs:
- VLAN 3939
- This is used by VxRail nodes to discover and communicate with each other. This is often referred to as the internal management VLAN. We should enable IGMP snooping and IPv6 at least on this VLAN
- VLAN for vSAN
- For vSAN connectivity
- VLAN for vMotion
- For vMotion connectivity
- VLAN for management
- For assigning IPs to vCenter, ESXi hosts, VxRail manager, etc. This is often referred to as the external management VLAN
- VLAN for hardware management
- Optionally you can decide to have iDRAC IPs in a different VLAN
- One or more VLAN for VMs
- These are the VLANs for VM traffic. They can be created and specified later, after the build.
Network setup required
To build the cluster we should trunk all the above VLANs to the ESXi hosts / VxRail nodes which will be used to build the cluster. Modern VxRail nodes use only 10G or 25G ports; there is no point in connecting 1G ports, even if present, except the iDRAC ports.
During trunking the management VLAN should be forwarded untagged (native). All other VLANs can be tagged. Trunking the hardware management VLAN is optional.
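As an illustrative sketch of the trunking described above (Cisco NX-OS style syntax; the interface name and VLAN IDs 100, 200, 300 are placeholders, not values from this document), a node-facing switch port could look like:

```
interface Ethernet1/1
  description VxRail-node-01 vmnic0
  switchport mode trunk
  switchport trunk native vlan 100
  switchport trunk allowed vlan 100,200,300,3939
  mtu 9216
  no shutdown
```

Here 100 stands in for the external management VLAN (native/untagged), while vSAN, vMotion, and the internal management VLAN 3939 are tagged; 3939 must be allowed on every node-facing port.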
After the network configuration we should boot (or reboot) the VxRail nodes. The nodes automatically discover each other, and the VxRail manager VM (present on all nodes by default) boots on one of the nodes. The default IP of this VM is 192.168.10.200, available on the management VLAN (the untagged / native VLAN) being forwarded to the ESXi hosts.
After booting the nodes, configure another machine in the same (management) VLAN with an additional (secondary) IP of 192.168.10.X/24, where X is anything other than 200. You should then be able to ping 192.168.10.200 from this machine and open the VxRail wizard at http://192.168.10.200.
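The jump-host setup above can be sketched as a short script (the interface name eth0 and host octet 50 are assumptions; adjust for your environment). It prints the commands by default and only executes them when RUN=1 is set:

```shell
# Bring up a secondary IP on the jump host and test reachability of the
# default VxRail manager IP. IFACE and the host octet 50 are assumptions;
# adjust for your environment.
IFACE=eth0
X=50                                   # anything 1-254 except 200
[ "$X" -ne 200 ] || { echo "octet 200 is taken by the VxRail manager" >&2; exit 1; }

# Dry-run by default: print the commands; set RUN=1 to actually execute them.
run() { if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "would run: $*"; fi; }

run ip addr add "192.168.10.$X/24" dev "$IFACE"
run ping -c 3 192.168.10.200
run curl -k -sS -o /dev/null -w '%{http_code}\n' http://192.168.10.200
```

Run it once as a dry run, then `RUN=1 sh script.sh` (as root) on the jump host.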
During the build the following inputs are required:
- Language :: English, Get started (Click)
- License agreement :: Accept check box, Next (Click)
- Cluster type :: VxRail cluster type: Standard cluster (3 or more nodes), Storage type: Standard vSAN
- Resources :: Select appropriate ESXi hosts (3 to 6) based on service tag or iDRAC IP
- Network confirmation :: Select both check boxes
- Configuration method :: Step by step user input
- Global settings :: Use
- TLD :: <AD-or-org-domain-name>
- vCenter :: VxRail provided
- DNS :: External - IPs: <various DNS IPs separated by commas>
- NTP :: Local, or if Internet access is available we can use public servers such as pool.ntp.org, time.google.com, etc.
- Syslog server :: <leave-empty>
- VDS Settings :: Use
- VDS configuration :: predefined
- NIC configuration :: 2x10 GbE
- Use following to check NIC on each machine:
esxcli network nic list
- Unfortunately the automation assumes the NICs are vmnic0, vmnic1, vmnic2, vmnic3. If any 1G ports sit in between, such that vmnic0-1 and vmnic4-5 are 10G while vmnic2-3 are 1G, then we can't build with 4x10GbE. We should build with 2x10GbE and add the additional uplinks later.
- All these NICs should carry all VLANs in the trunk, except the management network which should be untagged/native. Ideally enable jumbo frames (MTU 9216) on all switches
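The NIC numbering caveat above can be checked per node before picking a configuration. Here the `esxcli network nic list` output is mocked with a heredoc (illustrative values, not from a real host); on a node, pipe the live command through the same filter:

```shell
# Illustrative 'esxcli network nic list' output, trimmed to name/link/
# speed/driver columns; on a real node replace the heredoc with the
# live command.
cat > /tmp/nic_list.txt <<'EOF'
vmnic0  Up  10000  ixgben
vmnic1  Up  10000  ixgben
vmnic2  Up  1000   igbn
vmnic3  Up  1000   igbn
vmnic4  Up  10000  ixgben
vmnic5  Up  10000  ixgben
EOF

# List only the 10G ports. If they are not vmnic0-vmnic3, build with the
# 2x10GbE option and add the remaining uplinks after the build.
awk '$3 == 10000 { print $1 }' /tmp/nic_list.txt
```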
- vCenter Settings :: Use
- Automatically accept certificate :: Yes
- vCenter hostname :: Appropriate hostname, e.g. vcenter. This should resolve via AD to the IP given next before the build is started.
- IP :: <vcenter-IP>
- Join an existing SSO domain :: No (choose Yes for secondary / DR clusters as appropriate)
- When joining an existing SSO domain we need to trust the existing vCenter certificate, given its FQDN and credentials
- Same password for all accounts :: Yes
- vCenter Server Management Username :: admin
- Passwords :: ABC@1234@ABCD@abcde12
- Be careful about https://www.dell.com/support/kbdoc/en-us/000158231/vxrail-account-and-password-rules-in-vxrail
- Especially note the special characters to avoid; ideally use a 16+ character length to avoid rework
- You can use the password template specified above and replace alphabets and numbers with other alphabets and numbers
- Host settings :: Enter ESXi hostname, management username, management and root password, rack name, rack location, IP address etc. for all ESXi hosts
- All these hosts must have identical hard-disk configuration (SSD, cache, capacity) for the build to work, with disks in matching slots: if the first server has its cache disk in slot 1, the second server must have an identical capacity, make, and model disk in slot 1. Even with the same number and type of disks, differing slots cause the build (validation) to fail.
- The hosts should also have matching versions of iDRAC, BIOS, NIC firmware, etc. Even a small version difference will cause validation to fail and the build / host addition to fail.
- Example rack names :: dcrack01, dcrack02 and drrack01
- Example position :: Use 1 for the lowest server, 2 for the server above it, and so on
- ESXi management username :: admin
- ESXi root password :: ABC@1234@ABCD@abcde34
- ESXi admin password :: ABC@1234@ABCD@abcde56
- VxRail manager settings :: Use
- Hostname :: vxrail (or other appropriate hostname). This should resolve via DNS to the IP specified next before starting the build.
- IP :: <VxRail-manager-IP>
- Root password :: ABC@1234@ABCD@abcde78
- mystic password :: ABC@1234@ABCD@abcde90
- This cannot be the same as the manager root password
- Virtual network settings :: Use
- Management subnet mask :: 255.255.255.0 (or other appropriate netmask)
- GW :: Gateway for management VLAN.
- VLAN ID :: 0 (We are sending management VLAN untagged / native)
- port binding :: Ephemeral Binding
- vSAN :: Use autofill (with autofill it is not guaranteed that the first node gets the first IP; if you want the lowest IP on the first node and incremental IPs on consecutive nodes, use the manual fill option)
- vSAN Starting IP :: Starting IP to use in vSAN VLAN
- vSAN ending IP :: Automatically taken based on no. of hosts and starting IP
- vSAN subnet mask :: 255.255.255.0 (or other appropriate Mask)
- vSAN VLAN :: vSAN VLAN ID. This should be trunked to all ESXi hosts 10G or 25G ports.
- vMotion :: Use autofill (with autofill it is not guaranteed that the first node gets the first IP; if you want the lowest IP on the first node and incremental IPs on consecutive nodes, use the manual fill option)
- vMotion Starting IP :: Starting IP to use in vMotion VLAN
- vMotion ending IP :: Automatically taken based on no. of hosts and starting IP
- vMotion subnet mask :: 255.255.255.0 (or other appropriate Mask)
- vMotion VLAN :: vMotion VLAN ID. This should be trunked to all ESXi hosts 10G or 25G ports.
- Guest networks :: We can create later
- vCenter port binding :: Ephemeral Binding
After this, validate and build the cluster. Once the cluster is built we can log in at https://<vcenter-fqdn>. Note that there is no separate VxRail UI; VxRail related options are visible in vCenter at the cluster level under Configure -> VxRail.
Adding additional nodes to cluster
To add additional nodes:
- Login into vCenter
- Go to cluster -> Configure -> VxRail -> hosts
- Click Add
- The node should appear automatically. Select node and proceed.
- Enter vCenter authentication details and proceed
- Select NIC configuration similar to other existing hosts
- Enter hostname, IP address, ESXi management username, and management and root passwords
- Enter rack name and position (host location)
- Enter vSAN and vMotion IP addresses in same subnet as other hosts
- Validate configuration
- Add node
Configuration changes after initial build
- Change the default vSAN storage policy to FTT=1 with RAID-5 erasure coding
- Change the policy from RAID-1 mirroring to RAID-5 erasure coding. This assumes at least 4 hosts are present and that the cluster is all-flash.
- Add more NIC to hosts, MTU 9000
- If, due to non-contiguous NIC numbering such as vmnic0, vmnic1, vmnic4, vmnic5, the VxRail has been built with the 2x10G option, we should add the other two NICs later by:
- Go to vCenter -> Networking page. Right click on distributed switch go to Settings -> Edit settings. On uplinks tab, add two more uplinks
- On Advanced Tab increase MTU of switch to 9000. This assumes physical network has already been configured to support MTU 9000+.
- Right click on distributed switch. Choose "Add or manage hosts". Select "Manage host networking". Click on "+Attached hosts" and select all hosts. On Manage physical adapter page, for the correct vmnic (one which is up, connected to switch), click on Assign uplink and check "Apply this uplink configuration on rest of the hosts". Do next, next, next, finish to complete the wizard.
- Right click on each distributed port-group and ensure that under teaming and failover:
- Network failure detection is set to beacon probing
- All uplinks are active (assumes load balancing is set to "Route based on originating virtual port" or, preferably, "Route based on physical NIC load")
- Also increase the VMkernel (vSAN, vMotion) MTU to 9000. This assumes the distributed switch and underlying network MTU have already been increased to 9000+. This has to be done on each ESXi host manually. After this the vSAN large-MTU ping test should pass.
- Go to Cluster -> Configure -> vSAN -> Skyline health -> Ensure that network ping with large MTU test passes
- Go to Cluster -> Configure -> vSAN -> Proactive tests. Network performance test should pass.
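The large-MTU check from Skyline health can also be run by hand from an ESXi shell with `vmkping`; the ICMP payload size is the MTU minus 28 bytes of IP and ICMP headers (vmk2 and the peer IP below are placeholders):

```shell
# Payload size for a large-MTU test: MTU minus the 20-byte IP header and
# the 8-byte ICMP header.
MTU=9000
PAYLOAD=$((MTU - 28))
# -d sets don't-fragment so an undersized path MTU fails loudly rather
# than being silently fragmented. vmk2 and the peer IP are placeholders.
echo "vmkping -I vmk2 -d -s $PAYLOAD <peer-vsan-or-vmotion-ip>"
```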
- Enable performance service
- Go to configure -> vSAN -> Services and Enable performance service.
Troubleshooting VxRail build or add node issues
This is absolutely critical, as mistakes here waste considerable time in factory-resetting all nodes again, and the failures are not obvious or easy to troubleshoot.
If the password complexity is not correct, VxRail accepts the passwords in the wizard and then fails during the build process with errors such as:
An internal error occurred. Failed to add exception accounts for hosts Failed to create vCenter management account vcentermgmt. Please pick a password that is in compliance with vCenter password policy and try again.
For the proper password complexity rules refer to https://www.dell.com/support/kbdoc/en-us/000158231/vxrail-account-and-password-rules-in-vxrail
To save time, use the template mentioned in the above article after replacing ABCD, 1234, etc. with other characters or numbers. Do not introduce any new special characters. Do not reduce the length by too much.
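As a sketch of that substitution approach (the character-class blocks mirror the template's shape; this sketch does not encode the Dell KB rules, so verify any generated value against them):

```shell
# Generate a 21-character candidate with the same shape as the template
# used earlier (upper/digit blocks joined by '@', lowercase tail). The
# character sets are assumptions; verify against the Dell KB rules.
UP1=$(LC_ALL=C tr -dc 'A-Z' </dev/urandom | head -c 3)
NUM1=$(LC_ALL=C tr -dc '0-9' </dev/urandom | head -c 4)
UP2=$(LC_ALL=C tr -dc 'A-Z' </dev/urandom | head -c 4)
LOW=$(LC_ALL=C tr -dc 'a-z' </dev/urandom | head -c 5)
NUM2=$(LC_ALL=C tr -dc '0-9' </dev/urandom | head -c 2)
PW="${UP1}@${NUM1}@${UP2}@${LOW}${NUM2}"
echo "$PW"
```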
Default ESXi root credentials for new VxRail nodes
- Default ESXi root account :: <see Dell documentation for the factory default>
Nodes not getting detected
Node detection depends on VLAN 3939 and the untagged management VLAN being forwarded properly to all ESXi hosts, and also on IPv6. To diagnose further:
On the VxRail manager the command below should list the appliance IDs of all nodes discovered so far:
/usr/lib/vmware-loudmouth/bin/loudmouthc query | grep -o applianceID=[A-Z0-9]*
Log into the VxRail manager using the mystic user; if deployment is not done yet, we can log in as root:Passw0rd! at the default IP 192.168.10.200.
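To compare the number of discovered nodes against what you expect, the loudmouth output can be counted. The sample below is mocked with a heredoc (illustrative appliance IDs); on the VxRail manager, feed the live query through the same pipeline:

```shell
# Illustrative loudmouth output; on the VxRail manager replace the heredoc
# with: /usr/lib/vmware-loudmouth/bin/loudmouthc query
cat > /tmp/lm.txt <<'EOF'
applianceID=GYZTSK3 serviceType=node
applianceID=GYZTSK4 serviceType=node
applianceID=GYZTSK3 serviceType=node
EOF

EXPECTED=3
# Deduplicate: the same appliance can appear more than once in the output.
FOUND=$(grep -o 'applianceID=[A-Z0-9]*' /tmp/lm.txt | sort -u | wc -l)
echo "discovered $FOUND of $EXPECTED expected nodes"
```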
On ESXi host:
- Check the vmkernel port (e.g. vmk2) for the private management network port group using:
- esxcli network ip interface list| less
- Check Ipv6 address of management network vmKernel port
- esxcli network ip interface ipv6 address list
- Ping from ESXi host to VxRail manager
- ping6 -I <private-mgmt-vmk, e.g. vmk2> <VxRail manager IPv6 address on VLAN 3939>
You can also ping from the VxRail manager to the ESXi host's IPv6 IP on the private management port-group (VLAN 3939).
- If pings work but a host is still not discovered by the manager, restart the discovery services on that ESXi host:
/etc/init.d/loudmouth restart
/etc/init.d/loudmouth status
/etc/init.d/vxrail-pservice restart
/etc/init.d/vxrail-pservice status
- On the network switches we can check whether MLD (IPv6 multicast) snooping is enabled on VLAN 3939 using, e.g.:
show ipv6 mld snooping interface vlan 3939
If you get error such as:
Cache slots 21 has no disk but with capacity disks followed on host GYZTSK3.
removing and reinserting the disk in slot 21 might solve the problem. Note that, as specified earlier, for the build to work all nodes must have disks of identical make, model, and capacity, in the same order / same server slot. If any server has additional disks, the build won't work until the extra disks are removed.
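A quick way to catch such slot/disk mismatches before validation is to diff per-host disk inventories. Here the inventories are mocked with heredocs (illustrative slot and model names); on real hosts, build one file per host from its disk listing:

```shell
# Mock per-host inventories (slot, disk model); host2 has its cache disk
# in a different slot, which is exactly what fails validation. On real
# hosts build these files from per-host disk listings.
cat > /tmp/host1.txt <<'EOF'
slot1 SSD-CACHE-800G
slot2 SSD-CAP-3.84T
slot3 SSD-CAP-3.84T
EOF
cat > /tmp/host2.txt <<'EOF'
slot1 SSD-CAP-3.84T
slot2 SSD-CACHE-800G
slot3 SSD-CAP-3.84T
EOF

if diff -u /tmp/host1.txt /tmp/host2.txt > /tmp/disk.diff; then
    echo "disk layouts match"
else
    echo "disk layout mismatch:"
    cat /tmp/disk.diff
fi
```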
If disks are removed and swapped, there might be an iDRAC-level warning / amber light on the server. To clear it, in iDRAC go to Maintenance -> Diagnostics and choose "Restart iDRAC" or "Reset iDRAC" as appropriate.
Change default internal management VLAN
The default internal management VLAN is hard-coded as 3939 on VxRail. However, if we need to change it for some reason before building the cluster:
- See the VLANs of the private management and private VM port-groups using:
- esxcli network vswitch standard portgroup list
- Create the port-groups if they don't exist already using:
- esxcli network vswitch standard portgroup add --portgroup-name="Private Management Network" --vswitch-name=vSwitch0
- esxcli network vswitch standard portgroup add --portgroup-name="Private VM Network" --vswitch-name=vSwitch0
- If they exist, change the VLAN ID to 0 (untagged) or another appropriate VLAN such as 939 using:
- esxcli network vswitch standard portgroup set -p "Private Management Network" -v 0
- esxcli network vswitch standard portgroup set -p "Private VM Network" -v 0
- The VxRail manager VM should be powered on on at least one of the nodes. Check the existing power state using:
- vim-cmd vmsvc/getallvms
- vim-cmd vmsvc/power.getstate 1 | grep -i power
- If required power on VM using:
- vim-cmd vmsvc/power.on 1
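The VM ID used in the vim-cmd calls above (1 in the examples) can vary; it can be pulled out of the `getallvms` listing like this (sample output mocked with a heredoc, illustrative values only):

```shell
# Illustrative 'vim-cmd vmsvc/getallvms' output; on a real host replace
# the heredoc with the live command.
cat > /tmp/vms.txt <<'EOF'
Vmid   Name             File                       Guest OS
1      VxRail Manager   [ds] VxRail/VxRail.vmx     sles12_64Guest
5      Some Other VM    [ds] other/other.vmx       otherGuest
EOF

# Extract the Vmid of the VxRail manager VM to feed into the vim-cmd
# power commands above.
VMID=$(awk '/VxRail Manager/ { print $1 }' /tmp/vms.txt)
echo "vim-cmd vmsvc/power.getstate $VMID"
```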