Setup manual lab environment for vCloud Foundation using nested ESXi hosts


It is possible to create four nested ESXi hosts and then use them to deploy vCloud Foundation. To do this easily we can refer to Setup automated lab environment for vCloud Foundation using VLC. But to have more control over the environment and to understand things better, we can try the same without using VLC. To do that use the following steps:

  1. First build a nested ESXi template of the desired ESXi version as mentioned in the particular vCF version's release notes. For the nested setup see Install nested ESXi on top of ESXi in a VM
    For example, create the host with a 40GB disk for ESXi installation, a 10GB disk for cache and 3x200GB disks for capacity
    If there is no direct ISO for the mentioned version, then download the nearest lower version available and update the ESXi host using Install ESXi patch via depot zip file, as sketched below
    Base nodes should have a CPU capable of running the ESXi version required by the particular vCF version. For example, if the underlying physical CPU only supports up to ESXi 6.5U3, then we cannot deploy ESXi 7.0 in a nested VM on top of the same CPU.
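    For example, the update via a depot zip copied to a local datastore could look as follows from the ESXi shell (the depot file name and the profile name below are placeholders; list the actual profile names first):
      esxcli software sources profile list -d /vmfs/volumes/datastore1/ESXi-depot.zip
      esxcli software profile update -d /vmfs/volumes/datastore1/ESXi-depot.zip -p ESXi-7.0U1d-17551050-standard
      reboot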
  2. Create an NSX port-group trunking all VLANs using Create standard port-group for trunking all VLANs to VM. Each nested ESXi VM should have 4 vmxnet3 uplinks connected to the all-VLAN trunk port-group
  3. At both the switch and the port-group level, in security settings allow Promiscuous mode, MAC address changes and Forged transmits by setting all three to Accept.
  4. Change the MTU of the vSwitch (standard or DVS) to at least 9000
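    For example, steps 2 to 4 can be done from the physical host's ESXi shell as follows; the vSwitch name vSwitch0 and the port-group name NSX-Trunk are assumptions, and VLAN ID 4095 is what trunks all VLANs on a standard vSwitch:
      esxcli network vswitch standard portgroup add -p NSX-Trunk -v vSwitch0
      esxcli network vswitch standard portgroup set -p NSX-Trunk --vlan-id 4095
      esxcli network vswitch standard policy security set -v vSwitch0 --allow-promiscuous=true --allow-mac-change=true --allow-forged-transmits=true
      esxcli network vswitch standard portgroup policy security set -p NSX-Trunk --allow-promiscuous=true --allow-mac-change=true --allow-forged-transmits=true
      esxcli network vswitch standard set -v vSwitch0 --mtu=9000
    For a DVS the same settings are available in the vSphere client under the distributed port-group and switch settings.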
  5. To build a nested vSAN node refer to Create nested ESXi for vSAN experiments with SSD disk
    Basically, download the VM's .vmx file and edit it to have virtualSSD = "1" entries for the 10GB and 3x200GB disks, as sketched below. The 40GB disk can be left as it is, without marking it as SSD.
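    For example, assuming the 10GB cache disk is scsi0:1 and the three 200GB capacity disks are scsi0:2 to scsi0:4 (the device numbering is an assumption; match it against your .vmx), the added lines would be:
      scsi0:1.virtualSSD = "1"
      scsi0:2.virtualSSD = "1"
      scsi0:3.virtualSSD = "1"
      scsi0:4.virtualSSD = "1"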
  6. To build a template of the ESXi host, so that it can be cloned to get 4 identical ESXi hosts, refer to Create template of nested ESXi for cloning and repeat use
  7. Build networks and parameters as explained in the NSX-T article Configure NSX-T 3.0 from scratch with edge cluster and tier gateways
    • Compared to the NSX article linked here, note that vCF does not allow the ToR and the VMware T0 gateway on the edge to have the same BGP AS number. Hence change the AS number on the ToR to 65001, while keeping the rest of the settings the same, to deploy vCF
    • Since the NSX article was not created with a vSAN network in mind, re-use the DHCP (host-VTEP) network for vSAN also. Choose IPs near the end of the range, such as 10.1.3.210 to 10.1.3.240, for vSAN, as DHCP will not lease that many IPs for host-VTEP anyway.
  8. For nested VMs created using templates there would be an issue with SSL certificates. That can be resolved using:
    1. Set the correct hostname with .domain-suffix
    2. Do not set any custom suffix (it should remain unset)
    3. Run the /sbin/generate-certificates command from the ESXi shell (SSH)
    4. Reboot the host (restarting services as explained in the article below is not enough)
    Refer: https://my-cloudy-world.com/2021/02/10/cloud-builder-validation-ssl-certificate-common-name-doesnt-match-esxi-fqdn/
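    For example, for the first host this translates to the following from the ESXi shell, with the FQDN as per this article's naming:
      esxcli system hostname set --fqdn=gbb01-m01-esx01.vcfrnd.com
      /sbin/generate-certificates
      reboot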
  9. Prepare ESXi hosts for vCF using:
    1. Configure Management IP (10.1.1.11-14), Management VLAN (201), DNS (10.1.1.2), DNS suffix as vcfrnd.com - https://docs.vmware.com/en/VMware-Cloud-Foundation/4.2/com.vmware.vcf.ovdeploy.doc_42/GUID-E22AA0E2-98F3-479F-A983-02D83D2FBF17.html
    2. VM network VLAN ID same as management (201) - https://docs.vmware.com/en/VMware-Cloud-Foundation/4.2/com.vmware.vcf.ovdeploy.doc_42/GUID-1FCA9839-02D7-437C-8666-04DD6F022083.html
    3. Start and enable ssh and ntp services - https://docs.vmware.com/en/VMware-Cloud-Foundation/4.2/com.vmware.vcf.ovdeploy.doc_42/GUID-9AFB9772-8B23-4015-A751-ED04B7CC305D.html
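    For example, sub-steps 1 to 3 for the first host could look as follows from the ESXi shell; the port-group names are the ESXi defaults and are assumptions, and 'esxcli system ntp set' is only available on recent ESXi 7.x builds (configure NTP via the host client otherwise):
      esxcli network ip interface ipv4 set -i vmk0 -t static -I 10.1.1.11 -N 255.255.255.0
      esxcli network vswitch standard portgroup set -p "Management Network" --vlan-id 201
      esxcli network vswitch standard portgroup set -p "VM Network" --vlan-id 201
      esxcli network ip dns server add --server 10.1.1.2
      esxcli network ip dns search add --domain vcfrnd.com
      vim-cmd hostsvc/enable_ssh
      vim-cmd hostsvc/start_ssh
      esxcli system ntp set --server=10.1.1.2 --enabled=true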
  10. Build a jump host with IP 10.1.1.2 for accessing cloud builder and for providing NTP and DNS services. Example configuration files are given at the end of this article, assuming the domain name to be vcfrnd.com
    1. Configure NTP as explained in #Example_ntp_configuration_file below
    2. Configure DNS as explained in #Example_DNS_configuration_files, including reverse entries, below
  11. Fill the deployment sheet. Add appropriate DNS entries. Configure BGP values appropriately.
    Example deployment sheet is available at 2021-04-16-vcf-manual-deployment.xlsx
  12. Deploy cloud builder appliance with IP 10.1.1.31 - VLAN 201 - hostname cb - Domain name vcfrnd.com
  13. Open the cloud builder interface in a private browser window ( https://10.1.1.31/ ). Upload the filled sheet and run validation
    If after uploading the sheet you get "Internal server error", the best option is to delete the cloud builder VM and deploy a new one.
  14. If you are doing this for a lab setup and have limited resources, consider building the JSON from the xlsx file and removing two NSX Managers to save resources, using:
    1. scp xlsx from jump box to /home/admin in cloud builder
    2. ssh as admin to cloud builder
    3. sudo su -
    4. cd /opt/vmware/sddc-support
    5. ./sos --jsongenerator --jsongenerator-input /home/admin/poclab-ems-deployment-parameter.xlsx --jsongenerator-design vcf-public-ems
    6. cd /opt/vmware/sddc-support/cloud_admin_tools/Resources/vcf-public-ems
    7. cp vcf-public-ems.json /home/admin
    8. chown admin:vcf /home/admin/vcf-public-ems.json
    9. scp json from /home/admin to jump-box
    10. In the jump box edit the JSON and remove the NSX Manager specs for 10.1.1.6 (nsx01b) and 10.1.1.7 (nsx01c). The same can be seen in the example JSON at Vcf-public-ems.json, and is sketched below
    Refer https://greatwhitetec.com/2020/08/05/vcf-lab-tips-nsx-cluster-size/ and
    https://greatwhitetec.com/2020/07/28/vcf-generate-json-file-from-excel-spreadsheet/
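    After the edit the relevant fragment should look roughly as below, assuming the generated file follows the usual vcf-public-ems layout with an nsxtSpec.nsxtManagers array; keep only the nsx01a entry and delete the nsx01b and nsx01c ones:
      "nsxtSpec": {
          "nsxtManagers": [
              {
                  "hostname": "gbb-m01-nsx01a",
                  "ip": "10.1.1.5"
              }
          ],
          ...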
  15. Ideally, once validation is successful, shut down all four nested ESXi hosts and take a snapshot. This helps in getting back to the validation-succeeded stage quickly.
  16. Perform bring-up
    Note that bring-up may fail multiple times. It should be possible to restart cloud builder or the VMs (vCenter, etc.) running inside the nested ESXi hosts to solve the problem. If all fails, you can revert to the snapshot taken when validation succeeded and try again.
    The lab deployment works when done using a 2TB USB SSD as datastore (Configure VMFS filesystem on USB disk and mount it as datastore on ESXi host) on a host with 256GB RAM, where each nested ESXi host is created with 8 cores and 48GB RAM
    You can monitor the various logs at /opt/vmware/bringup/logs with 'tail -f *' in the cloud builder appliance and see what is going on. Any specific log file can be opened in another terminal
    See #Bring_up_failed_due_to_authentication_failures_from_Cloud_builder_to_vCF_SDDC_appliance
  17. Change the "vSAN Default Storage Policy" to have FTT=0, object checksum disabled and force provisioning enabled. This should ensure that data is written only once and not twice.
  18. You can now try to deploy vRealize Suite components by referring to Deploy vRealize LifeCycle Manager on vCloud Foundation


Note that in nested environments with NSX, ping responses might appear duplicated. This only happens in nested environments where the nested VMs are on the same physical host. See: https://communities.vmware.com/t5/VMware-NSX-Discussions/Duplicated-Ping-responses-in-NSX/td-p/969774
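
For example, such replies carry the standard ping DUP! marker (illustrative output):

64 bytes from 10.1.1.11: icmp_seq=1 ttl=64 time=0.438 ms
64 bytes from 10.1.1.11: icmp_seq=1 ttl=64 time=0.512 ms (DUP!)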


Bring up failed due to authentication failures from Cloud builder to vCF SDDC appliance

In case of retrying bring-up you may see the following log lines in file '/opt/vmware/bringup/logs/vcf-bringup.2021-04-21.0.log':

com.vmware.evo.sddc.orchestrator.exceptions.OrchTaskException: Rollback failed for configure base install repo task
...
caused by: com.vmware.vcf.secure.ssh.errors.VcfSshException: Failed to establish SSH session to gbb-vcf01.vcfrnd.com

After this, even local login on the nested vCF SDDC VM via the nested vCenter console fails. To solve this:

  1. Reboot the vCF appliance
  2. After reboot, log in to the vCF appliance console via the nested vCenter / ESXi host. You should enter the correct password in the first attempt.
  3. Then edit '/etc/pam.d/system-auth' to change the deny, root_unlock_time and unlock_time values:
    auth required pam_tally2.so file=/var/log/tallylog deny=30 onerr=fail even_deny_root unlock_time=4 root_unlock_time=3
  4. Edit '/etc/ssh/sshd_config' to allow more login attempts:
    MaxAuthTries 200
    PermitRootLogin yes
  5. Restart ssh server
    systemctl restart sshd
  6. After this, generate an SSH key pair for the root account on cb using 'ssh-keygen' and copy the public key as authorized to root of the vCF appliance using 'ssh-copy-id root@10.1.1.8'.
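    For example (default ssh-keygen options are fine for a lab):
      ssh-keygen
      ssh-copy-id root@10.1.1.8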
  7. Restart the bring-up process. This time the following should appear in the logs, in file /opt/vmware/bringup/logs/vcf-bringup.log:
    2021-04-22T06:12:07.411+0000 [bringup,558d56060e59839d,9e4f] INFO [c.vmware.vcf.secure.ssh.SshExecuter,bringup-exec-8] Creating directory /nfs/vmware/vcf/nfs-mount/base-install-images/nsxt_ova on host: gbb-vcf01.vcfrnd.com
    2021-04-22T06:12:07.411+0000 [bringup,558d56060e59839d,0d58] INFO [c.vmware.vcf.secure.ssh.SshExecuter,bringup-exec-7] Creating directory /nfs/vmware/vcf/nfs-mount/base-install-images/vcenter_ova on host: gbb-vcf01.vcfrnd.com
  8. Note that there are also the commands
    pam_tally2 -u root        (show the failure count)
    pam_tally2 -u root -r     (reset the failure count)
    but resetting alone does not help, as the allowed number of authentication attempts is too small; that count ends up getting consumed by the SSH key attempts themselves.


Example ntp configuration file

See CentOS 8.x chronyc ntp client configuration for chrony configuration on CentOS 8. Along with those steps, use the below options in the configuration file:

server time.google.com

allow 192.168.0.0/16
allow 10.0.0.0/8
allow 172.16.0.0/12

local stratum 10
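
After restarting chronyd, synchronization and client access can be verified with the following; 'chronyc clients' requires root:

systemctl restart chronyd
chronyc sources
chronyc clients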


Example DNS configuration files

See Configuring basic DNS service with bind for DNS server setup. Then add the following additional lines/files:

named.conf

After the zone "." definition insert:

zone "vcfrnd.com." IN {
        type master;
        file "vcfrnd.com.forward";
};

zone "1.10.in-addr.arpa." {
	type master;
	file "10.1.reverse.db";
};


/var/named/vcfrnd.com.forward

$TTL 3600
@ SOA ns.vcfrnd.com. root.vcfrnd.com. (1 15m 5m 30d 1h)
	NS ns.vcfrnd.com.
	A 10.1.1.2
ns              IN      A       10.1.1.2
ntp		IN	A	10.1.1.2
gbb01-m01-esx01	IN	A	10.1.1.11
gbb01-m01-esx02	IN	A	10.1.1.12
gbb01-m01-esx03	IN	A	10.1.1.13
gbb01-m01-esx04	IN	A	10.1.1.14
gbb-m01-vc01	IN	A	10.1.1.3
gbb-m01-nsx01	IN	A	10.1.1.4
gbb-m01-nsx01a	IN	A	10.1.1.5
gbb-m01-nsx01b	IN	A	10.1.1.6
gbb-m01-nsx01c	IN	A	10.1.1.7
gbb-vcf01	IN	A	10.1.1.8
gbb-m01-en01	IN	A	10.1.1.21
gbb-m01-en02	IN	A	10.1.1.22
cb		IN	A	10.1.1.31
vrslcm		IN	A	10.1.8.11
wsa		IN	A	10.1.8.12
vrli		IN	A	10.1.8.13
vrops		IN	A	10.1.8.14


/var/named/10.1.reverse.db

$TTL 3600
@ SOA	ns.vcfrnd.com. root.vcfrnd.com. (1 15m 5m 30d 1h)
	NS ns.vcfrnd.com.

1.1		PTR	l3router.vcfrnd.com.
2.1		PTR	ntp.vcfrnd.com.
2.1		PTR	ns.vcfrnd.com.
3.1		PTR	gbb-m01-vc01.vcfrnd.com.
4.1		PTR	gbb-m01-nsx01.vcfrnd.com.
5.1		PTR	gbb-m01-nsx01a.vcfrnd.com.
6.1		PTR	gbb-m01-nsx01b.vcfrnd.com.
7.1		PTR	gbb-m01-nsx01c.vcfrnd.com.
8.1		PTR	gbb-vcf01.vcfrnd.com.
11.1		PTR	gbb01-m01-esx01.vcfrnd.com.
12.1		PTR	gbb01-m01-esx02.vcfrnd.com.
13.1		PTR	gbb01-m01-esx03.vcfrnd.com.
14.1		PTR	gbb01-m01-esx04.vcfrnd.com.
21.1		PTR	gbb-m01-en01.vcfrnd.com.
22.1		PTR	gbb-m01-en02.vcfrnd.com.
31.1		PTR	cb.vcfrnd.com.
11.8		PTR	vrslcm.vcfrnd.com.
12.8		PTR	wsa.vcfrnd.com.
13.8		PTR	vrli.vcfrnd.com.
14.8		PTR	vrops.vcfrnd.com.
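
After creating the files, the configuration and both zones can be sanity-checked and the service restarted; named-checkconf and named-checkzone ship with bind:

named-checkconf /etc/named.conf
named-checkzone vcfrnd.com /var/named/vcfrnd.com.forward
named-checkzone 1.10.in-addr.arpa /var/named/10.1.reverse.db
systemctl restart named
dig @10.1.1.2 gbb-vcf01.vcfrnd.com +short
dig @10.1.1.2 -x 10.1.1.8 +short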


