Issue/Introduction
- Reconfiguring vSphere HA fails for several hosts in the cluster, while others are elected as primary or secondary.
- After upgrading to vCenter Server 8.0.3, HA-enabled clusters fail to configure, and only a few hosts elect properly.
- fdm.log contains “SSL Async Handshake Timeout” messages similar to the following when attempting to contact the master FDM host or other hosts:
SSL Async Handshake Timeout : Read timeout after approximately 25000ms. Closing stream SSLFailed to SSL handshake;
Environment
vCenter Server 8.0.3
Cause
An MTU mismatch on the Management network. FDM supports jumbo frames, but the MTU setting must be consistent end to end on every device in the path.
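The 8972-byte size used in the jumbo-frame test in the Resolution follows from the MTU: an IPv4 ICMP echo carries a 20-byte IP header (assuming no IP options) and an 8-byte ICMP header. The sketch below shows that arithmetic. The function name is illustrative and not part of any VMware tooling.

```shell
# Largest ICMP echo payload that fits in one unfragmented IPv4 packet:
# MTU minus the 20-byte IPv4 header (no options) minus the 8-byte ICMP header.
icmp_payload_for_mtu() {
    echo $(( $1 - 20 - 8 ))
}

icmp_payload_for_mtu 1500   # prints 1472
icmp_payload_for_mtu 9000   # prints 8972
```

A default-size ping fits within either MTU, which is likely why basic connectivity tests pass while the larger packets of the SSL handshake are dropped somewhere along a mismatched path.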
Resolution
Check MTU settings for the vmk, vmnic, and vSwitch/DVS involved with the Management network on each host to confirm the mismatch.
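One way to gather these values at the ESXi shell is to read them from `esxcli` output. The helper below is a minimal sketch, not an official tool. It assumes the listing prints indented `Name:` and `MTU:` fields for each device, as `esxcli network ip interface list` and `esxcli network vswitch standard list` typically do. The function name `name_mtu_pairs` is hypothetical.

```shell
# Print "<device name> <mtu>" pairs from an esxcli listing read on stdin.
# Assumes each device block contains "Name: <x>" and "MTU: <n>" lines.
name_mtu_pairs() {
    awk '$1 == "Name:" { name = $2 } $1 == "MTU:" { print name, $2 }'
}

# Example usage on an ESXi host (not run here):
#   esxcli network ip interface list     | name_mtu_pairs   # vmk adapters
#   esxcli network vswitch standard list | name_mtu_pairs   # standard vSwitches
#   esxcli network nic list                                 # vmnics (table with an MTU column)
```

Comparing the pairs from every host in the cluster should show which device differs from the rest.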
Confirm the issue using network commands at the ESXi shell:
vmkping -I vmkX x.x.x.x
- vmkping using the vmk for the Management network is successful between all or most hosts.
vmkping -d -s 8972 x.x.x.x
- vmkping using jumbo frames on the Management network only succeeds between elected hosts whose MTU is set correctly.
openssl s_client -connect x.x.x.x:8182
- Running this from the primary agent host against one of the hosts that isn’t electing does not return the SSL certificate. Running it between elected hosts returns the SSL certificate as expected.
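In a larger cluster, the jumbo-frame test can be repeated against every host with a short loop. The following is a sketch under stated assumptions: `vmk0` stands in for the Management vmk, the `-c 3` packet count is an addition to the command shown above, and the `VMK` and `PING_CMD` variables and the `jumbo_sweep` function are hypothetical names introduced only for this example.

```shell
# Sweep a list of management IPs with a don't-fragment jumbo ping.
VMK="${VMK:-vmk0}"                # assumption: Management network vmk
PING_CMD="${PING_CMD:-vmkping}"   # overridable so the loop can be dry-run elsewhere

jumbo_sweep() {
    for ip in "$@"; do
        # -d sets the don't-fragment bit; -s 8972 fills a 9000-byte MTU.
        if "$PING_CMD" -I "$VMK" -d -s 8972 -c 3 "$ip" >/dev/null 2>&1; then
            echo "$ip jumbo-ok"
        else
            echo "$ip jumbo-FAIL"
        fi
    done
}

# Example usage on an ESXi host (not run here):
#   jumbo_sweep x.x.x.x y.y.y.y
```

Hosts reported as failing are the ones to inspect for a mismatched MTU on the path.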
Edit the device used for the Management network that is set incorrectly and change the MTU to 9000.
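For a standard vSwitch and its vmk, the change can also be made from the ESXi shell. The sketch below prints the commands by default rather than running them, since changing the MTU on the Management network can interrupt connectivity. `vSwitch0` and `vmk0` are placeholder names, and the `RUN` variable and `set_mgmt_mtu` function are hypothetical. A distributed switch MTU is changed in vCenter instead, and physical switch ports have to match as well.

```shell
# Dry run by default: prints the commands instead of executing them.
# Set RUN="" on the ESXi host to apply the change.
RUN="${RUN-echo}"

set_mgmt_mtu() {
    vswitch="$1"; vmk="$2"; mtu="${3:-9000}"
    # Raise the vSwitch first, on the assumption that a vmk MTU
    # should not exceed the MTU of the switch it is attached to.
    $RUN esxcli network vswitch standard set -v "$vswitch" -m "$mtu"
    $RUN esxcli network ip interface set -i "$vmk" -m "$mtu"
}

set_mgmt_mtu vSwitch0 vmk0
```

After applying the change, rerunning the jumbo-frame vmkping from the Resolution steps confirms whether the path is now consistent.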