In this article I would like to share my latest experience with Spectre/Meltdown patching and the performance impact it can have on VMware appliances such as the vCenter Server Appliance (VCSA).
A customer followed the VMware procedures exactly, applying the latest patches and workarounds as soon as they were published.
The vCenter Server Appliance had to be patched to add the new speculative-execution control mechanism for virtual machines in EVC, the underlying ESXi hosts had to be patched, and all VMs had to be at virtual hardware (vHW) version 9 or later to be protected, and ideally at version 11 to reduce the performance impact.
The VMware security advisory VMSA-2018-0004 details all the required steps.
But what about the VMware appliances themselves? VMware appliances are generally not meant to be modified, which, to my knowledge, means that upgrading their virtual hardware version is not supported.
VCSA 6.0 ships with vHW 8, so according to VMSA-2018-0004 it is not protected (the new mechanisms require at least vHW 9), yet upgrading it to a newer version would seemingly not be supported either.
However, the VMware documentation for the vCenter Server Appliance 6.0 states that upgrading to vHW 11 is indeed supported:
Version 6.0 of the vCenter Server Appliance is deployed with virtual hardware version 8, which supports 32 virtual CPUs per virtual machine in ESXi. Depending on the hosts that you will manage with the vCenter Server Appliance, you might want to upgrade the ESXi hosts and update the hardware version of the vCenter Server Appliance to support more virtual CPUs:
ESXi 5.5.x supports up to virtual hardware version 10 with up to 64 virtual CPUs per virtual machine.
ESXi 6.0 supports up to virtual hardware version 11 with up to 128 virtual CPUs per virtual machine.
After an SR (support request) was opened, VMware GSS eventually confirmed that the upgrade to vHW 11 is supported, as already stated in the VMware documentation.
But what if I don’t upgrade?
According to the VMSA, VMs need to be at vHW 9 or later to be protected, and at vHW 11 to reduce the performance impact (see KB 52085 for more details).
Note: In KB 52264, VMware lists the appliances that should be patched and those that are unaffected by Spectre and Meltdown.
The following graph shows a VCSA 6.0 instance after patching, but still at the default vHW version 8. After the VCSA was patched, CPU usage rose to 12%, even though this vCenter was not heavily used, and certainly not used more than before the patching.
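The version thresholds above, together with the per-host maximums quoted from the VCSA 6.0 documentation, can be written down as a small check. This is a minimal Python sketch, not a VMware tool: it assumes the hardware version is available as a vSphere-style `vmx-NN` string, and the helper names `parse_vhw`, `classify` and `can_reach_reduced_impact` are hypothetical.

```python
import re

# Thresholds as quoted from VMSA-2018-0004 / KB 52085 in this article.
MIN_PROTECTED_VHW = 9     # minimum vHW for the new speculative-execution controls
REDUCED_IMPACT_VHW = 11   # vHW that reduces the performance impact

# Highest vHW per ESXi release, as quoted from the VCSA 6.0 documentation.
MAX_VHW_BY_ESXI = {"5.5": 10, "6.0": 11}


def parse_vhw(version: str) -> int:
    """Turn a hardware version string such as 'vmx-08' into 8 (assumed format)."""
    match = re.fullmatch(r"vmx-(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unexpected hardware version string: {version!r}")
    return int(match.group(1))


def classify(version: str) -> str:
    """Classify a VM's vHW against the thresholds above (hypothetical helper)."""
    vhw = parse_vhw(version)
    if vhw >= REDUCED_IMPACT_VHW:
        return "protected, reduced performance impact"
    if vhw >= MIN_PROTECTED_VHW:
        return "protected"
    return "not protected"


def can_reach_reduced_impact(esxi_release: str) -> bool:
    """True if the host's maximum vHW allows reaching vHW 11."""
    return MAX_VHW_BY_ESXI[esxi_release] >= REDUCED_IMPACT_VHW


if __name__ == "__main__":
    # A default VCSA 6.0 deployment (vHW 8).
    print(classify("vmx-08"))
    # vHW 11 is out of reach on ESXi 5.5 but available on ESXi 6.0.
    print(can_reach_reduced_impact("5.5"), can_reach_reduced_impact("6.0"))
```

On an ESXi 5.5 host, the best this sketch allows is vHW 10, which is protected but does not reach the vHW 11 level the VMSA associates with a reduced performance impact.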
After ensuring that the EVC level was set to at least Westmere, and after getting the green light from VMware GSS, the appliance's vHW version was upgraded to 11.
The result is pretty clear: without changing anything other than the vHW version, CPU usage on this low-usage vCenter dropped from an average of 12% to about 5%.
The biggest impact, however, was seen on a heavily used vCenter managing more than 8,000 VMs and about 600 hosts.
Immediately after the vCenter was patched, its CPU usage went up and stayed at 80-100% for days. After a reboot to raise the EVC level to Westmere, CPU usage dropped a little but remained very high.
The upgrade to vHW 11, though, resolved the high CPU usage and brought it back down to around 60% (which is still high, I know). The impact of patching on this high-usage vCenter Server is enormous, and upgrading the vHW version from 8 to 11 reduced it by about 20%.
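Since CPU usage is itself a percentage, the improvement on the low-usage vCenter can be read either in percentage points or as a relative drop. A minimal Python sketch of that arithmetic, using only the two averages quoted above (the helper name `cpu_reduction` is hypothetical):

```python
def cpu_reduction(before_pct: float, after_pct: float) -> tuple[float, float]:
    """Return (drop in percentage points, relative drop in %) between two CPU averages."""
    points = before_pct - after_pct
    relative = points / before_pct * 100
    return points, relative


points, relative = cpu_reduction(12.0, 5.0)
print(f"{points:.0f} percentage points, {relative:.0f}% relative")  # 7 percentage points, 58% relative
```

In other words, moving to vHW 11 more than halved the CPU usage of this lightly used vCenter.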
Patching was required for security, and people usually follow the VMware guidelines, which in this case included upgrading the vHW version of their VMs (if not already done) to version 9 or 11, but they tend to forget their VMware appliances. This article aims to show the impact of not upgrading the vCenter appliances, but I imagine a similar experience would apply to other high-usage VMs.
VMware GSS confirmed that upgrading the vHW version of the impacted appliances is a supported scenario. I would add that it is not only supported but also required to mitigate the performance impact.
In conclusion:
Upgrade your virtual hardware version on all your VMs (including vCenters) to at least vHW 11 to mitigate the performance impact.