[SOLVED] How to troubleshoot guest freezing after live migration

We are running PVE 5.1 with all available updates. Live migrating RHEL5 / CentOS 5 guests often results in them freezing when they resume on the destination nodes.

This affects Check Point R80.10 nodes (MDS, Log and SmartEvent servers) 100% of the time. We also have an Asterisk SIP server, which is simply Asterisk installed on a plain CentOS 5.11 VM, and it also locks up every time it is live migrated.

PS: We had numerous problems migrating VMs from PVE 4.4 to PVE 5.1 during the upgrade cycle, especially VMs with more than 4GB of RAM, but this is no longer an issue now that all hosts are on PVE 5.1. The RHEL5 / CentOS 5 issue, however, remains.

I assumed that live migration would be completely independent of the guest. These VMs do not support VirtSCSI, so they all run VirtIO disk and network interfaces. We're obviously refreshing systems whenever possible, but with regard to Check Point there is no later base...
 
We leave memory ballooning enabled by default but keep memory allocation at a fixed size. This avoids memory ballooning overhead but still provides granular statistics. Having memory ballooning enabled, however, results in RHEL5 guests locking up after live migration, especially Check Point R80.10 instances, which are based on RHEL5.

Disabling memory ballooning, even though memory allocation is fixed, resolves the issue of these guests freezing after live migration.
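For anyone wanting to apply the same workaround from the CLI, here is a minimal sketch, assuming the standard `qm` tool on the PVE host. The VM IDs 101 and 102 are hypothetical placeholders, and the `DRY_RUN` switch and the `disable_balloon` helper are my own additions for illustration. Setting `balloon` to 0 should remove the balloon device from the VM definition. I would expect the guest to need a full stop and start (not just a reboot from inside the guest) before the change applies, but please verify that on your own setup.

```shell
#!/bin/sh
# Sketch: disable the memory balloon device on a list of affected guests.
# Assumptions: runs on a PVE node, VM IDs below are placeholders.
# DRY_RUN=1 (the default here) only prints the commands instead of running them.

DRY_RUN="${DRY_RUN:-1}"

disable_balloon() {
    vmid="$1"
    if [ "$DRY_RUN" = "1" ]; then
        # Show what would be executed without touching the VM config.
        echo "qm set $vmid --balloon 0"
    else
        # balloon=0 disables the balloon driver for this VM.
        qm set "$vmid" --balloon 0
    fi
}

# Hypothetical IDs of the RHEL5 / CentOS 5 based guests.
for vmid in 101 102; do
    disable_balloon "$vmid"
done
```

Run it once as-is to review the commands, then with `DRY_RUN=0` to apply them. Afterwards `qm config <vmid>` should show `balloon: 0` for each guest.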