Mar 10, 2018
Hi all,

I've got an interesting issue with a FreeBSD 8.4-based appliance guest (Citrix Netscaler 11.1). Ceph is the storage backend, and the hosts have identical Xeon E5-2690 configurations.

Everything is fine as long as the guest stays on the host it booted on. Once it is live-migrated to any other host, after a few minutes the guest can become unresponsive to end users and on the VNC console, although I can still log in via the serial console without issue.

On a test VM, I noticed that after a live migration the clock's speed starts to go seriously out of whack. Starting about a minute after the migration, the clock slows down and becomes quite erratic: if I run "date" once a second, it more or less keeps up, but if left alone for 10 minutes, the guest clock only ticks forward by about 2 minutes.

The above is with the BSD guest using the default kern.timecounter.hardware="HPET". Changing it to ACPI-safe before the migration makes the issue even worse after the migration.
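Before switching timecounters one at a time, it may help to see which ones the guest's kernel actually detected. FreeBSD's `kern.timecounter.choice` sysctl prints them on one line as `name(quality)`, and the kernel normally selects the highest-quality counter unless `kern.timecounter.hardware` is set explicitly. A minimal sketch: `list_timecounters` is a hypothetical helper that sorts that list best-first, and the sample string is illustrative, not captured from this guest.

```shell
#!/bin/sh
# Hypothetical helper: read the one-line "name(quality) name(quality) ..."
# output of kern.timecounter.choice and print "quality name", best first.
list_timecounters() {
    tr ' ' '\n' |                                          # one entry per line
        sed -n 's/^\(.*\)(\(-\{0,1\}[0-9]\{1,\}\))$/\2 \1/p' |  # -> "quality name"
        sort -rn                                           # highest quality first
}

# Illustrative input only (NOT taken from the Netscaler guest):
echo 'TSC(-100) HPET(950) i8254(0) ACPI-safe(850) dummy(-1000000)' | list_timecounters

# On the FreeBSD guest itself (as root), something like:
#   sysctl -n kern.timecounter.choice | list_timecounters
#   sysctl -n kern.timecounter.hardware          # counter currently in use
#   sysctl kern.timecounter.hardware=i8254       # switch at runtime; lost on reboot
```

Sorting is only a convenience; the point is to confirm which counters exist on the guest before testing each one across a migration.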

Anyone got any tricks I can try to either get to the bottom of this, or to stabilise the guest's clock/ticksource?

During a live migration, KVM will reduce the guest's CPU cycles to slow the VM down. This is intentional: otherwise the migration would take longer or could even fail.

So you should run an NTP client in the guest to keep its time correct.
Thanks, Wolfgang.
That explains the initial loss of a few seconds, but after the migration it just keeps getting worse...

root@NetScaler-Test# echo "post migrate 165500" ; date
post migrate 165500
Thu Jun 7 16:54:58 NZST 2018
root@NetScaler-Test# echo "post migrate 170000" ; date
post migrate 170000
Thu Jun 7 16:58:32 NZST 2018
root@NetScaler-Test# echo "post migrate 170030" ; date
post migrate 170030
Thu Jun 7 16:59:02 NZST 2018
root@NetScaler-Test# echo "post migrate 170100" ; date
post migrate 170100
Thu Jun 7 16:59:32 NZST 2018
root@NetScaler-Test# echo "post migrate 173300" ; date
post migrate 173300
Thu Jun 7 17:05:03 NZST 2018
root@NetScaler-Test# echo "post migrate 174600" ; date
post migrate 174600
Thu Jun 7 17:05:53 NZST 2018

That's far more than normal drift, and the whole VM becomes unresponsive except via the serial terminal: the VNC console and networking become intermittently unresponsive. The guest clock ends up advancing only about 11 minutes while just over 50 minutes pass in the real world, and its speed is uneven: it seems to be worse when the guest is idle or nearly so.
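To put numbers on the uneven speed, the samples in the log above can be turned into a per-interval rate (guest seconds elapsed per real second). A minimal POSIX sh + awk sketch, assuming the number in each `echo` label is the real wall-clock time as HHMMSS and that all samples fall on the same day; `clock_rate` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: guest-clock rate between consecutive samples.
# Input lines: <real time as HHMMSS> <guest time as HH:MM:SS>
# Assumption: all samples fall on the same day (no midnight wrap).
clock_rate() {
    awk '
    function secs(h, m, s) { return h * 3600 + m * 60 + s }
    {
        real = secs(substr($1, 1, 2), substr($1, 3, 2), substr($1, 5, 2))
        split($2, g, ":")
        guest = secs(g[1], g[2], g[3])
        if (NR == 1) { first_real = real; first_guest = guest }
        else if (real > prev_real)
            printf "%s -> %s: real %ds, guest %ds, rate %.2f\n", prev_label, $1,
                real - prev_real, guest - prev_guest,
                (guest - prev_guest) / (real - prev_real)
        prev_real = real; prev_guest = guest; prev_label = $1
    }
    END {
        # Overall rate from the first sample to the last one
        if (NR > 1 && prev_real > first_real)
            printf "total: real %ds, guest %ds, rate %.2f\n",
                prev_real - first_real, prev_guest - first_guest,
                (prev_guest - first_guest) / (prev_real - first_real)
    }'
}

# Samples transcribed from the "post migrate" log above.
clock_rate <<'EOF'
165500 16:54:58
170000 16:58:32
170030 16:59:02
170100 16:59:32
173300 17:05:03
174600 17:05:53
EOF
```

With these six samples the overall rate comes out at about 0.21 (655 guest seconds over 3060 real seconds), dropping to 0.06 in the last, idle interval. The two 30-second intervals, when the shell was being used, are the only ones where the guest kept pace (rate 1.00), which seems consistent with the clock being worse when idle.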
I've just realised my choice of words downplays the effect this has on the VM: the clock doesn't merely drift, it almost stops. I can't seem to edit the subject line; could an admin change it to "PVE 5.1, Netscaler guest (FreeBSD8.4) clocksource unstable leading to VM crash after migration"?


