[SOLVED] VM freeze or service stalling after live migration (3.1)

ScOut3R

Member
Oct 2, 2013
Dear Community,

I'm running PVE 3.1-14 on 5 nodes with Ceph 0.67.3 as the storage backend for VMs. When I live-migrate an Ubuntu 10.04 (kernel 3.0.0 backport), Ubuntu 12.04 (kernel 3.8.0 backport) or Debian 7 (kernel 3.10 backport) guest, I see VM freezes or stalled services. A few minutes after a successful migration, either the VM becomes unresponsive and I have to reset it, or services (especially cron) stop functioning: the process is still running but does nothing. Right now I'm not sure whether this is a guest, host or storage issue. I installed this cluster in August on 3.0, and live migration worked fine then. Has anyone experienced similar issues?

Best regards,
Mate
 
I've done some further investigation. So far it seems to be related to the guest kernel. With 3.8 or 3.10 in the guest, the issue usually appears right after the first migration, though once I managed 7 migrations in a row before it happened on the 8th. With kernel 3.2 I did 15 migrations in a row without the issue showing. And if I remember correctly, I started experiencing this issue when I began upgrading the kernels in the guest systems to improve network and I/O performance.
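For anyone who wants to repeat the "N migrations in a row" test, here is a minimal sketch that only builds the list of commands for bouncing a VM between two nodes; it does not execute anything. The VM ID `100` and the node names `node1`/`node2` are made up for illustration, and it assumes `qm migrate <vmid> <target> --online` is run on whichever node currently hosts the VM.

```python
def migration_plan(vmid, start_node, other_node, rounds):
    """Build (node_to_run_on, command) pairs for back-and-forth live migrations.

    Nothing is executed here; the caller decides how to run each command
    on the node that currently hosts the VM.
    """
    plan = []
    source, target = start_node, other_node
    for _ in range(rounds):
        # Assumed CLI form: qm migrate <vmid> <target> --online
        command = ["qm", "migrate", str(vmid), target, "--online"]
        plan.append((source, command))
        # After a successful migration the VM lives on the former target.
        source, target = target, source
    return plan


# Hypothetical VM ID and node names, for illustration only.
for node, command in migration_plan(100, "node1", "node2", 3):
    print(node, " ".join(command))
```

Checking the guest (for example cron and top) a few minutes after each step matters more than the migration result itself, since the stall in this thread only showed up after a migration had reported success.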
 
In another post, on PVE 3.0, I have a similar problem with a Windows guest, *but* ONLY when a node with different hardware is involved.
Of my three nodes, 1 and 2 are identical and node 3 is different: migration between nodes 1 and 2 works fine, while migration to or from node 3 freezes the VM at 100% CPU.

I haven't been able to upgrade because the cluster is in production...

Luca
 
Actually, my 5-node cluster has identical hardware, and when the issue occurs the related kvm process's CPU utilisation looks normal. During the investigation I was able to observe the issue more clearly. When it happens I can still ssh into the VM, though the login is a bit slow. I can move around in the shell just fine, for example dmesg and ps work, but top doesn't: it just hangs showing the shell and nothing happens, although I can use Ctrl+C to get the prompt back. Restarting the VM also hangs at the stopping-services step, so I have to reset it.
 
I'm also experiencing the issue with guests using kernel 3.11.3. I did 15 migrations in a row with an OpenBSD 5.3 guest and they went just fine, except that I lost the network connection 15 minutes later, after leaving the VM as it was. I had to bring the interfaces down and up to get the network working again.
 
Changing the CPU type from host to kvm64 seems to solve the problem, even though my infrastructure is identical and migration was working around version 3.0. Anyway, I'm glad it is stable again. :)
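To find which VMs are still set to the host CPU type, a small helper along these lines could scan the VM config text. This is only a sketch: it assumes the config is made of `key: value` lines with the CPU model on a `cpu:` line, and that a missing `cpu:` line means the default model, taken here to be kvm64. The sample config is invented.

```python
def cpu_type(config_text, default="kvm64"):
    """Return the CPU model named on the 'cpu:' line of a VM config.

    Assumes 'key: value' lines; 'default' is what we assume applies
    when no 'cpu:' line is present.
    """
    for line in config_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "cpu":
            # Extra comma-separated options may follow; the model comes first.
            model = value.strip().split(",")[0]
            # Tolerate the explicit 'cputype=<model>' spelling as well.
            if model.startswith("cputype="):
                model = model[len("cputype="):]
            return model
    return default


def pins_host_cpu(config_text):
    """True if the VM is configured with the 'host' CPU type."""
    return cpu_type(config_text) == "host"


# Invented sample config, for illustration only.
sample = """bootdisk: virtio0
cores: 2
cpu: host
memory: 2048
name: web01
"""
print(cpu_type(sample), pins_host_cpu(sample))
```

The change itself can be made in the VM's processor settings in the web interface or, as far as I know, with `qm set <vmid> --cpu kvm64`. The guest has to be fully stopped and started again (not just rebooted from inside) before the new CPU model is used.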
 