[Proxmox 7.2-3 - CEPH 16.2.7] Migrating VMs hangs them (kernel panic on Linux, freeze on Windows)

Pakillo77

Hi,
After upgrading to PVE 7.2-3 I cannot migrate VMs between nodes.
The migration process itself completes OK, but the VM hangs once it is running on the target node.

root@pve224:~# pveversion -v
proxmox-ve: 7.2-1 (running kernel: 5.15.35-1-pve)
pve-manager: 7.2-3 (running version: 7.2-3/c743d6c1)
pve-kernel-5.15: 7.2-3
pve-kernel-helper: 7.2-3
pve-kernel-5.4: 6.4-15
pve-kernel-5.15.35-1-pve: 5.15.35-3
pve-kernel-5.4.174-2-pve: 5.4.174-2
pve-kernel-5.4.128-1-pve: 5.4.128-2
pve-kernel-5.4.73-1-pve: 5.4.73-1
ceph: 16.2.7
ceph-fuse: 16.2.7
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: residual config
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-8
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.1-6
libpve-guest-common-perl: 4.1-2
libpve-http-server-perl: 4.1-1
libpve-storage-perl: 7.2-2
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.12-1
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.1.8-1
proxmox-backup-file-restore: 2.1.8-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.4-10
pve-cluster: 7.2-1
pve-container: 4.2-1
pve-docs: 7.2-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.4-2
pve-ha-manager: 3.3-4
pve-i18n: 2.7-1
pve-qemu-kvm: 6.2.0-6
pve-xtermjs: 4.16.0-1
qemu-server: 7.2-2
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.4-pve1


I've tested it on 2 different clusters: same versions, same result.
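To reproduce, a plain online migration from the CLI is enough (the VM ID and target node name here are just examples):

qm migrate 100 pve225 --online

The migration task finishes without errors; the freeze only shows afterwards on the guest console.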
 

Attachments

  • Image_2022-05-16.jpeg (230 KB)
I've compared the Proxmox & KVM package versions with another cluster that works fine and downgraded those packages: same error.

I then downgraded the kernel version, and after the reboot... it works OK!
Kernel pve-kernel-5.13.19-6-pve works fine: no problem migrating VMs.

Then I upgraded all the Proxmox & KVM packages again and migration kept working fine.
Then I upgraded the kernel to pve-kernel-5.15.35-1-pve and it failed again.

So it's clear to me that this is a kernel-related problem, and I've tested several kernels in the same scenario:
pve-kernel-5.15.35-3-pve FAILS
pve-kernel-5.15.35-1-pve FAILS
pve-kernel-5.15.30-2-pve WORKS
pve-kernel-5.13.19-6-pve WORKS
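
To verify which kernel a node actually booted after each swap:

uname -r

On systems that boot through proxmox-boot-tool, "proxmox-boot-tool kernel list" also shows which kernel images are available.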

Commands to downgrade packages without dependency problems:
apt install pve-kernel-5.15.30-2-pve proxmox-ve=7.1-2
apt remove pve-kernel-5.15.35*
reboot
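
To keep a later upgrade from pulling the 5.15.35 kernel straight back in, holding the packages should work (standard apt-mark mechanism; adjust the package names to your setup):

apt-mark hold proxmox-ve pve-kernel-5.15
# once a fixed kernel is released:
apt-mark unhold proxmox-ve pve-kernel-5.15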


Ciao,
 
Please tell me if downgrading to pve-kernel-5.15.30-2 solves the problem for you too.
Sadly no. I downgraded all nodes to "Linux version 5.15.30-2-pve": still the same hang after migrating VMs between hosts.

I guess the next step would be to downgrade pve-qemu-kvm?!
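
If anyone goes that route: apt can list the pve-qemu-kvm builds still available from the configured repos before you pick one (the version placeholder is whatever that list shows):

apt list -a pve-qemu-kvm
apt install pve-qemu-kvm=<older-version-from-the-list>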
 
Did you get any further with this? We have the same problem, and downgrading proxmox-ve and the kernel does not solve it either.
 
OK. I have tried upgrading to pve-qemu-kvm 6.2.0-8 from the pvetest repository. Now the issue seems to be gone.
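
For reference, pulling that build from the test repository on a Bullseye-based PVE 7 should look roughly like this (pvetest repo line as documented by Proxmox; consider removing the list file again afterwards):

echo "deb http://download.proxmox.com/debian/pve bullseye pvetest" > /etc/apt/sources.list.d/pvetest.list
apt update
apt install pve-qemu-kvm=6.2.0-8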
 
We have the same issue with an AMD EPYC 7543P processor... do you have AMD too? Intel is working fine...
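
(For comparing reports, the exact CPU model of a node shows up with

lscpu | grep 'Model name'

or on the node's Summary page in the GUI.)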
 
Still having the same problem with Proxmox 7.2-4 and kernel 5.15.35-2.

root@pve221:~# pveversion -v
proxmox-ve: 7.2-1 (running kernel: 5.15.35-2-pve)
pve-manager: 7.2-4 (running version: 7.2-4/ca9d43cc)
pve-kernel-5.15: 7.2-4
pve-kernel-helper: 7.2-4
pve-kernel-5.15.35-2-pve: 5.15.35-5
pve-kernel-5.15.35-1-pve: 5.15.35-3
pve-kernel-5.15.30-2-pve: 5.15.30-3
ceph: 16.2.7
ceph-fuse: 16.2.7
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve1
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.2-2
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.2-2
libpve-guest-common-perl: 4.1-2
libpve-http-server-perl: 4.1-2
libpve-storage-perl: 7.2-4
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.12-1
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.2.3-1
proxmox-backup-file-restore: 2.2.3-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.5.1
pve-cluster: 7.2-1
pve-container: 4.2-1
pve-docs: 7.2-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.4-2
pve-ha-manager: 3.3-4
pve-i18n: 2.7-2
pve-qemu-kvm: 6.2.0-10
pve-xtermjs: 4.16.0-1
qemu-server: 7.2-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.4-pve1
 
I can confirm the problem when live-migrating from a newer to an older CPU.
proxmox-ve: 7.2-1 (running kernel: 5.15.35-2-pve)
pve-manager: 7.2-4 (running version: 7.2-4/ca9d43cc)
 
I'm just a freeloader running on consumer-grade hardware (except for my SSDs and NICs) and I also have the same issue. One box has an i7-12700K in it, the other an i7-8700K; migrating a machine from the 12700K to the 8700K would cause it to lock up, and I'd have to SSH into the node and force-kill it.

The rollback fixed my issue:
https://forum.proxmox.com/threads/p...on-linux-freeze-on-windows.109645/post-471557
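
In case it helps anyone else stuck there, force-killing a hung guest from the node's shell amounts to something like this (100 stands in for the VM ID):

qm stop 100
# if qm stop hangs as well, kill the KVM process directly:
kill -9 $(cat /var/run/qemu-server/100.pid)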
 
Did you have to roll back the older CPU node, or everything?
 
I rolled back both nodes.

Edit: Before rolling back, I could migrate from the 8700K to the 12700K without issue. So I migrated everything off the 8700K and rolled it back. When I then tried to migrate from the 12700K to the 8700K so I could roll that one back too, the VMs hung, so I don't think you can apply the rollback to just a subset of nodes. *Maybe* it could work on the newer-CPU nodes only, but from what I saw it won't work when applied only to the old-CPU nodes.
 
