Our cluster is:
PVE221: Dell PowerEdge R240 Xeon(R) E-2234
PVE222: Dell PowerEdge R610 Xeon(R) CPU L5520 (just checked, I thought it was R620)
PVE223: Dell PowerEdge R6515 AMD EPYC 7313P
Only PVE221 can't boot with kernel 6.5.
It makes no difference whether the tg3 module is loaded or not.
I've done echo "blacklist tg3" >> /etc/modprobe.d/no-tg3.conf but it still gets stuck at the same point during boot.
And it's not a display issue: no ping, no Ceph, nothing.
Using kernel 6.2 with the same config, it boots fine.
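For reference, the full blacklist sequence I'd expect to be needed looks like this (just a sketch; note that an entry in /etc/modprobe.d only affects early boot once the initramfs has been rebuilt, so that step is included here):

# blacklist the tg3 module
echo "blacklist tg3" > /etc/modprobe.d/no-tg3.conf
# rebuild the initramfs so the blacklist also applies during early boot
update-initramfs -u -k all
# after the next boot, confirm the module is really not loaded
lsmod | grep tg3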
About the VM problems, this is what happens:
Thanks,
Yes, you're right, there are two Broadcom NetXtreme BCM5720 NICs, although we are not using them.
We use only the Intel SFP+ ones.
root@pve221:~# lspci
00:00.0 Host bridge: Intel Corporation 8th/9th Gen Core Processor Host Bridge/DRAM Registers [Coffee Lake] (rev 07)
00:01.0 PCI bridge: Intel...
I don't think so:
root@pve221:~# hwinfo --network | grep Driver
Driver: "tg3"
Driver: "ixgbe"
Also, some Linux VMs hang with CPU errors when migrating from kernel 6.5 hosts to 6.2 ones.
My Dell R240 doesn't boot with any 6.5 kernel and it's not a display issue.
It doesn't respond to ping, the GUI, Ceph or anything else; it's just stuck.
I don't use GPU passthrough either.
This is my setup:
root@pve221:~# uname -a
Linux pve221 6.2.16-19-pve #1 SMP PREEMPT_DYNAMIC PMX 6.2.16-19...
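In case anyone else needs to stay on the 6.2 kernel for now, it can be pinned so it remains the boot default across upgrades. A sketch, assuming proxmox-boot-tool manages your boot entries (on a plain-GRUB install the default entry would be set in GRUB instead):

# list the kernels proxmox-boot-tool knows about
proxmox-boot-tool kernel list
# pin the known-good 6.2 kernel so it stays the default
proxmox-boot-tool kernel pin 6.2.16-19-pve
# refresh the boot entries
proxmox-boot-tool refresh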
Same issue here:
root@pve223:~# lvs
WARNING: VG name SangomaVG is used by VGs lgBrPv-szSW-YzKD-Z4zc-IRjR-jwOD-cwqoYW and aRJ1Iy-dvZz-1tN3-yUlU-XaL3-45tj-wwB2ZI.
Fix duplicate VG names with vgrename uuid, a device filter, or system IDs.
WARNING: Not using device /dev/rbd23p2 for PV...
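In case it is related: those duplicate-VG warnings usually appear when the host's LVM also scans guest disks exposed as /dev/rbd* devices. A device-filter sketch for /etc/lvm/lvm.conf, assuming that is where the duplicates come from (the pattern is only an example, adjust it to your devices):

# /etc/lvm/lvm.conf, inside the devices { } section:
# reject RBD-backed guest disks so the host only scans its own PVs
global_filter = [ "r|^/dev/rbd.*|" ]
# afterwards, lvs/vgs should only report the host's own VGs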
Same problem here.
Different clusters, different CPUs, different brands, Ceph storage or ZFS, Linux VMs, Windows VMs... I cannot find any common factor to track.
It's very, very annoying.
Just tested kernel 5.15.53-1-pve on another 5-node cluster and all VM migrations were fine.
root@pve221:~# pveversion -v
proxmox-ve: 7.2-1 (running kernel: 5.15.53-1-pve)
pve-manager: 7.2-7 (running version: 7.2-7/d0dd0e85)
pve-kernel-helper: 7.2-12
pve-kernel-5.15: 7.2-10...
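For anyone who wants to reproduce the test, an online migration can be started from the CLI like this (the VMID and target node are placeholders):

# live-migrate VM 100 to node pve222 while it is running
qm migrate 100 pve222 --online
# then check that the guest is still responsive on the target
qm status 100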
I don't know the reason, but now, using the same patched kernel, every migration fails again. All of them: Windows, Linux...
root@pve226:~# pveversion -v
proxmox-ve: 7.2-1 (running kernel: 5.15.39-3-pve-guest-fpu)
pve-manager: 7.2-7 (running version: 7.2-7/d0dd0e85)
pve-kernel-5.15: 7.2-8...
Just upgraded all Proxmox packages, including kernel 5.15.39-3-pve (the regular one, not the patched one), and the problem remains, but in reverse.
Now I can migrate VMs from hosts with newer CPUs to ones with older CPUs, but not from older to newer.
root@pve222:~# pveversion -v
proxmox-ve: 7.2-1 (running kernel...
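A mitigation that is often suggested for mixed-CPU clusters (not something confirmed in this thread) is to give the affected VMs an explicit common CPU model instead of host, so old and new nodes expose the same flag set. A sketch with a placeholder VMID:

# use a lowest-common-denominator virtual CPU model, e.g. kvm64
qm set 101 --cpu kvm64
# equivalent: set "cpu: kvm64" in /etc/pve/qemu-server/101.conf
# the new CPU model only applies after a full stop/start of the VM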
Ok, I'll try it right now.
And, to put everything in context, I have to say that live migration between different hardware had always worked fine until kernels newer than 5.15.30-2-pve.
And I really loved that.
Just updated everything to the latest version and the problem is NOT fixed:
root@pveo03:~# pveversion -v
proxmox-ve: 7.2-1 (running kernel: 5.15.39-2-pve)
pve-manager: 7.2-7 (running version: 7.2-7/d0dd0e85)
pve-kernel-5.15: 7.2-7
pve-kernel-helper: 7.2-7
pve-kernel-5.15.39-2-pve: 5.15.39-2...
I've just updated one of our PVE clusters to the latest version and the problem persists as described many times above: live migration from a newer CPU to an older one freezes the VM.
root@pve222:~# pveversion -v
proxmox-ve: 7.2-1 (running kernel: 5.15.39-1-pve)
pve-manager: 7.2-7 (running...