It is still OK. Some very old VMs, such as Debian 4, 5, 6, etc., and similarly old Ubuntu releases (yes, we have to use them), randomly have problems (with IO?) and freeze with a kernel panic... Sometimes. Not often...
But it is ok!
The latest kernel, Linux 6.8.8-2-pve, seems to be OK.
One experimental server has been running this kernel for a week without freezing,
so I will upgrade the rest of them...
https://forum.proxmox.com/threads/kernel-6-8-4-2-causes-random-server-freezing.146327
do you have Ceph and NVMe drives?
Another thread: https://forum.proxmox.com/threads/random-6-8-4-2-pve-kernel-crashes.145760
Hi @Der Harry ,
I have started a similar thread https://forum.proxmox.com/threads/kernel-6-8-4-2-causes-random-server-freezing.146327
My 12 servers are "freezing", no logs, no segfault, no memory leak, no amdgpu, no iommu issue, etc.
Yes, we have Ceph and some NVMe drives for the Ceph OSDs. This is...
This happens once in ten years...
I've now had some very hot moments at a large scientific organization where I run an entire rack of Proxmox servers and a Ceph cluster.
Even the basic infrastructure was randomly down (Firewalls, DNS, DHCP ...).
But it's OK. Sh1t happens. That's life!
The...
Please help identify the common factors that cause this problem.
I can confirm that I had exactly the same issue as described by @Tim-AU and @Lephisto.
Still looking for the code that was changed or added in kernel 6.8 and causes this issue.
* I'm using CPU type host in the VM, is this the same in your case?
*...
ASRock and ASUS are server vendors, and we have a good relationship with them.
And this is not the RX570 case. As I wrote in my previous post, I'm aware of three different situations of 6.8 freezing - none of them applies to an EPYC (non-GPU) server.
The RX570 relates to the Destroy DC context...
solved!
It is definitely a kernel 6.8 bug.
I need to know which kernel patch/commit is causing this regression.
I'm looking into the Ubuntu kernel. Are there any Proxmox-specific patches?
I will share this kernel problem with the ASRock and ASUS vendors.
We need to know what changed in kernel 6.8
and when it will be safe to go upstream again.
Are there any Proxmox-specific kernel patches?
Tried kernel 6.8 with the following parameter sets, without success:
amd_iommu=off iommu=off
default - no parameters
and with Ceph optimizations: amd_iommu=on iommu=pt pcie_aspm=off
with...
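To double-check which of these parameter sets a node actually booted with, something like the sketch below could help. It only splits a kernel command line string into names and values. The helper names (`parse_cmdline`, `iommu_settings`, `running_cmdline`) are made up for illustration, and quoted values containing spaces are not handled.

```python
from pathlib import Path

# Parameters mentioned in the post above.
WATCHED = ("amd_iommu", "iommu", "pcie_aspm")


def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {name: value}.

    Bare flags such as "quiet" map to None. This is a simplification:
    quoted values that contain spaces are not handled.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def iommu_settings(cmdline: str) -> dict:
    """Return the watched parameters, with None for any that are not set."""
    params = parse_cmdline(cmdline)
    return {name: params.get(name) for name in WATCHED}


def running_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line of the currently running kernel (Linux only)."""
    return Path(path).read_text()
```

On a running node, `iommu_settings(running_cmdline())` would show the values of the three parameters tried above, with `None` for any that are not set.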
after many attempts
replacing PSU, CPU, RAM etc...
trying different kernel parameters etc...
checking ipmi and system for any error and logs
checking all power cables and upgrading UPS
updating BIOS, firmware and BMC/IPMI
configuring different BIOS options
installing a new version of...
This thread is dedicated to the issue where the server just freezes.
If the kernel gives error messages when the server crashes,
there is a separate thread: https://forum.proxmox.com/threads/random-6-8-4-2-pve-kernel-crashes.145760
and not AMD GPU related as in...
Solved
there is no bonding involved
just the kernel (version) + bnxt_re module + Broadcom firmware + initializing RDMA RoCE
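A quick way to see whether a node has the bnxt_re module loaded at all is to look at /proc/modules. This is a minimal sketch, assuming the usual format where the module name is the first field on each line. The function names and the sample text are only illustrative and are not output from any of the affected servers.

```python
from pathlib import Path


def loaded_modules(proc_modules_text: str) -> set:
    """Collect module names from /proc/modules-style text (first field per line)."""
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[0])
    return names


def is_module_loaded(name: str, proc_modules_text: str) -> bool:
    """True if the named module appears in the given /proc/modules text."""
    return name in loaded_modules(proc_modules_text)


def read_proc_modules(path: str = "/proc/modules") -> str:
    """Read the module list of the running kernel (Linux only)."""
    return Path(path).read_text()


# Illustrative sample only; sizes and addresses are placeholders.
SAMPLE = """\
bnxt_re 155648 0 - Live 0x0000000000000000
bnxt_en 339968 1 bnxt_re, Live 0x0000000000000000
"""
```

On a node, `is_module_loaded("bnxt_re", read_proc_modules())` tells you whether the RDMA driver is in play there, which can help when comparing cards that freeze with cards that do not.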
I had the same problem.
In my rack there are 25 Broadcom BCM57504 4x25G SFP28 PCIe network cards in total.
Most of them do not have this problem, but some of...